“We've always run Postgres as our primary data store and have found it very capable and reliable. I've worked extensively at every level of our application stack, and the database is my favorite part. We're able to achieve great performance through extensive application- and schema-level tuning.
Speed: We have spent a ton of time making our queries faster. Our general approach is to profile a query with the "explain analyze" keywords to see how Postgres is using the existing indexes and how much work each step of the plan requires. Once we've identified the slow section, we can experiment with building different single- or multi-column indexes and rerunning the query. For low-cardinality columns like a Boolean, we've found B-Tree indexes don't help much, and Postgres will often just ignore them. This can usually be solved with a multi-column index or by rewriting the query so that the low-cardinality columns are filtered as late as possible, after most of the rows have already been eliminated by a more efficient index. Postgres also supports other index types like Hash, BRIN, GiST, and GIN that can work well when a B-Tree isn't giving you the speed you need.
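To make that concrete, here's a rough sketch of the workflow; the listings table and its columns are made up for the example:

    -- Profile the query to see which indexes are used and where the time goes.
    EXPLAIN ANALYZE
    SELECT *
    FROM listings
    WHERE brand_id = 42
      AND is_sold = false;  -- low-cardinality boolean; a B-Tree on it alone is rarely useful

    -- Try a multi-column index so both predicates can be satisfied by one index scan,
    -- then rerun EXPLAIN ANALYZE and compare the plans and timings.
    CREATE INDEX listings_brand_sold_idx ON listings (brand_id, is_sold);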
It's also important to consider your application's data access pattern, because looking only at query run times can be misleading. If you're making a lot of small queries, the network overhead can result in worse overall performance than making one large query. For example, we may want to pull some data for every car model of a brand. Limiting each query to a single model keeps the individual queries very fast, but the whole operation ends up slower because of network latency. A single query over the entire brand may be many times slower on its own, but because most brands sell many different car models, the overall result comes back much faster.
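As an illustration, again with made-up table and column names, the difference is between looping over models in the application and letting one query cover the whole brand:

    -- Many small queries, one per model: each is fast, but every round trip adds latency.
    -- ($1 stands for the parameter supplied by the application.)
    SELECT model_id, AVG(price) FROM listings WHERE model_id = $1 GROUP BY model_id;

    -- One query for the whole brand: slower by itself, but faster end to end.
    SELECT model_id, AVG(price)
    FROM listings
    WHERE brand_id = $1
    GROUP BY model_id;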
Security: In addition to the normal advice of locking down access as much as possible at the IP/network level and restricting user/role access to just what is needed, we also recommend strict use of parameterized queries any time a variable is used in a query, to prevent SQL injection. When we started building our application, data security was one of our top priorities, so we built our SQL interface to be as secure as possible by default. In our case, every SQL query made by our application, through every level of the stack, uses only parameterized queries. Because our application is always changing, it's impossible to be sure that a given input will stay safe forever, so it's better to parameterize everything and avoid SQL injection risks altogether. You also get a more robust system that can handle "Wally's Autoland" having a sale.
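A minimal sketch of what that looks like at the SQL level, using Postgres's PREPARE/EXECUTE (client libraries expose the same idea through placeholder arguments); the dealers table is made up for the example:

    PREPARE dealer_lookup (text) AS
      SELECT * FROM dealers WHERE name = $1;

    -- The value is passed as data rather than spliced into the SQL text,
    -- so the apostrophe in "Wally's Autoland" can't break or alter the query.
    EXECUTE dealer_lookup('Wally''s Autoland');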
Robust/Maintainable/Relational: These are closely related for us. We make heavy use of foreign keys and type definitions to keep data consistent. One place we could have done better is in naming some of our many-to-one relationships. We have one table of ("objectID", "mappedID") pairs where the direction of the mapping isn't immediately obvious. We don't work with this table very often, so whenever it comes up we have to stop and look up which ID is mapped to the other. Better column names would have been ("originID", "targetID").
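For illustration, here's roughly what the clearer naming looks like in a schema; the models table and the mapping's purpose are simplified for the example:

    CREATE TABLE model_mapping (
        "originID" integer PRIMARY KEY REFERENCES models (id),  -- many origins...
        "targetID" integer NOT NULL REFERENCES models (id)      -- ...each map to one target
    );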
We try to normalize data as much as possible, but in some cases it's just easier to write queries if you denormalize a little. Denormalizing can let copies of the data drift out of sync, so we recommend finding a balance that works for you and your application. We keep things synchronized with some extra checks when updating, and the payoff of queries that are easier to write and understand is well worth it.
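As a rough example of the kind of extra check we mean (table and column names made up), a denormalized brand_id stored directly on listings can be validated against the normalized path periodically or after bulk updates:

    ALTER TABLE listings ADD COLUMN brand_id integer REFERENCES brands (id);

    -- Any rows returned here have drifted out of sync and need fixing.
    SELECT l.id
    FROM listings l
    JOIN models m ON m.id = l.model_id
    WHERE l.brand_id IS DISTINCT FROM m.brand_id;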
As your row counts grow you may need to revisit some database-wide maintenance settings. In Postgres you need routine vacuuming to update table statistics and clean up dead rows; failing to vacuum can degrade performance significantly. By default, autovacuum kicks in after a specified fraction of a table's rows have been updated or deleted, but for a large table that fraction can amount to tens of millions of rows. We recommend revisiting your maintenance settings periodically to make sure they still make sense for how you're using the database. For Postgres specifically, you should also monitor the autovacuum process to make sure it can keep up with the number and size of tables in your database. With a high write workload it's easy to get into a situation where vacuums can't keep up and performance suffers; in that case you'd need to increase the number of autovacuum workers or give each one more resources.”
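A hedged sketch of that kind of tuning, with purely illustrative values and a made-up table name: a large, write-heavy table can get its own, more aggressive autovacuum settings, and the instance-wide worker settings can be raised when autovacuum falls behind.

    -- Vacuum/analyze this table after ~1% of its rows change instead of the defaults.
    ALTER TABLE listings SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.01
    );

    -- Instance-wide knobs in postgresql.conf:
    --   autovacuum_max_workers = 6
    --   autovacuum_vacuum_cost_limit = 2000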
- William Young, Founder of Competitive Intelligence Solutions LLC.