I recently sat in on a demo for a new tool being developed by some engineer colleagues and noticed a troubling trend in how we choose infrastructure technologies. To set the stage: these engineers had been hard at work for months developing a system that trained science models and presented the results in a UI as a table. Like any data exploration tool, it needed to support querying by different fields, sorting by value, and other common ways of manipulating data.
“The UI looks great,” the customer said. “But how do we sort the records by X field, or Y field, or Z field?” The engineer giving the demo was visibly taken aback by such a simple question. Having chosen DynamoDB as the database, querying a new data field involves a non-trivial amount of work: creating a new Global Secondary Index (GSI), changing your query pattern, and updating your pagination code, not to mention the cost implications of each additional GSI. “We can add that functionality,” he replied. “But it’s going to take three weeks of effort.” The customer let out an audible sigh.
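To make the "three weeks of effort" concrete, here is a hedged sketch of what adding sort-by-a-new-field means in DynamoDB. The table and attribute names (`science-results`, `run_date`) are hypothetical, and the actual AWS calls are shown commented out; the point is the shape of the work: define a GSI, wait for it to backfill, then rewrite every affected query to name the index.

```python
# Sketch only: the table cannot be queried on an arbitrary attribute,
# so a Global Secondary Index must be created first. All names here
# ("science-results", "run_date") are hypothetical.

# Step 1: parameters for creating the GSI (passed to DynamoDB's
# update_table API; the index then backfills asynchronously).
gsi_update = {
    "TableName": "science-results",
    "AttributeDefinitions": [
        {"AttributeName": "run_date", "AttributeType": "S"}
    ],
    "GlobalSecondaryIndexUpdates": [{
        "Create": {
            "IndexName": "run_date-index",
            "KeySchema": [{"AttributeName": "run_date", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
}
# boto3.client("dynamodb").update_table(**gsi_update)  # then wait for backfill

# Step 2: every query against the new field must target the index
# explicitly -- existing query and pagination code has to change.
query_params = {
    "TableName": "science-results",
    "IndexName": "run_date-index",
    "KeyConditionExpression": "run_date = :d",
    "ExpressionAttributeValues": {":d": {"S": "2021-06-01"}},
}
# boto3.client("dynamodb").query(**query_params)
```

And that is per field: each additional attribute the customer wants to explore means another GSI, with its own provisioned throughput and storage cost.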
In the aftermath of the demo, my team and I discussed our options. Given that our customers expected the ability to query, sort, filter, and more generally explore any field in the output table, we concluded that a database like Postgres or MySQL would be a better choice. But getting there would prove to be a multi-month undertaking and would compromise our deadlines—an unfortunate update that our customers would not be happy about.
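For contrast, here is a minimal sketch of why a relational store fit the "explore any field" requirement, using SQLite so it runs standalone; the table and columns are hypothetical stand-ins for our output table.

```python
# Minimal sketch: in a relational database, sorting or filtering on a
# new column is a query change, not a schema-and-index project.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (model TEXT, score REAL, run_date TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("a", 0.91, "2021-06-01"),
        ("b", 0.87, "2021-05-30"),
        ("c", 0.95, "2021-06-02"),
    ],
)

# Any field the customer asks about is one ORDER BY / WHERE away:
rows = conn.execute("SELECT model FROM results ORDER BY score DESC").fetchall()

# An index is still wise on large tables, but it's an optimization,
# not a prerequisite for the query to exist at all:
conn.execute("CREATE INDEX idx_score ON results (score)")
```

The asymmetry is the point: in Postgres or MySQL the customer's question is answered by editing a query, while in DynamoDB it triggers schema, query, and pagination changes.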
While on the commute home (yes, no more work from home for me!), I contemplated how this could have happened. I concluded that the engineers must have chosen the technology without understanding the needs of their customer—a major anti-pattern for a production application that is expected to scale.
The next day, I inquired with the lead engineer to confirm my suspicions but discovered something unexpected. It wasn’t just that we hadn’t considered the needs of our customers; it was that we defaulted to DynamoDB simply because it was the norm. “That’s what everyone here uses,” he said. He’s certainly right that in our domain we typically build applications that require fast key/value lookup stores, but that was not the need in this case. The key insight I gleaned from the conversation was that we had chosen the technology for our application without first understanding the needs of our customers.
Picking the technology before understanding the needs is an all too common occurrence for newer software developers—so much so that a simple Google search for “picking the right type of database” generates millions of hits. If you’re curious, this one in particular focuses on AWS and does a pretty good job of describing the key considerations you need to keep in mind.
But this isn’t all about databases. Considering your needs before picking the tech is a mental exercise that extends to many other areas of software development. Whether you’re choosing Lambda over ECS or Redis over Memcached, you need to do the legwork to understand what your customers are trying to do and how they expect to use your application. From there, you can explore a set of tech options and evaluate their pros and cons. Not doing so can put you in an awkward circumstance where you either need to work around your choices with inefficient application code or go through a painful and lengthy migration process. Neither is ideal, and both can be avoided from the get-go. Just ask my team’s engineer about his experience.