SQL, or Structured Query Language, is the standard language used to communicate with relational databases. SQL databases allow users to avoid writing custom code to interact with data sets, and instead use a well-defined, standardized language that is largely transferable between many different database dialects.
By the way, if you prefer a video version of this post, check out my YouTube video on this topic below.
History of SQL
SQL grew out of relational database research at IBM in the early 1970s, but it wasn’t until the 1980s that it became the standard database communication language. Fast-forward to the 1990s and the explosive growth of the dot-com industry, and SQL database usage was growing exponentially. This is the era in which familiar databases such as Microsoft SQL Server, MySQL, and PostgreSQL rose to prominence. It’s important to note here that SQL is the language used to interact with these database products, and for the most part, the syntax is roughly the same.
SQL is used by software developers, analysts, data engineers, and many more professionals. It’s a ubiquitous language that allows people to hop between jobs and still have the fundamental skills to interact with a company’s data.
Is SQL Losing Relevance?
Despite a meteoric rise over many years, SQL has been fading as the de facto choice for software applications that need to work with large datasets at low latencies. In the age of big data, developers need to be mindful of the limitations of SQL when processing large volumes of information. This matters more than ever now that companies are collecting more data than ever before and struggling to derive meaningful insight from it.
Think I’m exaggerating? Look at these graphs from Google Trends. SQL has been on a downward trajectory since 2004.
So what gives? Why is a universal language like SQL with so many applications across so many industries losing its edge? And what is taking over? The answer lies in one word: Data.
SQL and its Challenges with Big Data
We are in a world where data is more valuable than gold. Companies are collecting a mind-boggling amount of data per day and struggling to store it, process it, and analyze it. In fact, a study by IDC predicted that the volume of data being generated would grow by a factor of 44 between 2010 and 2020. All of this data eventually lands in a data storage product somewhere, where it can then be queried and analyzed by analysts, developers, and machine learning algorithms.
As a quick example, think about the time you spend on Instagram. Every time you scroll through your Instagram feed and pause to read a post, Instagram is collecting data on which posts you interact with, how long you spend looking at them, whether or not you engage with them, and the actions that you take following this engagement. Companies like Instagram attempt to extract this information to better predict which content they should present to you in the future.
Processing these large volumes of data becomes a significant challenge for classic SQL-based systems. A classic SQL database configuration handles more data by upgrading the hardware the database runs on. Unfortunately, there’s only so much you can upgrade: eventually, a single machine can’t get any more powerful. Naturally, this also means that there’s a limit on the volume of data you can store in a database and on the performance you can expect to get out of it. This concept is called Vertical Scaling: it’s the idea that in order to deal with our computational problems, we just build a more powerful computer. And if you’re interested in learning more about scaling, I have a whole video on it available here.
So how have developers handled the demands of modern data collection? The answer lies in a different technology.
NoSQL to the Rescue
NoSQL, also known as ‘Not Only’ SQL, is an alternative technology used to store and retrieve data at ultra-fast speeds.
Instead of relying solely on table and column data structures like SQL databases do, many NoSQL databases use a hashing function to index and retrieve data. For instance, in NoSQL databases like DynamoDB (or MongoDB when configured with hashed sharding), a record’s key is passed through a hashing function whose output determines where the record is stored. This way, the database knows exactly where the record is located and can quickly retrieve it later. Take a look at this graph of NoSQL popularity. It exploded in popularity in 2009 and has remained pretty stable.
This strategy of partitioning data in advance is one of NoSQL’s greatest strengths, but also one of its greatest weaknesses. It’s a strength because it allows NoSQL databases to scale out by adding more data storage nodes, whereas SQL databases tend to scale up by beefing up their single database instance. Now I do realize I’m simplifying the complexities of scaling a bit, but I have an entire video on YouTube on that topic that you can check out.
This strategy of scaling out instead of scaling up is what makes NoSQL databases so special. It’s attractive for applications that need to handle large volumes of data with low latency and high throughput.
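To make the hashing and scale-out idea a bit more concrete, here is a minimal sketch in Python. It is a toy, not how MongoDB or DynamoDB are actually implemented: the class name HashPartitionedStore, its methods, and the sample key are all made up for illustration, and each "node" is just an in-memory dictionary standing in for a separate machine.

```python
import hashlib


class HashPartitionedStore:
    """Toy key-value store that spreads records across nodes by hashing the key.

    Hypothetical illustration only; real NoSQL databases are far more involved.
    """

    def __init__(self, node_count):
        # Each "node" is a plain dict here; in a real system it would be
        # a separate storage machine.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, record):
        # The hash output tells us which node stores the record.
        self.nodes[self._node_for(key)][key] = record

    def get(self, key):
        # The same hash tells us where to look, so no other node is touched.
        return self.nodes[self._node_for(key)].get(key)


store = HashPartitionedStore(node_count=4)
store.put("user#42", {"name": "Ada", "country": "UK"})
print(store.get("user#42"))
```

Reads and writes only ever touch one node, which is why adding nodes adds capacity. One simplification worth flagging: with a plain modulo like this, changing the node count would move most keys to a different node, so real systems tend to use more careful partitioning schemes.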
But it isn’t all positives for NoSQL. This high performance and low latency comes at the cost of query flexibility. NoSQL databases CAN’T query by any field, or group by any field, the way SQL can. This means developers looking to use NoSQL need to think carefully when designing a schema so that it is compatible with their anticipated future access patterns.
All of this got me wondering: is SQL dying? Serverless frameworks such as AWS Amplify offer NoSQL data storage solutions out of the box. The more I thought about it, the more I wondered whether SQL is on its way out.
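Here is a rough sketch of that trade-off in Python, using the built-in sqlite3 module for the SQL side and a plain dictionary as a stand-in for a key-value NoSQL store. The posts table, its columns, and the sample rows are invented for this example, and the dictionary is only an analogy for how a key-based store is accessed.

```python
import sqlite3

# Made-up sample data: (id, author, topic, likes)
rows = [
    ("post#1", "alice", "travel", 120),
    ("post#2", "bob", "food", 45),
    ("post#3", "alice", "food", 80),
]

# SQL side: any column can be filtered or grouped on after the fact.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id TEXT PRIMARY KEY, author TEXT, topic TEXT, likes INTEGER)"
)
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
likes_by_topic = dict(
    conn.execute("SELECT topic, SUM(likes) FROM posts GROUP BY topic")
)

# Key-value side: records are reachable only through the key chosen up front.
kv_store = {
    row[0]: {"author": row[1], "topic": row[2], "likes": row[3]} for row in rows
}
by_key = kv_store["post#2"]  # fast: direct lookup by the key we designed for

# Grouping by topic was never planned for, so the application has to read
# every record and do the aggregation itself.
scanned = {}
for record in kv_store.values():
    scanned[record["topic"]] = scanned.get(record["topic"], 0) + record["likes"]

print(likes_by_topic)
print(by_key)
print(scanned)
```

Both approaches arrive at the same totals, but the SQL version asks the database a new question with one line, while the key-value version falls back to scanning everything. That is the kind of cost you take on when an access pattern wasn't anticipated in the schema.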
Is SQL Still Relevant?
The big question is – is SQL still relevant and should you learn it? Well, the answer is: It’s complicated.
SQL has established itself as the de facto data access language used across many different industries. This means that SQL is no longer used just by developers, but also by folks from many different parallel technical fields who need to store, manipulate, and access large volumes of data.
For this reason, SQL isn’t going anywhere. It’s stable in popularity and still dwarfs NoSQL databases in a side-by-side comparison.
In my professional opinion, NoSQL is trending towards becoming a required skill for software developers, especially those building backend applications that operate at massive scale and require ultra-fast performance. Even so, most observers believe that there’s enough room in this world for both SQL and NoSQL solutions.
That being said, SQL will remain in place for years to come as the standard data access language used across many different job families. You just can’t go wrong learning SQL in 2020 if you expect to work in any kind of technical field, or are looking for any job that involves managing data.