Where To Base Your Data

Data is a massive topic. Literally. Over 400 million terabytes of data are generated globally every single day. Sensor data from IoT (Internet of Things) devices, user-generated videos on platforms like Snapchat and TikTok, as well as enterprise data, all contribute to the vast quantities of data, and that number is only growing.

Around 90 to 95% of this data is ephemeral. It’s used for aggregation, real-time analysis, or brief storage before being deleted. Even so, that still leaves roughly 20 to 40 million terabytes of data requiring storage every day.

This data is meant to be used. It makes up work emails, product inventories, analytics dashboards, social media posts, and more. This raises an important question:

How do we store all of this data?

And more to the point:

How do we store it in a way that lets us interact with it meaningfully?

To answer this, we need to have a look at structured, unstructured, and semi-structured data.

Structured Data

Structured data is the dream of the orderly mind. It fits neatly into logical groups and can be further divided into rows and columns. If you’ve ever looked at an Excel spreadsheet, you’ve experienced structured data. Each row is a separate entry, and each column is an attribute of that entry.

Imagine your website keeps track of all the users who sign up. Each user fills out their first and last names, email address, and date of birth. All of this data is consistent and categorised neatly, which makes querying and analysing it easy.
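To make that concrete, here is a minimal Python sketch of the sign-up example. The User class, the sample people, and the example.com addresses are all invented for illustration; the point is only that every record has the same four fields, so a query is a simple filter on one column.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class User:
    # Every user has exactly these four attributes (the "columns").
    first_name: str
    last_name: str
    email: str
    date_of_birth: date


# Each entry is one "row". The sample data is made up.
users = [
    User("Ada", "Lovelace", "ada@example.com", date(1815, 12, 10)),
    User("Alan", "Turing", "alan@example.com", date(1912, 6, 23)),
    User("Grace", "Hopper", "grace@example.com", date(1906, 12, 9)),
]

# Because the shape is consistent, filtering on a column needs no special cases.
born_before_1910 = [u.email for u in users if u.date_of_birth < date(1910, 1, 1)]
print(born_before_1910)  # ['ada@example.com', 'grace@example.com']
```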

Unstructured Data

On the other hand, unstructured data is the ultimate embodiment of flexibility. It can take many forms, including text, video, images, logs or sensor readings. In fact, unstructured data is so flexible that sequential records can look entirely different.

Unstructured data is usually stored as a BLOB (that’s a Binary Large Object), which is just a sequence of bytes. This makes it functionally useless by itself. Is this BLOB the important file you need to send to a client, or is it just a photo of your lunch? Without some sort of context clues, it’s impossible to say.
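A rough Python illustration of the problem, using two made-up byte sequences as stand-ins for real files: once the data is reduced to bytes, about all the storage layer can tell you is how many there are.

```python
# Two hypothetical files, reduced to what the storage layer actually sees: bytes.
client_contract = "Contract for Acme Ltd".encode("utf-8")
lunch_photo = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])  # made-up bytes standing in for image data

for blob in (client_contract, lunch_photo):
    # Without any surrounding context, the type and the size are all we know for sure.
    print(type(blob).__name__, len(blob), blob[:8].hex())
```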

Semi-structured Data

This is where semi-structured data comes in. Semi-structured data contextualises unstructured data by wrapping it with supporting information (metadata). This can include tags, usernames, timestamps, location data, and crucially, a link to the file system or object storage where the unstructured data sits in BLOB form.

Because it exists to support and contextualise unstructured data, the shape of semi-structured data varies between records, but it provides that all-important context that makes unstructured data usable.
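As a sketch, two metadata records might look like the Python dictionaries below. The field names, usernames, and s3:// locations are assumptions made up for this example. Notice that the second record has no tags or location, yet both records point at where their BLOB lives.

```python
# Hypothetical metadata records; every name and URL here is invented.
records = [
    {
        "username": "alice",
        "timestamp": "2024-05-01T12:30:00Z",
        "tags": ["lunch", "food"],
        "location": {"lat": 51.5, "lon": -0.12},
        "blob_url": "s3://example-bucket/photos/lunch.jpg",
    },
    {
        # Same purpose, different shape: no tags, no location.
        "username": "bob",
        "timestamp": "2024-05-01T14:00:00Z",
        "blob_url": "s3://example-bucket/docs/client-contract.pdf",
    },
]

for record in records:
    # Optional fields are read with a default rather than assumed to exist.
    tags = record.get("tags", [])
    print(record["blob_url"], record["username"], tags)
```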

This brings us back to our original questions: How do we store all this data? And how do we extract meaningful insights from it?

The Database

Databases come in two broad types: relational and non-relational.

Relational databases (SQL)

Relational databases are queried using SQL (Structured Query Language), and are the ideal choice for structured data because they have a very rigid schema. The schema maps out what tables belong in the database, what columns belong in a table, and what data types are allowed in a column. Tables group your data, so users are stored in the users table, orders are stored in the orders table, and so on.

Each row in the table is uniquely identifiable by a primary key, so no two rows can be confused with one another. Relations between tables are supported by foreign keys. For example, your orders table might contain a user ID column, which references the primary key of the users table, letting you see which user placed which order.
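Here is a minimal sketch of that schema using SQLite through Python's built-in sqlite3 module. The column names (user_id, total_pence and so on) and the sample rows are assumptions for illustration, not a recommended design. It shows the foreign key doing its job: an order for a user who doesn't exist is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# The schema: which tables exist, which columns they have, and which types are allowed.
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE,
        date_of_birth TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- foreign key
        total_pence INTEGER NOT NULL
    )
""")

conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Lovelace', 'ada@example.com', '1815-12-10')")
conn.execute("INSERT INTO orders VALUES (100, 1, 2500)")  # user 1 exists, so this is fine

try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 1000)")  # there is no user 999
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("order for unknown user rejected:", rejected)
```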

Using SQL, we can build up large, expressive queries that pull meaningful insights out of structured data.
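As a small example of such a query, the sketch below joins users to their orders and totals up what each user has spent. It again uses SQLite via Python's sqlite3 module, with a cut-down, made-up schema and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total INTEGER NOT NULL
    );
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO orders VALUES (100, 1, 25), (101, 1, 40), (102, 2, 10);
""")

# Join each order to its user, then count and sum the orders per user.
rows = conn.execute("""
    SELECT u.first_name, COUNT(o.id) AS order_count, SUM(o.total) AS total_spent
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY total_spent DESC
""").fetchall()

print(rows)  # [('Ada', 2, 65), ('Alan', 1, 10)]
```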

Non-relational databases (NoSQL)

Non-relational, or NoSQL, databases are designed for storing semi-structured data. They can store data as documents, graphs, or key-value pairs. Their flexible schemas make them ideal for storing social media posts (which might contain videos or photos), chat messages, activity logs, or sensor data. In fact, they’re ideal for storing any type of data where each record might have different fields or optional attributes. Imagine a social media post with tags, a video, and text, followed by a post with just text. A NoSQL database lets you query, filter, and aggregate this data, even on records that don’t share the same fields.
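To show the idea without tying it to any particular product, the sketch below uses plain Python dictionaries as a stand-in for documents in a NoSQL store. This is not a real database client, and the posts, field names, and URL are invented. It filters and aggregates across records that don't share the same fields.

```python
from collections import Counter

# Plain dictionaries standing in for documents; a real document store would hold these for you.
posts = [
    {"author": "alice", "text": "Launch day!", "tags": ["launch", "news"],
     "video_url": "s3://example-bucket/launch.mp4"},
    {"author": "bob", "text": "Just text this time."},
    {"author": "alice", "text": "Behind the scenes", "tags": ["news"]},
]

# Filter: only posts that happen to have a video.
posts_with_video = [p["text"] for p in posts if "video_url" in p]

# Aggregate: count tags across all posts, treating a missing "tags" field as empty.
tag_counts = Counter(tag for p in posts for tag in p.get("tags", []))

print(posts_with_video)          # ['Launch day!']
print(tag_counts.most_common())  # [('news', 2), ('launch', 1)]
```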

So, which is better for your business?

As is so often the case in development, the answer is that it depends. Each approach has its strengths and its overheads. But here at Shape, data is our bread and butter. So if you’re looking for guidance on the best approach for your business or app, whether you’re planning analytics, wrangling datasets, or building a project, we’d love a chat.

Matt
