Real-Time Analytics at Cloud Scale

In today’s data-driven world, businesses and organisations generate and collect vast amounts of data. Analytical databases are a powerful tool for managing and analysing this data to gain insights and make informed decisions. Real-time analytical databases in particular have grown popular because they process and analyse data as it arrives, enabling immediate insights and actions.

In the second episode of Dataverse, we spoke to Venkat Venkataramani, co-founder and CEO at Rockset, the real-time analytics database built for the cloud. The session focused on the evolution of databases and on why processing data in real time is a harder problem than processing data at rest. The conversation was peppered with anecdotes and learnings from Venkat’s days at Facebook during the early growth years, and with his views on why real-time data analytics is so important.

The TL;DR: businesses want real-time information and want to act on it as soon as possible.

Why real-time data processing matters

It would be difficult to find a field that has not benefited from the advent of social media, and Facebook has undeniably been at the centre of this revolution. Venkat worked at Facebook in the early years, when the company scaled from 30-40 million monthly active users (MAUs) to 1.5 billion MAUs. One of the key observations he made at that time was the switch from batch processing, where data is accumulated over a period of time and processed in batches, to real-time data processing.

The key difference between batch and real-time data processing is the speed at which the system has to respond to incoming queries. Venkat’s team figured out how to meet that bar: the database system they developed was handling over 5 billion queries per second with less than 5 ms latency. By 2015, all real-time analysis functions at Facebook, from fighting spam to news feed and Messenger, were using this database, called RocksDB.

Venkat realised the power of the system for customer-facing applications, and also that the two main barriers to taking it mainstream were cost and complexity. That realisation became the genesis of what he is now building with Rockset.

Challenges in real-time data

Ravi remembers the early days of GoJek, when the business analysed data with batch processing through commonly available database products such as Postgres and BigQuery. As the orders began flowing in and GoJek began scaling (over 170 million downloads), moving from batch processing to real time became a necessity. In the absence of a single solution to serve these needs, Ravi mentions that the team used a combination of Elasticsearch for full-text search across the databases, ClickHouse for generating analytical reports with SQL queries in real time, and InfluxDB for generating real-time analytics with a stream processing engine.

The number of systems involved gives an idea of how hard analytics on real-time data is. Even so, businesses are increasingly looking to derive value from data flowing in at high speed.

Deriving value from real-time data processing

The evolution of databases took off in the 1980s, when vendors such as Oracle brought relational databases into mainstream commercial use. That journey of businesses going from pen-and-paper storage to digital record keeping was the original digital transformation.

Through the 1990s and early 2000s, these databases evolved to support batch processing of data, where businesses could put stored data to work for them. The revolution now brewing in database technology is processing data in real time.

Real-time processing enables several use cases that were not possible earlier:

Fraud detection: Real-time analytical databases can be used to identify fraudulent transactions as they happen, enabling businesses to take immediate action to prevent losses.

Customer analytics: Real-time analytical databases can be used to analyse customer data as it changes, enabling businesses to improve customer experiences and increase customer retention.

Inventory management: Real-time analytical databases can be used to monitor inventory levels as they change, enabling businesses to optimise inventory management and reduce costs.

Social media analytics: Real-time analytical databases can be used to monitor social media data as it changes, enabling businesses to identify trends and take advantage of opportunities.

Real-time analytical databases such as Rockset have emerged to serve these use cases, offering fast and efficient data processing and analysis. Rockset offers several features that set it apart from other database systems: it is a fully managed database, it supports SQL, and it supports several data formats, all backed by its own execution engine.

A business can run analytics to figure out how many customers made a particular kind of transaction in the past 24 hours. But the real value comes from getting the system to continuously find out how many customers made those kinds of transactions in, say, the last hour. The business can then take the next step and act on it, for example by sending those customers a notification that adds value to their next transaction or prompts a similar kind of spend.
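To make the "last hour, continuously" idea concrete, here is a minimal in-memory sketch of a sliding-window counter in Python. It is an illustration only, not how Rockset or any particular database implements this: the class name, method names and customer IDs are hypothetical, and it assumes events for one kind of transaction arrive in timestamp order.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class SlidingWindowCounter:
    """Tracks which customers transacted within a trailing time window.

    Hypothetical helper for illustration; assumes events arrive in
    timestamp order.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self._events = deque()          # (timestamp, customer_id), oldest first
        self._per_customer = Counter()  # customer_id -> events still in window

    def _evict(self, now: datetime) -> None:
        # Drop every event that has fallen out of the trailing window.
        cutoff = now - self.window
        while self._events and self._events[0][0] <= cutoff:
            _, customer_id = self._events.popleft()
            self._per_customer[customer_id] -= 1
            if self._per_customer[customer_id] == 0:
                del self._per_customer[customer_id]

    def record(self, timestamp: datetime, customer_id: str) -> None:
        # Add the new event, then expire anything that is now too old.
        self._events.append((timestamp, customer_id))
        self._per_customer[customer_id] += 1
        self._evict(timestamp)

    def active_customers(self, now: datetime) -> set:
        # Customers with at least one event inside the window ending at `now`.
        self._evict(now)
        return set(self._per_customer)


counter = SlidingWindowCounter(timedelta(hours=1))
start = datetime(2023, 1, 1, 12, 0)
counter.record(start, "alice")
counter.record(start + timedelta(minutes=30), "bob")
counter.record(start + timedelta(minutes=50), "alice")

print(len(counter.active_customers(start + timedelta(minutes=59))))     # 2
print(sorted(counter.active_customers(start + timedelta(minutes=95))))  # ['alice']
```

The set returned by active_customers is the audience for a follow-up action such as a notification. A batch job would recompute this answer from scratch on a schedule, whereas the sketch above keeps it current with a small amount of work per incoming event.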

Lastly, we are excited about the future of data, and Dataverse is where we will bring you the latest trends and goings-on in the space. If you are a data practitioner who would like to be a part of Dataverse, please sign up here or write to me at manjot@lsip.com.
