The State of Data Engineering in 2022 | RudderStack

By neub9

At the start of last year, our team looked back at the history of data engineering and identified several major trends emerging in 2021. Now, as we enter 2022, we are revisiting those predictions and sharing new insights into the state of the data engineering industry.

In retrospect, our prediction that data would be discussed at the board level was a bit too ambitious. Although data increasingly informs board discussions, a data role reporting directly to the board has not yet materialized. We have, however, noticed a growing trend of board-level reviews of organizational data competency, reflecting greater accountability among leadership teams.

Our prediction of dedicated data engineering support for each team came close to being realized in 2021. Several structures have emerged, such as a single team of data engineers supporting multiple teams, or each team having its own data engineer. We expect this trend to accelerate throughout 2022.

We accurately predicted the rise of unicorns solving data problems, with companies like ClickHouse and Airbyte securing significant funding rounds. The commoditization of data platforms, particularly in areas such as data quality and observability, also continues to gain momentum.

Regarding real-time and streaming infrastructure, it’s unclear whether real-time data has surpassed batch data in importance, but companies are increasingly investing in real-time infrastructure. In 2022, we will be watching closely for new products emerging in this space.

Looking at the current state of the data engineering industry, we recognize that it is young and growing rapidly. Modern cloud architectures, although relatively new, have generated real excitement in the industry. The industry’s youth is also evident in the proliferation of new terminology and the shifting definitions of common terms such as “modern data stack,” “operational analytics,” and “data lakehouse.”

The data tooling landscape is expanding in both size and complexity, as seen in the growing number and variety of tools supporting cloud data warehouses and data lakes. This trend has prompted the integration of software development principles and processes into the data space, leading to the emergence of a new class of data tooling inspired by software engineering.

Despite these advancements, data quality and governance remain challenging problems within a growing and dynamic data stack. While companies have invested in solutions at various points in the data pipeline, a stack-wide governance architecture that works well for many companies has not yet seen wide adoption.

As we move into 2022, the data engineering industry will continue to evolve, and we anticipate further advancements in dedicated data engineering support, real-time infrastructure, and the integration of software development principles into the data space. The challenges of data quality and governance will persist, but we expect ongoing efforts to develop effective solutions in these critical areas.
