“Within the next year, the number of data and analytics experts in business units will grow at three times the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets.” – Gartner, 2020.
Data & Analytics teams are becoming increasingly essential in supporting complex business processes, and many struggle to scale the work of delivering data for their use cases. The pressure to deliver faster and with higher quality is causing data & analytics leaders to rethink how their teams are organized…
Traditional waterfall models were the norm in enterprises in the past, but these methodologies are now proving to be too slow, too siloed, and too overwhelming!
This is where Data Ops steps in: a more agile, collaborative and change-friendly approach for managing data pipelines.
Data Ops definition
Gartner defines Data Ops as a “collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.” Basically, it is about making life easier for data users.
Similar to how DevOps, a set of practices that combines software development (Dev) and information-technology operations (Ops), changed the way we deliver software, DataOps applies the same methodologies to teams building data products.
While both are agile frameworks, DataOps also requires coordinating data and everyone who works with data across the entire enterprise.
Specifically, data & analytics leaders should implement these key approaches, which have proved to deliver significant value for organizations:
- Deployment frequency increase: shifting towards a more rapid and continuous delivery methodology enables organizations to reduce the time to market.
- Automated testing: removing time-consuming, manual testing enables higher quality data deliveries.
- Metadata control: tracking and reporting metadata across all consumers in the data pipeline ensures better change management and avoids errors.
- Monitoring: tracking data behavior and pipeline usage enables faster identification of both flawed data, which needs to be corrected, and good-quality data that can support new capabilities.
- Constant collaboration: ongoing communication among data stakeholders is essential for faster data delivery.
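To make the automated testing approach in the list above more concrete, here is a minimal sketch in Python of rule-based checks that could run on every pipeline delivery instead of a manual review. Everything in it is an illustrative assumption rather than part of any specific DataOps tool: the `Check` class, the `run_checks` function, and the `customer_id` and `amount` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Check:
    """A named data-quality rule applied to one record (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def run_checks(rows: Iterable[dict], checks: list[Check]) -> dict[str, int]:
    """Return the number of failing rows for each check."""
    failures = {check.name: 0 for check in checks}
    for row in rows:
        for check in checks:
            if not check.predicate(row):
                failures[check.name] += 1
    return failures


# Hypothetical rules for an assumed "orders" dataset.
CHECKS = [
    Check("customer_id is present",
          lambda r: r.get("customer_id") is not None),
    Check("amount is non-negative",
          lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]

if __name__ == "__main__":
    sample = [
        {"customer_id": 1, "amount": 19.99},
        {"customer_id": None, "amount": 5.00},   # fails the first rule
        {"customer_id": 3, "amount": -2.50},     # fails the second rule
    ]
    for name, count in run_checks(sample, CHECKS).items():
        print(f"{name}: {count} failing row(s)")
```

In a real pipeline, a non-zero failure count would typically block the delivery or raise an alert, which is also where the monitoring approach comes in: the same counts, tracked over time, show whether data quality is drifting.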
Who is involved in Data Ops?
Given the importance of data and analytics use cases today, the roles involved in successful data project delivery are more numerous and more distributed than ever before. Ranging from data science teams to people outside of IT, a large number of roles are involved:
- Business analysts,
- Data architects,
- Data engineers,
- Data stewards,
- Data scientists,
- Data product managers,
- Machine Learning developers,
- Database administrators,
- Etc.
As mentioned above, a Data Ops approach requires fluid communication and collaboration across these roles. Each collaborator needs to understand what others expect of them and what others produce, and all must share an understanding of the goals of the data pipelines they are creating and evolving.
Creating channels through which these roles can work together, such as a collaboration tool or a metadata management solution, is the starting point!