How to Build a DAG-Based Task Scheduling Tool for Multiprocessor Systems Using Python
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

Much of the success of data-driven companies of all sizes, from startups to large corporations, rests on sound operational practices and on keeping their data up to date. These companies deal daily with the variety, velocity, and volume of their data, and in most cases their strategies depend on those three characteristics. Some of the aims of the data team in this kind of company are:
- Design and deploy cost-effective and scalable data architectures
- Get insights from their data
- Keep the business and operations up and running
In order to achieve these aims, the data team relies on tools: most of them extract, transform, and load data into other destinations, visualize data, and turn raw data into information. It is very common to find ETL, task scheduling, job scheduling, or workflow scheduling tools in these teams. It is worth mentioning that the terms task scheduling, job scheduling, workflow scheduling, task orchestration, job orchestration, and workflow orchestration all refer to the same concept; what distinguishes them in some cases is the purpose of the tool and its…
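The core idea the rest of the article builds on is representing a pipeline as a directed acyclic graph (DAG) of tasks, where each task runs only after its dependencies have finished. As a minimal sketch of that idea (the task names and graph below are illustrative, not pyDag's actual API), a DAG can be expressed as a plain dictionary and walked in topological order with Python's standard-library graphlib:

```python
# A minimal, illustrative sketch: a data pipeline as a DAG of tasks.
# Task names and dependencies are hypothetical, not pyDag's API.
from graphlib import TopologicalSorter

# Each key runs only after all the tasks it maps to have finished.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields a valid execution order and raises
# CycleError if the graph is not actually acyclic.
for task in TopologicalSorter(dag).static_order():
    print(f"running {task}")
```

For multiprocessor execution, the same class offers a prepare()/get_ready()/done() protocol, so tasks with no unmet dependencies can be dispatched to worker processes in parallel rather than run one by one.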