muLearn Courses

Data governance (DG) is a data management concept concerning an organization's ability to ensure that high data quality exists throughout the complete lifecycle of its data, and that data controls supporting business objectives are in place. The key focus areas of DG include availability, usability, data quality, data lineage, data catalog, and data privacy and security. Data governance encompasses the people, processes and information technology required to handle an organization's data consistently and properly across the enterprise. It is a quality control discipline for assessing, managing, using, improving, monitoring, maintaining and protecting organizational information. It is also a system of decision rights and accountability for information-related processes, executed according to agreed-upon models that describe who can take what actions with what information, when, under what circumstances and using what methods. This course covers the fundamentals of data governance and introduces concepts such as data lineage, data quality and data catalogs.
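
Data quality is easier to govern once it is measured. As a minimal sketch (not course material, and not tied to any governance tool), the Python below computes two common quality metrics, completeness and validity, over a list of customer records. The records, the field names, and the `completeness` and `validity` functions are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical customer records; None marks a missing value.
Record = Dict[str, Optional[Any]]


def completeness(records: List[Record], column: str) -> float:
    """Fraction of records in which `column` is present (not None)."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(column) is not None)
    return present / len(records)


def validity(records: List[Record], column: str, rule: Callable[[Any], bool]) -> float:
    """Fraction of the present values in `column` that satisfy `rule`."""
    values = [r[column] for r in records if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


customers: List[Record] = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
    {"id": 4, "email": "not-an-email", "age": 41},
]

print(completeness(customers, "email"))                     # 0.75
print(validity(customers, "email", lambda e: "@" in e))     # about 0.67
print(validity(customers, "age", lambda a: 0 <= a <= 120))  # 0.75
```

The rules here (an `@` check, an age range) are deliberately crude placeholders. Deciding which rules apply, and who owns them, is exactly the kind of decision right that data governance assigns.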

4 Lessons
Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. It was created at Airbnb and is now part of the Apache Software Foundation. Airflow lets you define workflows in Python and makes those workflows easy to schedule and monitor. It is especially useful if you have many ETL jobs to manage. This introductory course covers everything you need to start using Apache Airflow, from basic notions, such as what Airflow is and how it works, to advanced concepts, such as creating plugins and building truly dynamic pipelines.
The course is designed for:
- People who are curious about data engineering.
- People who want to learn basic and advanced concepts of Apache Airflow.
- People who like a hands-on approach.
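
Airflow models each workflow as a directed acyclic graph (DAG) of tasks and runs a task only after its upstream tasks have completed. The sketch below does not use Airflow's API; it is a standard-library illustration (Python 3.9+, `graphlib`) of that dependency-ordering idea, with hypothetical task names and a hypothetical `run_workflow` helper.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Records the order in which the hypothetical tasks actually ran.
run_log: List[str] = []


def make_task(name: str) -> Callable[[], None]:
    """Build a placeholder task that only logs its own name."""
    def task() -> None:
        run_log.append(name)
    return task


tasks: Dict[str, Callable[[], None]] = {
    name: make_task(name) for name in ("extract", "transform", "validate", "load")
}

# Each task maps to the set of upstream tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform", "validate"},
}


def run_workflow(tasks: Dict[str, Callable[[], None]],
                 dependencies: Dict[str, Set[str]]) -> None:
    """Run every task once, each only after all of its upstream tasks."""
    sorter = TopologicalSorter(dependencies)
    for name in tasks:
        sorter.add(name)  # also registers tasks that have no dependencies
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for name in sorter.static_order():
        tasks[name]()


run_workflow(tasks, dependencies)
print(run_log)  # ['extract', 'transform', 'validate', 'load']
```

In a real Airflow project you would declare the tasks and their dependencies in a DAG file and let the scheduler handle ordering, retries and monitoring; the acyclic requirement is what makes a valid run order always exist.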

6 Lessons
Talend Open Studio for Data Integration is a leading open-source solution for data integration and data management. Data integration involves combining data stored in different sources and providing a unified view of that data. Talend helps you manage various ETL jobs and empowers users with simple, self-service data preparation. It lets you connect and manage all your data, no matter where it lives: you can use more than 1,000 connectors and components to connect any data source with any data environment, in the cloud or on premises. It also provides a unified repository to store and reuse metadata. You can easily develop and deploy reusable data pipelines with a drag-and-drop interface that is 10 times faster than hand-coding.
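
Talend builds jobs like this visually, but the underlying idea of a unified view can be sketched in a few lines of plain Python. This is not Talend code; the two source systems, their column names, the `unified_view` helper and the full-outer-join policy (keep customers found in either source) are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical source 1: a CRM export keyed by "customer_id".
crm_rows: List[dict] = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]

# Hypothetical source 2: a billing export that calls the same key "cust".
billing_rows: List[dict] = [
    {"cust": 1, "balance": 120.0},
    {"cust": 3, "balance": 15.5},
]


def empty_record(customer_id: int) -> Dict[str, Optional[object]]:
    """Unified schema: fields a source does not supply stay None."""
    return {"customer_id": customer_id, "name": None, "balance": None}


def unified_view(crm: List[dict], billing: List[dict]) -> List[dict]:
    """Full outer join of both sources on the customer id."""
    view: Dict[int, dict] = {}
    for row in crm:
        record = view.setdefault(row["customer_id"], empty_record(row["customer_id"]))
        record["name"] = row["name"]
    for row in billing:
        # Map billing's "cust" column onto the shared "customer_id" key.
        record = view.setdefault(row["cust"], empty_record(row["cust"]))
        record["balance"] = row["balance"]
    return [view[key] for key in sorted(view)]


for record in unified_view(crm_rows, billing_rows):
    print(record)
```

Customer 2 appears only in the CRM and customer 3 only in billing, so each gets `None` for the field its missing source would have supplied. Harmonizing key names (`cust` to `customer_id`) is the kind of mapping a visual integration tool lets you draw instead of code.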

14 Lessons
Outcomes
By the end of the course, learners will be able to:
- Become familiar with basic data engineering concepts.
- Understand the scope, scale, and limitations of data engineering work in Mu Sigma.
- Equip themselves with the knowledge needed to start conversations about the data landscape and data engineering needs of […]

6 Lessons
Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there must be mechanisms for collecting and validating that information. For that work to ultimately have any value, there also have to be mechanisms for applying it to real-world operations. Both are engineering tasks: the application of science to practical, functioning systems. Data engineers focus on the application and harvesting of big data. Their role doesn't include a great deal of analysis or experimental design. Instead, they work where the rubber meets the road (literally, in the case of self-driving vehicles), creating interfaces and mechanisms for the flow of and access to information.

6 Lessons
SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications. It features a data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.
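
SSIS packages are built in Microsoft's own designer, so the following is not SSIS code. It is a minimal Python sketch of the extract, transform and load stages described above, using an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for a SQL Server target. The CSV layout, the cleaning rules and the `orders` table are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical raw export; in SSIS this might come from a Flat File Source.
RAW_CSV = """order_id,amount,country
1, 19.99 ,in
2,5.00,US
3,,us
"""


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Extract: read rows from CSV text."""
    return csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[int, float, str]]:
    """Transform: skip rows with no amount, cast types, normalize country codes."""
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop incomplete records
        yield int(row["order_id"]), float(amount), row["country"].strip().upper()


def load(conn: sqlite3.Connection, records: Iterable[Tuple[int, float, str]]) -> None:
    """Load: insert cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 19.99, 'IN'), (2, 5.0, 'US')]
```

Because `extract` and `transform` are generators, rows stream through the pipeline one at a time rather than being held in memory all at once.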

11 Lessons
This course covers essential exploratory techniques for getting familiar with datasets. It uses business intuition along with graphical and statistical techniques to characterize the behavior of the data. It also addresses common dataset issues, such as missing values and outliers, and describes different ways to treat them. These techniques generally serve as a precursor to statistical modeling.
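
The two treatments mentioned above can be sketched with Python's standard `statistics` module (Python 3.8+). This is one common recipe, not necessarily the one the course teaches: missing values are replaced by the median of the observed values, and outliers are detected with Tukey's IQR fences (1.5 times the interquartile range beyond the quartiles) and then capped at the fence. The sales series and helper names are made up.

```python
import statistics
from typing import List, Optional, Tuple

# Made-up daily sales figures; None marks a missing observation.
sales: List[Optional[float]] = [12.0, 15.0, None, 14.0, 13.0, 95.0, None, 16.0]


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace each missing value with the median of the observed values."""
    median = statistics.median([v for v in values if v is not None])
    return [median if v is None else v for v in values]


def iqr_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey's fences: points outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


observed = [v for v in sales if v is not None]
low, high = iqr_fences(observed)
outliers = [v for v in observed if not low <= v <= high]

filled = impute_median(sales)                       # treat missing values
capped = [min(max(v, low), high) for v in filled]   # treat outliers by capping

print(outliers)  # [95.0]
print(filled)    # [12.0, 15.0, 14.5, 14.0, 13.0, 95.0, 14.5, 16.0]
print(capped)
```

The median is used for imputation because, unlike the mean, it is barely moved by the 95.0 outlier. Whether to cap, drop or keep an outlier is a judgment that business intuition, not the data alone, has to settle.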

7 Lessons
Data engineering for Induction is a beginner-level classroom training course that teaches inductees how to create a basic data engineering pipeline.

2 Lessons