muLearn Courses

MongoDB is a cross-platform, document-oriented database program. Classified as a NoSQL database, MongoDB uses JSON-like documents with optional schemas. It is developed by MongoDB Inc., licensed under the Server Side Public License (SSPL), and used for high-volume data storage. Instead of the tables and rows of traditional relational databases, MongoDB uses collections and documents. Documents consist of key-value pairs and are the basic unit of data in MongoDB; collections contain sets of documents and serve as the equivalent of relational database tables. MongoDB emerged around the mid-2000s.
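
As a quick illustration of documents and collections, here is a minimal sketch using the pymongo driver; it assumes a MongoDB server running locally on the default port, and the database, collection, and field names are purely hypothetical.

    # Minimal sketch: documents and collections with pymongo.
    # Assumes a local MongoDB server on the default port (27017);
    # database, collection, and field names here are hypothetical.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["course_demo"]          # a database holds collections
    students = db["students"]           # a collection holds documents

    # A document is a set of key-value pairs (stored as BSON).
    students.insert_one({"name": "Asha", "track": "Data Science", "score": 87})

    # Query documents by matching fields, much like filtering rows in a table.
    for doc in students.find({"track": "Data Science"}):
        print(doc["name"], doc["score"])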

10 Lessons
Structured Query Language (SQL) is a domain-specific language used for querying data and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful for handling structured data, i.e. data incorporating relations among entities and variables. SQL offers two main advantages over older read–write APIs such as ISAM or VSAM: first, it introduced the concept of accessing many records with a single command; second, it eliminates the need to specify how to reach a record, e.g. with or without an index. Originally based upon relational algebra and tuple relational calculus, SQL consists of many types of statements, which may be informally classed as sub-languages, commonly: a data query language (DQL), a data definition language (DDL), a data control language (DCL), and a data manipulation language (DML). The scope of SQL includes data query, data manipulation (insert, update, and delete), data definition (schema creation and modification), and data access control.
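
To make the sub-languages concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; it shows one DDL, one DML, and one DQL statement, and the table and column names are only illustrative.

    # Minimal sketch of SQL sub-languages using Python's built-in sqlite3
    # module and an in-memory database; table and column names are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL: define the schema.
    cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

    # DML: insert and update data.
    cur.executemany("INSERT INTO employees (name, dept) VALUES (?, ?)",
                    [("Ada", "Engineering"), ("Grace", "Research")])
    cur.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Engineering", "Grace"))

    # DQL: retrieve many records with a single command, without specifying
    # how to reach them.
    for row in cur.execute("SELECT name, dept FROM employees ORDER BY name"):
        print(row)

    conn.commit()
    conn.close()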

6 Lessons
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
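
As a sketch of that "simple programming model", the script below implements the classic word-count job in the MapReduce style for Hadoop Streaming, which runs mappers and reducers as ordinary programs reading stdin and writing stdout; the script itself and its names are hypothetical, not part of Hadoop.

    # Sketch of the MapReduce "simple programming model": word count written
    # for Hadoop Streaming, which runs the mapper and reducer as plain
    # programs reading stdin and writing stdout. Run with "mapper" or
    # "reducer" as the first argument; this script is illustrative only.
    import sys

    def mapper():
        # Emit "word<TAB>1" for every word in this node's input split.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

    def reducer():
        # Input arrives sorted by key, so counts for a word are contiguous.
        current, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t")
            if word != current:
                if current is not None:
                    print(f"{current}\t{count}")
                current, count = word, 0
            count += int(value)
        if current is not None:
            print(f"{current}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "mapper" else reducer()

In a real job, Hadoop runs many copies of the mapper across the cluster, shuffles and sorts the intermediate pairs, feeds each reducer the values for its keys, and re-executes tasks on machines that fail.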

7 Lessons
Amazon Redshift is one of the key big-data analytics services in the Amazon Web Services technology stack. Redshift can handle petabyte-scale data (thousands of terabytes) in a clustered environment and provides a data warehouse as a service on the AWS cloud platform. Redshift is one of the easier services to learn for big-data-scale analytics, which makes it an accessible gateway into the world of big data analytics.
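
Since Redshift is reachable over the PostgreSQL wire protocol, one common pattern is to connect with a standard PostgreSQL driver, bulk-load data from Amazon S3 with the COPY command, and query it with plain SQL; the sketch below uses psycopg2, and every endpoint, credential, table, and bucket name in it is a placeholder.

    # Minimal sketch: querying Amazon Redshift from Python over the
    # PostgreSQL wire protocol with psycopg2. Every endpoint, credential,
    # table, and bucket name below is a placeholder.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,                      # Redshift's default port
        dbname="analytics",
        user="demo_user",
        password="demo_password",
    )
    cur = conn.cursor()

    # Bulk-load data from S3 into a table with Redshift's COPY command
    # (the IAM role and bucket are hypothetical).
    cur.execute("""
        COPY sales
        FROM 's3://example-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
        FORMAT AS CSV;
    """)

    # Run an ordinary SQL aggregation over the warehouse.
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    for region, total in cur.fetchall():
        print(region, total)

    conn.commit()
    conn.close()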

10 Lessons
One of the most valuable technology skills is the ability to analyze huge data sets, and one of the best technologies for this task is Apache Spark. Top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems. Spark is a tool for doing parallel computation with large datasets, and it integrates well with Python; PySpark is the Python package that makes this possible. Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, including interactive queries and stream processing. This brief tutorial explains the basics of Spark Core programming. The course has been designed to enhance your Spark skills; by the end of it, you will be able to understand Spark DataFrames, DataFrame operations, and important concepts such as the partitioning of data.
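
As a small taste of what the course covers, here is a minimal PySpark sketch that builds a DataFrame, applies a couple of common DataFrame operations, and repartitions the data; it assumes a working PySpark installation, and the data, column names, and app name are invented.

    # Minimal PySpark sketch: DataFrames, DataFrame operations, and
    # partitioning. Assumes a working PySpark installation; the data,
    # column names, and app name are invented for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

    # Build a DataFrame from an in-memory list of rows.
    df = spark.createDataFrame(
        [("alice", "books", 12.0), ("bob", "books", 7.5), ("alice", "music", 3.0)],
        ["user", "category", "amount"],
    )

    # Typical DataFrame operations: filter, group, aggregate.
    totals = (df.filter(F.col("amount") > 5)
                .groupBy("category")
                .agg(F.sum("amount").alias("total")))
    totals.show()

    # Partitioning controls how the data is split across the cluster.
    print("partitions before:", df.rdd.getNumPartitions())
    print("partitions after: ", df.repartition(4).rdd.getNumPartitions())

    spark.stop()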

22 Lessons
The role of a data scientist is to turn raw data into actionable insights. Much of the world's raw data—from electronic medical records to customer transaction histories—lives in organized collections of tables called relational databases. Therefore, to be an effective data scientist, you must know how to wrangle and extract data from these databases using a language called SQL (pronounced ess-que-ell, or sequel). This course teaches you everything you need to know to begin working with databases today!
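
In that spirit, here is a small sketch of extracting a summary from a relational database in Python; it uses the standard-library sqlite3 module with an in-memory database, and the transactions table is invented for the example.

    # Sketch of turning raw rows into a summary with SQL, using Python's
    # standard-library sqlite3 module and an in-memory database; the
    # "transactions" table and its contents are invented for the example.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer TEXT, amount REAL, day TEXT)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [("c1", 40.0, "2024-01-02"), ("c1", 15.5, "2024-01-05"), ("c2", 99.9, "2024-01-03")],
    )

    # Wrangle the raw rows into a per-customer summary with a single query.
    query = """
        SELECT customer, COUNT(*) AS n_orders, ROUND(SUM(amount), 2) AS total_spent
        FROM transactions
        GROUP BY customer
        ORDER BY total_spent DESC
    """
    for row in conn.execute(query):
        print(row)

    conn.close()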

7 Lessons