Course

PySpark: Level 01

One of the most valuable technology skills is the ability to analyze huge data sets, and one of the best technologies for this task is Apache Spark. Top technology companies such as Google, Facebook, Netflix, Airbnb, Amazon, and NASA use Spark to solve their big data problems. Spark is a tool for parallel computation on large datasets, and it integrates well with Python; PySpark is the Python package that makes this possible. Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, including interactive queries and stream processing. This course covers the basics of Spark Core programming and is designed to enhance your Spark skills. By the end of the course, you will understand Spark DataFrames, DataFrame operations, and important concepts such as partitioning of data.

22 Lessons
Outcomes

By the end of the course, learners will be able to:

  • Explain the rationale behind using Apache Spark and how it is implemented
  • Identify situations where Spark, and its Python API PySpark, can be leveraged for very fast processing
  • Perform different operations with PySpark, including passing functions, caching, and transformations/actions
  • Compare the modes of running Spark and choose the right method for launching an application
  • Configure SparkSession parameters and explain the difference between SparkSession and SparkContext (see the first sketch after this list)
  • Create DataFrames from CSV files, from existing RDDs, by transforming an existing DataFrame, and by applying a StructType schema
  • Work with the Parquet, Avro, and ORC file formats and create DataFrames from them
  • Apply transformations, actions, and other operations to DataFrames
  • Perform partitioning and repartitioning (see the second sketch after this list)
  • Gain hands-on experience working with DataFrames and Spark SQL
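A minimal sketch of the SparkSession and DataFrame workflow mentioned in the outcomes above. The file name employees.csv, the schema, and the column names are illustrative assumptions, not course material:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # SparkSession is the single entry point for DataFrame and SQL work;
    # the lower-level SparkContext remains reachable via spark.sparkContext.
    spark = SparkSession.builder.appName("pyspark-level-01").getOrCreate()

    # Explicit schema via StructType (columns and file path are hypothetical).
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("department", StringType(), True),
        StructField("salary", IntegerType(), True),
    ])

    df = spark.read.csv("employees.csv", header=True, schema=schema)

    # Transformations (filter, select) are lazy; the action (count) triggers execution.
    high_paid = df.filter(df.salary > 50000).select("name", "department")
    high_paid.cache()          # keep the result in memory for reuse
    print(high_paid.count())   # action: runs the job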
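A second sketch, continuing from the DataFrame above, illustrating Parquet I/O, repartitioning, and Spark SQL; the output path and the aggregation query are likewise hypothetical:

    # Write as Parquet, repartitioned by a column (path is illustrative).
    df.repartition(4, "department").write.mode("overwrite").parquet("employees_parquet")

    # Read it back and inspect how many partitions the data now spans.
    parquet_df = spark.read.parquet("employees_parquet")
    print(parquet_df.rdd.getNumPartitions())

    # Register a temporary view so the same data can be queried with Spark SQL.
    parquet_df.createOrReplaceTempView("employees")
    spark.sql("""
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    """).show()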

Course Contributors:

  • Issac Joseph
  • Shivanand Ukkali
Level: 01
Duration: 22 Hours
Pre-requisites: Python (Level 1), SQL (Level 1) – For Spark SQL
What’s next: Data Management