Apache Spark Training In Nagpur

Syllabus

Apache Spark Training Overview

Apache Spark is a framework for big data analytics that gives developers, data scientists, and analysts a single unified API for different kinds of tasks. It supports popular languages including Python, R, SQL, Java, and Scala. The main aim of this Apache Spark training is to give data scientists, data analysts, and software developers hands-on experience building real-time data stream analysis and large-scale machine learning solutions.

Apache Spark Training Objectives

  • Apache Spark architecture
  • How to use Spark with Scala
  • How to deploy Spark projects to the cloud
  • Machine learning with Spark

Pre-requisites of the Course

  • Basic knowledge of object-oriented programming is enough
  • Knowledge of Scala will be an added advantage
  • Basic knowledge of databases and SQL queries will also be an added advantage

Who should do the course

  • Developers, Architects, IT Professionals
  • Software Engineers, Data scientists, and Analysts

Apache Spark Course Content

Batch and Real-Time Analytics with Apache Spark

SCALA (Object Oriented and Functional Programming)

  • Getting started With Scala
  • Scala Background, Scala Vs Java and Basics
  • Interactive Scala – REPL, data types, variables, expressions, simple functions
  • Running the program with Scala Compiler
  • Explore the type lattice and use type inference
  • Define Methods and Pattern Matching

Scala Environment Set up

  • Scala set up on Windows and UNIX

Functional Programming

  • What is Functional Programming?
  • Differences between OOP and FP

Collections (Very Important for Spark)

  • Iterating, mapping, filtering, and counting
  • Regular expressions and matching with them
  • Maps, Sets, groupBy, Options, flatten, flatMap
  • Word count, IO operations, file access, flatMap
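
As a rough illustration of how the collection topics above fit together, here is a minimal word-count sketch in plain Scala (no Spark required). The object name `WordCountSketch`, the method `wordCount`, and the sample lines are hypothetical and chosen only for this example; the same flatMap, filter, and groupBy pattern is what later carries over to Spark RDDs.

```scala
object WordCountSketch {
  // Count word occurrences across lines using only standard collection methods.
  // (Hypothetical helper for illustration; not part of any library.)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+")) // each line becomes many words
      .filter(_.nonEmpty)                    // drop empty tokens
      .groupBy(identity)                     // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Sample input, assumed for this sketch
    val lines = Seq("Spark is fast", "Scala runs on the JVM", "Spark uses Scala")
    wordCount(lines).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"$word: $count")
    }
  }
}
```

Grouping by `identity` and then taking the size of each group keeps the example close to the collection methods listed in this module.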

Object-Oriented Programming

  • Classes and Properties
  • Objects, Packaging, and Imports
  • Traits
  • Objects, classes, inheritance, Lists with multiple related types, apply

Integrations

  • What is SBT?
  • Integration of Scala in Eclipse IDE
  • Integration of SBT with Eclipse

SPARK CORE

  • Batch versus real-time data processing
  • Introduction to Spark, Spark versus Hadoop
  • The architecture of Spark
  • Coding Spark jobs in Scala
  • Exploring the Spark shell and creating a SparkContext
  • RDD Programming
  • Operations on RDD
  • Transformations
  • Actions
  • Loading Data and Saving Data
  • Key Value Pair RDD
  • Broadcast variables

Persistence

  • Configuring and running the Spark cluster
  • Exploring a multi-node Spark cluster
  • Cluster management
  • Submitting Spark jobs and running in the cluster mode
  • Developing Spark applications in Eclipse
  • Tuning and Debugging Spark

CASSANDRA (NOSQL DATABASE)

  • Learning Cassandra
  • Getting started with architecture
  • Installing Cassandra
  • Communicating with Cassandra
  • Creating a database
  • Create a table
  • Inserting Data
  • Modelling Data
  • Creating a web application
  • Updating and Deleting Data

Spark Integration with NoSQL (CASSANDRA) and Amazon EC2

  • Introduction to Spark and Cassandra Connectors
  • Setting up Spark with Cassandra
  • Creating a SparkContext to connect to Cassandra
  • Creating a Spark RDD on the Cassandra database
  • Performing transformations and actions on the Cassandra RDD
  • Running a Spark application in Eclipse to access data in Cassandra
  • Introduction to Amazon Web Services
  • Building a 4-node Spark cluster in Amazon Web Services
  • Deploying in Production with Mesos and YARN

Spark Streaming

  • Introduction to Spark Streaming
  • Architecture of Spark Streaming
  • Processing Distributed Log Files in Real Time
  • Discretized streams (DStreams) and RDDs
  • Applying Transformations and Actions on Streaming Data
  • Integration with Flume and Kafka
  • Integration with Cassandra
  • Monitoring streaming jobs

Spark SQL

  • Introduction to Apache Spark SQL
  • The SQL context
  • Importing and saving data
  • Processing text, JSON, and Parquet files
  • DataFrames
  • User-defined functions
  • Using Hive
  • Local Hive Metastore server

Spark MLlib

  • Introduction to Machine Learning
  • Types of Machine Learning
  • Introduction to Apache Spark MLlib algorithms
  • Machine learning data types and working with MLlib
  • Regression and Classification Algorithms
  • Decision Trees in depth
  • Classification with SVM, Naive Bayes
  • Clustering with K-Means
  • Building the Spark server
