Cloudera Developer Training for Apache Spark

Cloudera Developer Training for Apache Spark Course Description

Duration: 3 days (24 hours)

This three-day developer training course for Apache Spark enables participants to build complete, unified Big Data applications that combine batch, streaming, and interactive analytics on all their data. With Spark, developers can write sophisticated parallel applications that support faster decisions, better decisions, and real-time actions across a wide variety of use cases, architectures, and industries.

Next Class Dates

Contact us to customize this class with your own dates, times and location. You can also call 1-888-563-8266 or chat live with a Learning Consultant.

Intended Audience for this Cloudera Developer Training for Apache Spark Course

  • This course is best suited to developers and engineers. Course examples and exercises are presented in Python and Scala, so knowledge of one of these programming languages is required. Basic knowledge of Linux is assumed. Prior knowledge of Hadoop is not required.

Cloudera Developer Training for Apache Spark Course Objectives

  • Using the Spark shell for interactive data analysis
  • The features of Spark's Resilient Distributed Datasets
  • How Spark runs on a cluster
  • Parallel programming with Spark
  • Writing Spark applications
  • Processing streaming data with Spark

Cloudera Developer Training for Apache Spark Course Outline

      1. Why Spark?
        1. Problems with Traditional Large-Scale Systems
        2. Introducing Spark
      2. Spark Basics
        1. What is Apache Spark?
        2. Using the Spark Shell
        3. Resilient Distributed Datasets (RDDs)
        4. Functional Programming with Spark
      3. Working with RDDs
        1. RDD Operations
        2. Key-Value Pair RDDs
        3. MapReduce and Pair RDD Operations
      4. The Hadoop Distributed File System
        1. Why HDFS?
        2. HDFS Architecture
        3. Using HDFS
      5. Running Spark on a Cluster
        1. Overview
        2. A Spark Standalone Cluster
        3. The Spark Standalone Web UI
      6. Parallel Programming with Spark
        1. RDD Partitions and HDFS Data Locality
        2. Working With Partitions
        3. Executing Parallel Operations
      7. Caching and Persistence
        1. RDD Lineage
        2. Caching Overview
        3. Distributed Persistence
      8. Writing Spark Applications
        1. Spark Applications vs. Spark Shell
        2. Creating the SparkContext
        3. Configuring Spark Properties
        4. Building and Running a Spark Application
        5. Logging
      9. Spark, Hadoop, and the Enterprise Data Center
        1. Overview
        2. Spark and the Hadoop Ecosystem
        3. Spark and MapReduce
      10. Spark Streaming
        1. Spark Streaming Overview
        2. Example: Streaming Word Count
        3. Other Streaming Operations
        4. Sliding Window Operations
        5. Developing Spark Streaming Applications
      11. Common Spark Algorithms
        1. Iterative Algorithms
        2. Graph Analysis
        3. Machine Learning
      12. Improving Spark Performance
        1. Shared Variables: Broadcast Variables
        2. Shared Variables: Accumulators
        3. Common Performance Issues

Do you have the right background for Cloudera Developer Training for Apache Spark?

Skills Assessment

We ensure your success by asking all students to take a FREE Skill Assessment test. These short, instructor-written tests are an objective measure of your current skills that help us determine whether or not you will be able to meet your goals by attending this course at your current skill level. If we determine that you need additional preparation or training in order to gain the most value from this course, we will recommend cost-effective solutions that you can use to get ready for the course.

Our required skill assessments ensure that:

  1. All students in the class are at a comparable skill level, so the class can run smoothly without beginners slowing down the class for everyone else.
  2. NetCom students enjoy one of the industry's highest success rates, and one of the highest pass rates when a certification exam is involved.
  3. We stay committed to providing you real value. Again, your success is paramount; we will register you only if you have the skills to succeed.

This assessment is for your benefit and is best taken without any preparation or reference materials, so your skills can be objectively measured.

Award-winning, world-class instructors

Jose P.
Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University. He is skilled in analyzing data, particularly with Python and a variety of its modules and libraries. He hopes to use his experience in teaching and data science to help other people learn the power of the Python programming language and its ability to analyze data, as well as to present the data in clear and beautiful visualizations. He is the creator of some of the most popular Python courses on Udemy, including "Learning Python for Data Analysis and Visualization" and "The Complete Python Bootcamp". With almost 30,000 enrollments, Jose has been able to teach Python and its data science libraries to thousands of students. Jose is also a published author, having recently written "NumPy Succinctly" for Syncfusion's series of e-books.

Client Testimonials & Reviews about their Learning Experience

We are passionate about delivering the best learning experience for our students, and they are happy to share their learning experience with us.
Read what students had to say about their experience at NetCom.
