Cloudera Developer Training for MapReduce

Cloudera Developer Training for MapReduce Course Description

Duration: 4 days (32 hours)

This four-day developer training course delivers the key concepts and expertise participants need to create robust data processing applications using Apache Hadoop. From workflow implementation and API usage through writing MapReduce code and executing joins, this training course is the best preparation for the real-world challenges faced by Hadoop developers.

Next Class Dates

Contact us to customize this class with your own dates, times and location. You can also call 1-888-563-8266 or chat live with a Learning Consultant.

Intended Audience for this Cloudera Developer Training for MapReduce Course

  • This course is best suited to developers and engineers who have programming experience. Knowledge of Java is strongly recommended and is required to complete the hands-on exercises.

Cloudera Developer Training for MapReduce Course Objectives

  • The internals of MapReduce and HDFS and how to write MapReduce code
  • Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, and other Hadoop ecosystem projects
  • Creating custom components such as WritableComparables and InputFormats to manage complex data types
  • Writing and executing joins to link data sets in MapReduce
  • Advanced Hadoop API topics required for real-world data analysis
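To give a flavor of the join material, here is a minimal pure-Python sketch of the reduce-side join pattern: the map phase tags each record with its source table, and the reduce phase joins records that share a key. The sample records and variable names are hypothetical, and a list sort stands in for Hadoop's shuffle; a real job would implement this with Mapper and Reducer classes.

```python
from itertools import groupby
from operator import itemgetter

# Two hypothetical data sets sharing a customer-id key.
customers = [("c1", "Alice"), ("c2", "Bob")]
orders = [("c1", "book"), ("c1", "pen"), ("c2", "lamp")]

# Map phase: tag each record with its source so the reducer can tell them apart.
mapped = [(cid, ("C", name)) for cid, name in customers] + \
         [(cid, ("O", item)) for cid, item in orders]

# Shuffle/sort phase: Hadoop groups map output by key; sorting simulates that here.
mapped.sort(key=itemgetter(0))

# Reduce phase: for each key, join the customer record with its orders.
joined = []
for cid, group in groupby(mapped, key=itemgetter(0)):
    values = [v for _, v in group]
    names = [v for tag, v in values if tag == "C"]
    items = [v for tag, v in values if tag == "O"]
    for name in names:
        for item in items:
            joined.append((cid, name, item))
```

Tagging values by source table is the standard trick that lets a single reducer input stream carry both sides of the join.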

Cloudera Developer Training for MapReduce Course Outline

      1. The Motivation For Hadoop
        1. Problems with Traditional Large-Scale Systems
        2. Introducing Hadoop
        3. Hadoopable Problems
      2. Hadoop: Basic Concepts and HDFS
        1. The Hadoop Project and Hadoop Components
        2. The Hadoop Distributed File System
      3. Introduction to MapReduce
        1. MapReduce Overview
        2. Example: WordCount
        3. Mappers
        4. Reducers
      4. Hadoop Clusters and the Hadoop Ecosystem
        1. Hadoop Cluster Overview
        2. Hadoop Jobs and Tasks
        3. Other Hadoop Ecosystem Components
      5. Writing a MapReduce Program in Java
        1. Basic MapReduce API Concepts
        2. Writing MapReduce Drivers, Mappers, and Reducers in Java
        3. Speeding Up Hadoop Development by Using Eclipse
        4. Differences Between the Old and New MapReduce APIs
      6. Writing a MapReduce Program Using Streaming
        1. Writing Mappers and Reducers with the Streaming API
      7. Unit Testing MapReduce Programs
        1. Unit Testing
        2. The JUnit and MRUnit Testing Frameworks
        3. Writing Unit Tests with MRUnit
        4. Running Unit Tests
      8. Delving Deeper into the Hadoop API
        1. Using the ToolRunner Class
        2. Setting Up and Tearing Down Mappers and Reducers
        3. Decreasing the Amount of Intermediate Data with Combiners
        4. Accessing HDFS Programmatically
        5. Using the Distributed Cache
        6. Using the Hadoop API's Library of Mappers, Reducers, and Partitioners
      9. Practical Development Tips and Techniques
        1. Strategies for Debugging MapReduce Code
        2. Testing MapReduce Code Locally by Using LocalJobRunner
        3. Writing and Viewing Log Files
        4. Retrieving Job Information with Counters
        5. Reusing Objects
        6. Creating Map-Only MapReduce Jobs
      10. Partitioners and Reducers
        1. How Partitioners and Reducers Work Together
        2. Determining the Optimal Number of Reducers for a Job
        3. Writing Custom Partitioners
      11. Data Input and Output
        1. Creating Custom Writable and WritableComparable Implementations
        2. Saving Binary Data Using SequenceFile and Avro Data Files
        3. Issues to Consider When Using File Compression
        4. Implementing Custom InputFormats and OutputFormats
      12. Common MapReduce Algorithms
        1. Sorting and Searching Large Data Sets
        2. Indexing Data
        3. Computing Term Frequency-Inverse Document Frequency (TF-IDF)
        4. Calculating Word Co-Occurrence
        5. Performing Secondary Sort
      13. Joining Data Sets in MapReduce Jobs
        1. Writing a Map-Side Join
        2. Writing a Reduce-Side Join
      14. Integrating Hadoop into the Enterprise Workflow
        1. Loading Data from an RDBMS into HDFS by Using Sqoop
        2. Managing Real-Time Data Using Flume
        3. Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
      15. An Introduction to Hive, Impala, and Pig
        1. The Motivation for Hive, Impala, and Pig
        2. Hive Overview
        3. Impala Overview
        4. Pig Overview
        5. Choosing Between Hive, Impala, and Pig
      16. An Introduction to Oozie
        1. Introduction to Oozie
        2. Creating Oozie Workflows
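As a taste of the WordCount example from the outline, here is a minimal local sketch of the two MapReduce phases in Python, in the spirit of the Streaming API covered in module 6. The function names and sample lines are hypothetical; a real Streaming mapper reads lines from stdin and writes tab-separated key/value pairs, and a dict stands in for Hadoop's shuffle here.

```python
from collections import defaultdict

def mapper(lines):
    # Like a Streaming mapper: emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(pairs):
    # Hadoop's shuffle delivers map output grouped by key; a dict stands in for it here.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Run the two phases locally over a couple of sample lines.
word_counts = reducer(mapper(["the quick brown fox", "the lazy dog"]))
```

The same decomposition, mapping each record to key/value pairs and then aggregating per key, underlies most of the algorithms in modules 12 and 13.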

Do you have the right background for Cloudera Developer Training for MapReduce?

Skills Assessment

We ensure your success by asking all students to take a FREE Skills Assessment test. These short, instructor-written tests objectively measure your current skills and help us determine whether you will be able to meet your goals by attending this course at your current skill level. If we determine that you need additional preparation or training to get the most value from this course, we will recommend cost-effective options to help you get ready.

Our required skills assessments ensure that:

  1. All students in the class are at a comparable skill level, so the class can run smoothly without beginners slowing down the class for everyone else.
  2. NetCom students enjoy one of the industry's highest success rates and, when a certification exam is involved, pass rates.
  3. We stay committed to providing you real value. Again, your success is paramount; we will register you only if you have the skills to succeed.

This assessment is for your benefit and is best taken without preparation or reference materials, so that your skills can be objectively measured.

Take your FREE Skills Assessment test

Award winning, world-class Instructors

Jose P.
Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University. He has a great skill set in analyzing data, specifically using Python and a variety of modules and libraries. He hopes to use his experience in teaching and data science to help other people learn the power of the Python programming language and its ability to analyze data, as well as present the data in clear and beautiful visualizations. He is the creator of some of the most popular Python Udemy courses, including "Learning Python for Data Analysis and Visualization" and "The Complete Python Bootcamp". With almost 30,000 enrollments, Jose has been able to teach Python and its data science libraries to thousands of students. Jose is also a published author, having recently written "NumPy Succinctly" for Syncfusion's series of e-books.

Client Testimonials & Reviews about their Learning Experience

We are passionate about delivering the best learning experience for our students, and they are happy to share that experience with us. Read what students had to say about their time at NetCom.
