Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop

Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop Course Description

Duration: 4.00 days (32 hours)

This four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, teaches you to apply traditional data analytics and business intelligence skills to big data. Cloudera presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Next Class Dates

Contact us to customize this class with your own dates, times, and location. You can also call 1-888-563-8266 or chat live with a Learning Consultant.

Intended Audience for this Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop Course

  • This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators.

Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop Course Objectives

  • The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
  • The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
  • How Pig, Hive, and Impala improve productivity for typical analysis tasks
  • Joining diverse datasets to gain valuable business insight
  • Performing real-time, complex queries on datasets

Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop Course Outline

      1. Hadoop Fundamentals
        1. The Motivation for Hadoop
        2. Hadoop Overview
        3. Data Storage: HDFS
        4. Distributed Data Processing: YARN, MapReduce, and Spark
        5. Data Processing and Analysis: Pig, Hive, and Impala
        6. Data Integration: Sqoop
        7. Other Hadoop Data Tools
        8. Exercise Scenarios Explanation
      2. Introduction to Pig
        1. What Is Pig?
        2. Pig's Features
        3. Pig Use Cases
        4. Interacting with Pig
      3. Basic Data Analysis with Pig
        1. Pig Latin Syntax
        2. Loading Data
        3. Simple Data Types
        4. Field Definitions
        5. Data Output
        6. Viewing the Schema
        7. Filtering and Sorting Data
        8. Commonly-Used Functions
      4. Processing Complex Data with Pig
        1. Storage Formats
        2. Complex/Nested Data Types
        3. Grouping
        4. Built-In Functions for Complex Data
        5. Iterating Grouped Data
      5. Multi-Dataset Operations with Pig
        1. Techniques for Combining Data Sets
        2. Joining Data Sets in Pig
        3. Set Operations
        4. Splitting Data Sets
      6. Pig Troubleshooting and Optimization
        1. Troubleshooting Pig
        2. Logging
        3. Using Hadoop's Web UI
        4. Data Sampling and Debugging
        5. Performance Overview
        6. Understanding the Execution Plan
        7. Tips for Improving the Performance of Your Pig Jobs
      7. Introduction to Hive and Impala
        1. What Is Hive?
        2. What Is Impala?
        3. Schema and Data Storage
        4. Comparing Hive to Traditional Databases
        5. Hive Use Cases
      8. Querying with Hive and Impala
        1. Databases and Tables
        2. Basic Hive and Impala Query Language Syntax
        3. Data Types
        4. Differences Between Hive and Impala Query Syntax
        5. Using Hue to Execute Queries
        6. Using the Impala Shell
      9. Data Management
        1. Data Storage
        2. Creating Databases and Tables
        3. Loading Data
        4. Altering Databases and Tables
        5. Simplifying Queries with Views
        6. Storing Query Results
      10. Data Storage and Performance
        1. Partitioning Tables
        2. Choosing a File Format
        3. Managing Metadata
        4. Controlling Access to Data
      11. Relational Data Analysis with Hive and Impala
        1. Joining Datasets
        2. Common Built-In Functions
        3. Aggregation and Windowing
      12. Working with Impala
        1. How Impala Executes Queries
        2. Extending Impala with User-Defined Functions
        3. Improving Impala Performance
      13. Analyzing Text and Complex Data with Hive
        1. Complex Values in Hive
        2. Using Regular Expressions in Hive
        3. Sentiment Analysis and N-Grams
        4. Conclusion
      14. Hive Optimization
        1. Understanding Query Performance
        2. Controlling Job Execution Plan
        3. Bucketing
        4. Indexing Data
      15. Extending Hive
        1. SerDes
        2. Data Transformation with Custom Scripts
        3. User-Defined Functions
        4. Parameterized Queries
      16. Choosing the Best Tool for the Job
        1. Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
        2. Which to Choose?

Do you have the right background for Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop?

Skills Assessment

We ensure your success by asking all students to take a FREE Skill Assessment test. These short, instructor-written tests are an objective measure of your current skills that help us determine whether or not you will be able to meet your goals by attending this course at your current skill level. If we determine that you need additional preparation or training in order to gain the most value from this course, we will recommend cost-effective solutions that you can use to get ready for the course.

Our required skill assessments ensure that:

  1. All students in the class are at a comparable skill level, so the class runs smoothly and beginners do not slow it down for everyone else.
  2. NetCom students enjoy one of the industry's highest success rates, as well as high pass rates when a certification exam is involved.
  3. We stay committed to providing you real value. Again, your success is paramount; we will register you only if you have the skills to succeed.

This assessment is for your benefit and is best taken without any preparation or reference materials, so that your skills can be measured objectively.

Award-Winning, World-Class Instructors

Jose P.
Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University. He is skilled at analyzing data, specifically using Python and a variety of modules and libraries. He hopes to use his experience in teaching and data science to help other people learn the power of the Python programming language and its ability to analyze data, as well as to present the data in clear and beautiful visualizations. He is the creator of some of the most popular Python courses on Udemy, including "Learning Python for Data Analysis and Visualization" and "The Complete Python Bootcamp". With almost 30,000 enrollments, Jose has been able to teach Python and its data science libraries to thousands of students. Jose is also a published author, having recently written "NumPy Succinctly" for Syncfusion's series of e-books.

Client Testimonials & Reviews about their Learning Experience

We are passionate about delivering the best learning experience for our students, and they are happy to share that experience with us.
Read what students had to say about their experience at NetCom.
