Data Engineering on Google Cloud Platform

Data Engineering on Google Cloud Platform Course Description

Duration: 4 days (32 hours)

This four-day instructor-led class provides a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you will learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course covers structured, unstructured, and streaming data.

Next Class Dates

Feb 19, 2018 – Feb 22, 2018
8:30 AM – 4:30 PM ET
519 8th Avenue, 2nd Floor, New York, NY 10018
Mar 19, 2018 – Mar 22, 2018
8:30 AM – 4:30 PM ET
519 8th Avenue, 2nd Floor, New York, NY 10018

Contact us to customize this class with your own dates, times and location. You can also call 1-888-563-8266 or chat live with a Learning Consultant.

Intended Audience for this Data Engineering on Google Cloud Platform Course

This course is intended for data professionals whose responsibilities include:

  • Extracting, loading, transforming, cleaning, and validating data
  • Designing pipelines and architectures for data processing
  • Creating and maintaining machine learning and statistical models
  • Querying datasets, visualizing query results, and creating reports

Course Prerequisites for Data Engineering on Google Cloud Platform

  • Completion of the Google Cloud Fundamentals: Big Data and Machine Learning course (#8325), or equivalent experience
  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications in a common programming language such as Python
  • Familiarity with machine learning and/or statistics

Data Engineering on Google Cloud Platform Course Objectives

  • Design and build data processing systems on Google Cloud Platform
  • Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
  • Derive business insights from extremely large datasets using Google BigQuery
  • Train, evaluate, and predict with machine learning models using TensorFlow and Cloud ML
  • Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
  • Enable instant insights from streaming data

Data Engineering on Google Cloud Platform Course Outline

      1. Serverless Data Analysis with BigQuery
        1. What is BigQuery?
        2. Advanced Capabilities
        3. Performance and pricing
      2. Serverless, Autoscaling Data Pipelines with Dataflow
      3. Getting Started with Machine Learning
        1. What is machine learning (ML)?
        2. Effective ML: concepts, types
        3. Evaluating ML
        4. ML datasets: generalization
      4. Building ML Models with TensorFlow
        1. Getting started with TensorFlow
        2. TensorFlow graphs and loops + lab
        3. Monitoring ML training
      5. Scaling ML Models with Cloud ML
        1. Why Cloud ML?
        2. Packaging up a TensorFlow model
        3. End-to-end training
      6. Feature Engineering
        1. Creating good features
        2. Transforming inputs
        3. Synthetic features
        4. Preprocessing with Cloud ML
      7. ML Architectures
        1. Wide and deep
        2. Image analysis
        3. Embeddings and sequences
        4. Recommendation systems
      8. Google Cloud Dataproc Overview
        1. Introducing Google Cloud Dataproc
        2. Creating and managing clusters
        3. Defining master and worker nodes
        4. Leveraging custom machine types and preemptible worker nodes
        5. Creating clusters with the Web Console
        6. Scripting clusters with the CLI
        7. Using the Dataproc REST API
        8. Dataproc pricing
        9. Scaling and deleting clusters
      9. Running Dataproc Jobs
        1. Controlling application versions
        2. Submitting jobs
        3. Accessing HDFS and GCS
        4. Hadoop
        5. Spark and PySpark
        6. Pig and Hive
        7. Logging and monitoring jobs
        8. Accessing master and worker nodes with SSH
        9. Working with PySpark REPL (command-line interpreter)
      10. Integrating Dataproc with Google Cloud Platform
        1. Initialization actions
        2. Programming Jupyter/Datalab notebooks
        3. Accessing Google Cloud Storage
        4. Leveraging relational data with Google Cloud SQL
        5. Reading and writing streaming data with Google Bigtable
        6. Querying data from Google BigQuery
        7. Making Google API calls from notebooks
      11. Making Sense of Unstructured Data with Google’s Machine Learning APIs
        1. Google’s Machine Learning APIs
        2. Common ML Use Cases
        3. Vision API
        4. Natural Language API
        5. Translate
        6. Speech API
      12. Need for Real-Time Streaming Analytics
        1. What is Streaming Analytics?
        2. Use-cases
        3. Batch vs. Streaming (Real-time)
        4. Related terminologies
        5. GCP products that help build highly available, resilient, high-throughput, real-time streaming analytics (review of Pub/Sub and Dataflow)
      13. Architecture of Streaming Pipelines
        1. Streaming architectures and considerations
        2. Choosing the right components
        3. Windowing
        4. Streaming aggregation
        5. Events, triggers
      14. Stream Data and Events into Pub/Sub
        1. Topics and Subscriptions
        2. Publishing events into Pub/Sub
        3. Subscribing options: Push vs. Pull
        4. Alerts
      15. Build a Stream Processing Pipeline
        1. Pipelines, PCollections and Transforms
        2. Windows, Events, and Triggers
        3. Aggregation statistics
        4. Streaming analytics with BigQuery
        5. Low-volume alerts
      16. High Throughput and Low Latency with Bigtable
        1. Latency considerations
        2. What is Bigtable
        3. Designing row keys
        4. Performance considerations
      17. Google Data Studio
        1. What is Google Data Studio?
        2. From data to decisions

Do you have the right background for Data Engineering on Google Cloud Platform?

Skills Assessment

We ensure your success by asking all students to take a FREE Skill Assessment test. These short, instructor-written tests objectively measure your current skills and help us determine whether you will be able to meet your goals by attending this course at your current skill level. If we determine that you need additional preparation or training to gain the most value from this course, we will recommend cost-effective solutions that you can use to get ready for the course.

Our required skill assessments ensure that:

  1. All students in the class are at a comparable skill level, so the class can run smoothly without beginners slowing down the class for everyone else.
  2. NetCom students enjoy one of the industry's highest success rates, and one of its highest pass rates when a certification exam is involved.
  3. We stay committed to providing you with real value. Again, your success is paramount; we will register you only if you have the skills to succeed.

This assessment is for your benefit and is best taken without any preparation or reference materials, so that your skills can be objectively measured.

Award-winning, world-class Instructors

Erick P.
- In-depth experience in all phases of the project lifecycle (requirements gathering, specifications, development and team management, testing, end-user training, and maintenance), as well as in .NET, ASP, ADO, SQL, JavaScript, and SharePoint.
- Developed the first online multimedia training content system for Harvard University, as well as multiple online multimedia projects for the North Carolina State Government.
- Highly rated instructor averaging 8.7 out of 9 on evaluation reports.


Erick has been training business and IT professionals since 1989, when he developed and introduced the first online multimedia training content system to Harvard University. Since then he has honed his business, programming, and database skills providing highly customized software solutions and education programs for multiple clients such as North Carolina State Government, Cisco, IBM, and Time Warner Cable.

Erick's teaching prowess and real-world experience leading a team of software application developers make him a top Instructor and Subject Matter Expert at NetCom Learning, where he averages 8.7 out of 9 on evaluation reports.
Sam P.
- Team leader for the first undergraduate team to win the Duke Startup Challenge.
- Over 15 years of experience in the IT industry.
- NetCom Learning Instructor of the Year 2011.


Sam Polsky has spent his entire career in entrepreneurial pursuits, in fields including biotechnology, software development, data management, and business process management. He began in entrepreneurship as team leader for the first undergraduate team to win the Duke Startup Challenge, a business development competition geared towards Duke University's various graduate schools.

Sam has since co-founded a consulting firm where he works in software architecture, development, and implementation, and he has been delivering acclaimed solutions in these areas for over 15 years. He is a much-admired Subject Matter Expert and Trainer at NetCom Learning and was voted NetCom Learning Instructor of the Year 2011.
Amanpreet M.

Client Testimonials & Reviews about their Learning Experience

We are passionate about delivering the best learning experience for our students, and they are happy to share their learning experience with us. Read what students had to say about their experience at NetCom.