This course provides a comprehensive introduction to designing and implementing big data solutions on Microsoft Azure. The course covers all the major steps common to any analytics pipeline, from ingest to processing, storage and analysis. It also covers designing for the range of processing options found in analytics solutions: batch processing, interactive analytics and real-time processing. Security, encryption and data governance capabilities are also covered.
With the major components understood, the course then turns to adding intelligence to the pipeline by applying machine learning. The course concludes with the options available to operationalize the end-to-end analytics solution.
What You Will Learn
- Understand the key capabilities of several Azure Data, Storage, Analytics and Intelligence services
- Understand the core storage services, including Data Lake Store, Blob Storage, HDFS, Event Hubs and IoT Hub
- Understand the core processing services, including HDInsight, Stream Analytics, SQL Data Warehouse and Data Lake Analytics
- Understand how to operationalize data pipelines with Data Factory
- Understand common architectures including Lambda and Kappa architectures
- Understand how to manage and secure the data solution
Module 1: Overview of the Azure Analytics Platform
In this module, students will learn the basics of analytics pipeline terminology and where the Microsoft Azure services fit. This module introduces the Lambda Architecture, which is used as a reference architecture for building an analytics data pipeline.
Module 2: Bulk and Relational Ingest
In this module, students will be introduced to the various tools and protocols available for loading data from bulk and relational sources into an Azure-based analytics pipeline.
Module 3: Ingest Storage
In this module, students will be introduced to the Microsoft Azure services that support batch storage of ingested data: Azure Storage Blobs, Data Lake Store and HDFS.
Module 4: Batch Processing
In this module, students will be introduced to some of the Microsoft Azure services that support batch processing of data at scale. Topics include using HDInsight to perform batch processing with MapReduce, Tez and Spark. SQL Data Warehouse is also introduced for processing data held in batch storage.
Module 5: Interactive Processing & Querying
In this module, students will be introduced to the services that enable lower-latency, interactive querying of big data. Students will learn various options for querying data using SQL. Services covered include Azure SQL Data Warehouse, HDInsight with Spark SQL, HDInsight with HBase/Phoenix, and Data Lake Analytics with U-SQL.
Module 6: Real-Time Ingest & Storage
In this module, students will learn about the protocols for real-time ingest, including HTTP, AMQP and MQTT, and about storing the received data in queue-based services such as Event Hubs and IoT Hub.
Module 7: Real-Time Processing
In this module, students will learn about the Azure services and capabilities for processing ingested real-time data. Key concepts such as tuple-at-a-time and micro-batch processing are introduced. Services covered include HDInsight with Apache Storm, HDInsight with Storm/Trident, HDInsight with Spark Streaming, WebJobs, Azure Functions and Stream Analytics.
Module 8: Intelligence & Machine Learning
In this module, students will learn the fundamentals of machine learning using Azure Machine Learning. Topics covered include ML Studio, training experiments, predictive experiments, and operationalizing experiments with web services and Cortana Intelligence components.
Module 9: Data Pipelines
In this module, students will pull all the pieces together into a pipeline managed under a single pane of glass using Azure Data Factory.
Module 10: Security & Governance
In this concluding module, students will look horizontally across the data pipeline to understand how to secure data at rest and in transit, as well as how to enable governance and discovery with services such as Azure Data Catalog.