Building Batch Data Analytics Solutions on AWS

In this course, you will learn to build batch data analytics solutions using Amazon EMR, an enterprise-grade Apache Spark and Apache Hadoop managed service.

Description

In the Building Batch Data Analytics Solutions on AWS course, you will review how Amazon EMR integrates with open-source projects such as Apache Hive, Hue, and HBase, and with AWS services such as AWS Glue and AWS Lake Formation. The course addresses data collection, ingestion, cataloging, storage, and processing components in the context of Spark and Hadoop. You will learn to use EMR Notebooks to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR.

Course Content

Module A: Overview of Data Analytics and the Data Pipeline

Data analytics use cases
Using the data pipeline for analytics

Module 1: Introduction to Amazon EMR

Using Amazon EMR in analytics solutions
Amazon EMR cluster architecture
Interactive Demo 1: Launching an Amazon EMR cluster
Cost management strategies

Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage

Storage optimization with Amazon EMR
Data ingestion techniques

Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR

Apache Spark on Amazon EMR use cases
Why Apache Spark on Amazon EMR
Spark concepts
Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell
Transformation, processing, and analytics
Using notebooks with Amazon EMR

Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive

Using Amazon EMR with Hive to process batch data
Transformation, processing, and analytics
Introduction to Apache HBase on Amazon EMR

Module 5: Serverless Data Processing

Serverless data processing, transformation, and analytics
Using AWS Glue with Amazon EMR workloads
Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions

Module 6: Security and Monitoring of Amazon EMR Clusters

Securing EMR clusters
Interactive Demo 3: Client-side encryption with EMRFS
Monitoring and troubleshooting Amazon EMR clusters
Demo: Reviewing Apache Spark cluster history

Module 7: Designing Batch Data Analytics Solutions

Batch data analytics use cases
Activity: Designing a batch data analytics workflow

Module B: Developing Modern Data Architectures on AWS

Modern data architectures

Prerequisites

Students with at least one year of experience managing open-source data frameworks such as Apache Spark or Apache Hadoop will benefit from this course.

Similar courses

Using Data Analysis Expressions to solve common business problems in Power BI

Analyze business data, visualize insights, and share those insights across the enterprise

In this course, you will perform advanced data visualization and data blending with Tableau.

The Microsoft Power Platform helps organizations optimize their operations by simplifying, automating and transforming business tasks and processes.

In this course, you will implement and administer networks by using Cisco solutions.

CompTIA A+ is the industry standard for launching IT careers into today’s digital world. CompTIA A+ Core 1 (Exam 220-1101) covers mobile devices, networking technology, hardware, virtualization, cloud computing and network troubleshooting.

This course teaches you to create both anonymous and named PL/SQL blocks. It covers PL/SQL objects and data types, packages, and how to debug and improve performance within PL/SQL. It also addresses deploying PL/SQL objects and using Oracle predefined packages, procedures, and functions.

In this course, you will understand and use Agile core terms, explain key Agile concepts and their importance in achieving agility, identify, engage, and leverage key stakeholders in an Agile environment, apply common Agile tools and techniques, embrace and advocate for an Agile mindset to benefit from an Agile approach, select the best practices for a project and apply them appropriately to benefit the project and organization.

This course introduces tools and tactics to manage cybersecurity risks, identify various types of common threats, evaluate the organization's security, collect and analyze cybersecurity intelligence, and handle incidents as they occur.

Students who attend this course will leave armed with new skills to leverage modules, scale applications into multi-core environments, and improve the performance of Java 9 applications. This course will teach students everything they need to successfully master and implement the latest features and benefits of Java 9 and become a more effective Java 9 developer.

In this course, you will explain core programming fundamentals such as computer storage and processing, create and use variables and constants in programs, discuss how to create and use functions in a program, use decision structures in a computer program, create and use repetition (loops) in a computer program, explain pseudocode and its role in programming, implement object-oriented programming concepts, and identify application errors and explain how to debug an application and handle errors.

R is a functional programming environment for business analysts and data scientists. It's a language that many non-programmers can easily work with, naturally extending a skill set that is common to high-end Excel users. It's the perfect tool for when the analyst has a statistical, numerical, or probabilities-based problem based on real data and they've pushed Excel past its limits.

In this course, you will create single page web applications using the MVC pattern of AngularJS, understand the programming model provided by the AngularJS framework, define Angular controllers and directives, and control Angular data bindings.

In this course, you will develop single page Angular applications using TypeScript, set up a complete Angular development environment, create components, directives, services, pipes, forms and custom validators, handle advanced network data retrieval tasks using observables, consume data from REST web services using the Angular HTTP Client, handle push-data connections using the WebSockets protocol, work with Angular Pipes to format data, and use advanced Angular Component Router features.

This course is designed for people who want to learn the Python programming language in preparation for using Python to develop software for a wide range of applications, such as data science, machine learning, artificial intelligence, and web development.

This course teaches concepts through deep-dive, hands-on exercises. Throughout the course, you will learn data wrangling through these exercises and activities. You’ll find checklists, best practices, and critical points mentioned throughout the lessons, making things more interesting.

In this course, you will apply recommended practices for effective and efficient automation with Ansible, perform automation operations as rolling updates, use advanced features of Red Hat Ansible Automation Platform to work with data, including filters and plugins, create automation execution environments to contain and scale Red Hat Ansible Automation, and leverage capabilities of the automation content navigator to develop Ansible Playbooks.

In this course, you will build a data analytics solution using Amazon Redshift, a cloud data warehouse service.

In this course, you will learn new concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS.

In this course, you will learn how to build an operational data lake that supports analysis of both structured and unstructured data. You will learn the components and functionality of the services involved in creating a data lake. You will use AWS Lake Formation to build a data lake, AWS Glue to build a data catalog, and Amazon Athena to analyze data. The course lectures and labs further your learning with the exploration of several common data lake architectures.

This course is designed to teach those in a systems administrator or Development Operations (DevOps) role how to create automatable and repeatable deployments of networks and systems on the AWS platform. The course covers the specific AWS features and tools related to configuration and deployment, in addition to best practices for configuring and deploying systems.

This course explores how to use the machine learning (ML) pipeline to solve a real business problem in a project-based learning environment.

In this course, you will learn the most common DevOps patterns to develop, deploy, and maintain applications on the AWS platform. We will explore the core principles of the DevOps methodology and examine a number of use cases applicable to startup, small- to medium-sized business, and enterprise development scenarios.

In this course, you will describe key database concepts in the context of SQL Server, characterize database languages used in SQL Server, describe data modeling techniques, discuss normalization and denormalization techniques, distinguish relationship types and effects in database design, describe the effects of database design on performance, and define commonly used database objects.

In this course, you will create single table SELECT queries, create multiple table SELECT queries, insert, update, and delete data, query data using built-in functions, create queries that aggregate data, create subqueries, create queries that use table expressions, use UNION, INTERSECT, and EXCEPT on multiple sets of data, implement window functions in queries, use PIVOT and GROUPING SETS in queries, use stored procedures in queries, add error handling to queries, and use transactions in queries.

In this course, you will create sophisticated SSIS packages for extracting, transforming, and loading data, use containers to efficiently control repetitive tasks and transactions, configure packages to dynamically adapt to environment changes, use Data Quality Services to cleanse data, successfully troubleshoot packages, create and manage the SSIS Catalog, deploy, configure, and schedule packages, and secure the SSIS Catalog.

In this course, you will expand your Python proficiencies, select an object-oriented programming approach for Python applications, create object-oriented Python applications, create a desktop application, create data-driven applications, create and secure web service-connected applications, program Python for data science, implement unit testing and exception handling, and package an application for distribution.

This course will teach you the fundamentals of programming in R to get you started. It will also teach you how to use R to perform common data science tasks and achieve data-driven results for the business.

In this course, you will develop web content in HTML, enhance its formatting and layout using CSS, and add interactivity using JavaScript.

In this course, you will learn how to leverage AWS data services to store, process, analyze, stream, and query data to make decisions with speed and agility at scale, and how to modernize data solutions end to end. You will obtain the skills to put your data to work to make better, more informed decisions, respond faster to the unexpected, and uncover new opportunities.

In this course, you will practice and deploy serverless solutions on AWS.

In this course, you will learn to accelerate the process to prepare, build, train, deploy, and monitor ML solutions using Amazon SageMaker Studio.

The creation of data-backed visualizations is a key way data scientists, or any professional, can explore, analyze, and report insights and trends from data. Tableau® software is designed for this purpose. Tableau was built to connect to a wide range of data sources and allows users to quickly create visualizations of connected data to gain insights, show trends, and create reports.

This course will build on existing knowledge from the CompTIA A+ 1101 course. This course will provide fundamental level skills and theoretical concepts to prepare you for real-world experiences on the job as a technician. CompTIA A+ is the industry standard for launching IT careers into today’s digital world. CompTIA A+ Core 2 (Exam 220-1102) covers installing and configuring operating systems, expanded security, software troubleshooting and operational procedures.

This course teaches Azure Solution Architects how to design infrastructure solutions. Course topics cover governance, compute, application architecture, storage, data integration, authentication, networks, business continuity, and migrations. The course combines lecture with case studies to demonstrate basic architect design principles.

CompTIA Cloud+ is a global certification that validates the skills needed to deploy and automate secure cloud environments that support the high availability of business systems and data. It is ideal for cloud engineers who need to have expertise across multiple products and systems. CompTIA Cloud+ is the only cloud focused certification approved for DoD 8570.01-M, offering an infrastructure option for individuals who need to certify in IAM Level I, CSSP Analyst and CSSP Infrastructure Support roles.

This course provides the knowledge and skills to administer a SQL Server database infrastructure for cloud, on-premises, and hybrid relational databases, and is aimed at students who work with the Microsoft PaaS relational database offerings. Additionally, it will be of use to individuals who develop applications that deliver content from SQL-based relational databases.

In this 5-day course, you will learn day-to-day management tasks, including how to manage applications, client health, hardware and software inventory, operating system deployment, and software updates by using Configuration Manager. You also will learn how to optimize Endpoint Protection, manage compliance, and create management queries and reports. Although this course and the associated labs are written for Windows Server 2022, the skills taught will also be backwards compatible for Server 2016 and 2019.

In this course, you will develop and deploy VBA modules to solve business problems.

CompTIA Security+ is a global certification that validates the foundational cybersecurity skills necessary to perform core security functions and pursue an IT security career. It establishes the core knowledge required of any cybersecurity role and provides a springboard to intermediate-level cybersecurity jobs. CompTIA Security+ is compliant with ISO 17024 standards and approved by the U.S. DoD to meet Directive 8140.03M requirements.

Gain fundamental knowledge and skills to use PowerShell for administering and automating administration of Windows servers.

Learn how to make SharePoint Online relevant to your team by using a site's functionality to help you share information and collaborate with your colleagues.

If you are someone with existing SQL or SQL Server knowledge (or someone highly versed in different data repositories), this is the Power BI course for you. This course covers the various methods and best practices that are in line with business and technical requirements for modeling, visualizing, and analyzing data with Power BI.

This course provides detailed information on the architecture of an Oracle Database instance and database, enabling you to manage your database resources effectively. You learn how to create database storage structures appropriate for the business applications supported by your database. In addition, you learn how to create users and administer database security to meet your business requirements. This course provides basic information on backup and recovery techniques.

This introductory and beyond level course is for technical users newer to Python who want to learn advanced data handling and transformation skills, using the latest tools and techniques. The course is approximately 50% hands-on to 50% lecture ratio, combining expert lecture, real-world demonstrations and group discussions with machine-based practical labs and exercises. Student machines are required.

CompTIA Data+ is an early-career data analytics certification for professionals tasked with developing and promoting data-driven business decision-making that gives learners the confidence to bring data analysis to life.

In this course, you will compose SQL queries to retrieve desired information from a database.

In this course, you will work with advanced queries to manipulate and index tables. You will also create transactions so that you can choose to save or cancel the data entry process.

This 2-day entry-level course examines the services and features of Microsoft SQL 2022. (This is NOT a SQL querying course; SQL querying syntax will not be discussed.) The content focuses on database tables, adding and changing data, creating and using stored procedures, entity relationships, and indexes.

Doing data analysis work is about more than learning a software program (Excel, Power BI, Tableau, etc.) - you need to understand the concepts and theory too. This one day course gets you up to speed (and can be useful either before or after your software classes).

In this course, you will use various Python tools to load, analyze, manipulate, and visualize business data.

ITIL 4 is the next evolution of ITIL, providing a practical and flexible transition that allows organizations to adopt the new ways of working required by the modern digital world. It provides an end-to-end IT/digital operating model for the delivery and operation of tech-enabled products and services and enables IT teams to continue to play a crucial role in wider business strategy.

In this course, students will create complex reports & data sources using the tools in Crystal Reports 2020. Students will not only create more complex reports including sub-reports and cross-tabs, but will also increase their speed and efficiency.

In this course, students will create a basic report by connecting to a database and modifying the report's presentation.

This five-day instructor-led course teaches IT professionals the fundamental administration skills required to deploy and support Windows Server in most organizations. It is designed primarily for IT professionals who have some experience with Windows Server and will be responsible for managing identity, networking, storage and compute by using Windows Server, and who need to understand the scenarios, requirements, and options that are available and applicable to Windows Server.

This four-day course is intended for Windows Server Hybrid Administrators who have experience working with Windows Server and want to extend the capabilities of their on-premises environments by combining on-premises and hybrid technologies. Windows Server Hybrid Administrators implement and manage on-premises and hybrid solutions such as identity, management, compute, networking, and storage in a Windows Server hybrid environment.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. CompTIA A+ Core 1 covers mobile devices, networking technology, hardware, virtualization and cloud computing and network troubleshooting. This exam cram session is included with the A+ Core 1 course.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. CompTIA A+ Core 2 covers installing and configuring operating systems, expanded security, software troubleshooting and operational procedures. This exam cram session is included with the A+ Core 2 course.

The CompTIA Network+ certification exam covers the latest trends in networking and validates the core skills necessary to establish, maintain, troubleshoot and secure networks in any environment, preparing learners for a rewarding career in networking and cybersecurity. Students gain a wide range of technical and hands-on skills required of today’s early-career network administrators. Network+ is approved for DoD 8140.03.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. The Network+ exam covers the core skills necessary to establish, maintain, troubleshoot and secure networks regardless of technology or platform. This exam cram session is included with the Network+ course.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. Security+ covers the most in-demand skills related to current threats, automation, zero trust, IoT, risk – and more. This exam cram session is included with the Security+ course.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. Cloud+ covers the expertise needed to deploy and automate secure cloud environments and protect mission-critical applications and data. This exam cram session is included with the Cloud+ course.

Our Exam Cram sessions are intensive, focused review sessions designed to help your team master key concepts and pass their CompTIA certification exams with confidence. Led by expert instructors, these sessions provide in-depth, targeted hands-on practice to ensure your team is fully prepared for exam day. Data+ covers mining and manipulating data, applying basic statistical methods, and analyzing complex datasets. This exam cram session is included with the Data+ course.

This course is designed for professionals in a variety of job roles who are currently using desktop or web-based data management tools such as Microsoft® Excel® or SQL Server® reporting services to perform numerical or general data analysis. This course is also designed for professionals who want to pursue the Microsoft Power BI Data Analyst (Exam PL-300) certification.

This five-day course teaches you advanced skills for configuring and maintaining a highly available and scalable virtual infrastructure. Through a mix of lecture and hands-on labs, you configure and optimize the VMware vSphere 8 features that build a foundation for a truly scalable infrastructure. You also discuss when and where these features have the greatest effect. Attend this course to deepen your understanding of vSphere and learn how its advanced features and controls can benefit your organization.

Administering Cisco Unified Communications Manager (ACUCM with AUC) is a 5-day training program that provides system administrators and networking professionals with an understanding of the Cisco Unified Communications Manager System. This course teaches the concepts of IP telephony based in system administration, including its function, features, and configuration. This UC training course focuses on Cisco Unified Communications Manager (CUCM) v12x. All labs are using CUCM v12x.

In this course, you will develop your understanding about agile business analysis and the role of the business analyst on an agile team. You will learn how business analysis on an agile project is ‘the same’ and ‘different’ than business analysis performed on waterfall projects. You will understand how the business analysis role changes on an agile team.

This 2-day virtual workshop puts the distributed agile team members through their paces, by showing them how to conduct the five (5) scrum ceremonies, while simulating key activities within a sprint, all while working remotely and using their own project (for private classes) as a case study for the exercises.

This course shows you the fundamentals of building IT infrastructure on the AWS platform. You learn how to optimize the AWS Cloud by understanding AWS services and how they fit into cloud-based solutions. You explore best practices and design patterns to help you architect optimal IT solutions on AWS, then build and explore a variety of infrastructures through guided, hands-on activity. You learn how to create fledgling architectures and build them into robust and adaptive solutions.

This fundamental-level, full-day course is intended for individuals who seek an overall understanding of the AWS Cloud, independent of specific technical roles. It provides a detailed overview of cloud concepts, AWS services, security, architecture, pricing, and support. It includes lab exercises reinforcing some of the core concepts of the lecture. This course also helps you prepare for the AWS Certified Cloud Practitioner exam.

In this course, you will learn how to use the AWS SDK for developing secure and scalable cloud applications. The course provides in-depth knowledge about how to interact with AWS using code and covers key concepts, best practices, and troubleshooting tips.

During this 5-day course, students will learn Transact-SQL as implemented in SQL Server 2008, 2012 and 2014. The course starts by establishing a foundational understanding of database concepts and terminology. Students are then prepared to use various Microsoft tools to submit queries and view the results.

The Implementing and Operating Cisco Enterprise Network Core Technologies (ENCOR) v1.3 training gives you the knowledge and skills needed to install, configure, operate, and troubleshoot an enterprise network and introduces you to overlay network design by using SD-Access and SD-WAN solutions. This course also prepares you for the 350-401 Implementing Cisco Enterprise Network Core Technologies (ENCOR) exam.

Explore Oracle 19c DB Architecture, DBCS Deployments, Managing Database Security, Backup and Recovery, Monitoring and more.

Explore Next-Level DBA Skills: Backup & Recovery, Working with Grid Infrastructure, Upgrading Oracle and more.

This class uses the standard VMware vSphere class outline as a base but adds a focus on vSphere troubleshooting, giving students a more practical learning experience than the standard introductory vSphere class.

This course is a continuation of AZ-040T00: Automating Administration with PowerShell, taking a deep dive into the development of PowerShell cmdlets and modules. It features both hands-on labs and challenging exercises to help you practice the skills presented in class. Time will be spent on the theory of how to design cmdlets as well as the proper structure of the programming code to facilitate cmdlets that work with the PowerShell pipeline in keeping with community standards.

This 4-day course is designed to provide you with the knowledge and skills required to support and troubleshoot Windows 11 PCs and devices in an on-premises Windows Server Active Directory domain environment.

In this course, students will continue their learning on the foundations of report writing with Microsoft® SQL Server® Report Builder and SSRS.

AWS Technical Essentials introduces you to AWS products, services, and common solutions. It provides you with fundamentals to become more proficient in identifying AWS services so that you can make informed decisions about IT solutions based on your business requirements and get started working on AWS.

In this course, you will learn to build streaming data analytics solutions using AWS services, including Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Building on concepts introduced in Architecting on AWS, Advanced Architecting on AWS is intended for individuals who are experienced with designing scalable and elastic applications on the AWS platform. This course covers how to build complex solutions that incorporate data services, governance, and security on AWS. It introduces specialized AWS services, including AWS Direct Connect and AWS Storage Gateway, to support hybrid architecture.

This course is intended for power users and IT professionals who are tasked with working within the SharePoint 2016 environment and conducting site collection and site administration. This course is for an on-premises SharePoint environment.

The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.

This course introduces a process for effectively planning and designing a functional, efficient database. Knowing how to plan a relational database is important to the success of the databases you create. Without planning, you cannot possibly know what the database needs to do, or even what information to include in the database. Planning a database is essential and prevents the extra work of fixing data maintenance problems later on.

This five-day course describes how to set up, configure, and manage an Office 365 tenant, including identities and the core services of Microsoft 365. You will learn how to plan the configuration of an Office 365 tenant, including integration with existing user identities; plan, configure, and manage core services; and report on key metrics.

This five-day instructor-led course provides students who administer and maintain SQL Server databases with the knowledge and skills to administer a SQL server database infrastructure. Additionally, it will be of use to individuals who develop applications that deliver content from SQL Server databases.

In this course, you will analyze and visualize data using Excel and associated tools.

This 2-day instructor-led course teaches you everything you need to know about BitLocker. This course includes hands-on labs that reinforce and expand on the instructor-led portion by having you actually deploy and operate BitLocker. You’ll practice techniques for setting up a BitLocker-enabled environment, implementing BitLocker on multiple system configurations, and recovering BitLocker after the detection of a possible compromise.

Learn to efficiently manage enterprise devices using Microsoft Intune, including enrollment, application deployment, endpoint security, and Windows Autopilot, to enhance productivity and security.
