Apache Spark is a powerful, open-source engine for processing data in Hadoop clusters, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex, iterative algorithms, enabling some in-memory workloads to run up to 100x faster than traditional Hadoop MapReduce programs. With Spark, you can write sophisticated parallel applications that support faster decisions, better decisions, and real-time actions across a wide variety of use cases, architectures, and industries.
Apache Spark for Data Science is a three-day, hands-on course geared for technical business professionals who wish to solve real-world, data-related problems using Apache Spark. This course explores using Apache Spark for common data-related activities. Students will learn to build unified big data applications combining batch, streaming, and interactive analytics on all their data.
NOTE: The hands-on treatment and focus of this course are geared toward the data science aspects of Spark and related tools. Students who want a more developer-oriented edition of this course should consider TTSK7503 Spark Developer | Spark for Big Data, Hadoop & Machine Learning, which aligns in subject coverage but is geared for developers instead of data scientists.
This course is approximately 50% hands-on, combining expert lecture, real-world demonstrations, and group discussions with machine-based practical labs and exercises. Working in a hands-on learning environment led by our expert practitioner, students will explore:
Need different skills or topics? If your team requires different topics or tools, additional skills, or a custom approach, this course can be easily adjusted to accommodate. We offer additional related Spark, Hadoop, data science, programming, and development courses that may be blended with this course for a track that best suits your development objectives. Our team will collaborate with you to understand your needs and will target the course to focus on your specific learning objectives and goals.
This is an introductory-level and beyond course. Typical attendees include systems administrators, testers, and those in technical, data-related roles who need to learn to use Spark for data analysis or data processing.
Attending students should have the following background:
Please see the Related Courses tab for specific Pre-Requisite courses, Related Courses that offer similar skills or topics, and next-step Learning Path recommendations.
Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most.
Getting Started
Spark Overview
Spark Essentials
DataFrames
Spark SQL
Spark MLlib
Spark Streaming
Streaming with Kafka
Data Flow with NiFi
Spark GraphX
Performance and Tuning
Cluster Mode
Spark - the Big Picture
Our course materials include more than a simple slideshow presentation handout. Each student will receive a comprehensive course Student Guide, complete with detailed course notes, code samples, software tutorials, diagrams, and related reference materials and links. Our courses also include our detailed Student Workbook, with step-by-step hands-on lab instructions, project files (as necessary), and solutions, clearly illustrated so that users can complete hands-on work in class and revisit it to review or refresh skills at any time. Students will also receive the course setup files, project files (or code, if applicable), and solutions required for the hands-on work.
Upcoming open enrollment course dates are posted below. You can register online below, or call 844-475-4559 toll free to connect with our Registrar for assistance. If you need additional date options, please contact us for scheduling.
Course Title | Days | Date | Time | Price | |
---|---|---|---|---|---|
Introduction to Apache Spark for Data Science \| Analyzing Big Data with Spark | 3 Days | Apr 12 to Apr 14 | 10:00 AM to 06:00 PM EST | $2,395.00 | Register |
Introduction to Apache Spark for Data Science \| Analyzing Big Data with Spark | 3 Days | May 24 to May 26 | 10:00 AM to 06:00 PM EST | $2,395.00 | Register |
Introduction to Apache Spark for Data Science \| Analyzing Big Data with Spark | 3 Days | Jul 19 to Jul 21 | 10:00 AM to 06:00 PM EST | $2,395.00 | Register |
Introduction to Apache Spark for Data Science \| Analyzing Big Data with Spark | 3 Days | Aug 30 to Sep 1 | 10:00 AM to 06:00 PM EST | $2,395.00 | Register |
Introduction to Apache Spark for Data Science \| Analyzing Big Data with Spark | 3 Days | Oct 4 to Oct 6 | 10:00 AM to 06:00 PM EST | $2,395.00 | Register |