Apache Spark, a significant component of the Hadoop ecosystem, is a cluster computing engine for Big Data. Running on top of Hadoop YARN and HDFS, it processes many in-memory computing tasks an order of magnitude faster than MapReduce. It can be programmed in Java, Scala, Python, and R (the favorite languages of data scientists), as well as through SQL-based front ends. With advanced libraries such as Mahout and MLlib for machine learning, GraphX or Neo4j for rich graph processing, and access to NoSQL data stores, rule engines and other enterprise components, Spark is a linchpin of modern Big Data and data science computing.
Geared for experienced developers, Developing with Spark for Big Data is an intermediate-to-advanced course that provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the components mentioned above to craft complete data science solutions. Students will leave this course with the skills they need to work with Spark at an advanced level in a practical, real-world environment.
NOTE: Students newer to data science or with a lighter development background should consider TTSK7503 Spark Developer | Introduction to Spark for Big Data, Hadoop & Machine Learning, our three-day subset of this course, as an alternative.
This course is offered for the Java programming language, with alternative versions available for R, Python and Scala. Our team will work with you to coordinate the languages, tools and environment that will work best for your organization and needs.
This course provides a practical grounding in the leading-edge data science technologies centered on Spark and its related tools. Working in a hands-on learning environment, students will learn:
Need different skills or topics? If your team requires different topics or tools, additional skills or a custom approach, this course can easily be adjusted to accommodate them. We offer additional related Spark, Hadoop, data science, programming and development courses which may be blended with this course for a track that best suits your development objectives. Our team will collaborate with you to understand your needs and will target the course to your specific learning objectives and goals.
Take Before: Students should have attended the course(s) below, or should have basic skills in these areas:
Please see the Related Courses tab for specific Pre-Requisite courses, Related Courses that offer similar skills or topics, and next-step Learning Path recommendations.
Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most.
Spark Overview
Spark Component Overview
RDDs: Resilient Distributed Datasets
DataFrames
Spark Applications
DataFrame Persistence
Distributed Persistence
Spark Streaming
Accessing NoSQL Data
Enterprise Integration
Algorithms and Patterns
Spark SQL
GraphX
Alternate Languages
Clustering Spark for Developers
Performance and Tuning
Student Materials: Each participant will receive a Student Guide with course notes, code samples, software tutorials, step-by-step written lab instructions, diagrams and related reference materials and resource links. Students will also receive the project files (or code, if applicable) and solutions required for the hands-on work.
Hands-On Setup Made Simple! Our dedicated tech team will work with you to make sure our ‘easy-access’ cloud-based course environment is accessible, fully tested and verified as ready to go well in advance of the course start date, ensuring a smooth start to class and an effective learning experience for all participants. Please inquire for details and options.