Spark is a highly optimized Data Science environment running on Hadoop YARN, with support for Machine Learning through MLlib and Mahout, SQL, DataFrames, and Streaming. In this course, Data Scientists dive into the details of practical data science on the Spark platform, including real-world interaction with other systems in modern Data Science environments.
Quick Start to Spark for R Experienced Data Scientists & Analysts is intended for existing Data Scientists already fluent in data science techniques in other languages such as SAS and already comfortable with R. This course will be presented in a "rolling lab" approach - a continuous workshop of real-world data exploration involving real-world problems. As such, problems and opportunities will be explored as data suggests and as questions arise. "Lecture" material will be provided only as is necessary to explain the background of the approach being used at the moment. Times and ordering of the material are highly flexible and should be used only as estimates. Student questions and requests will also significantly alter the direction of the workshop.
The objective of the course is to practically transition these data scientists to the R/Spark/Hadoop environment, so that they become comfortable with the tools and machine learning libraries and can conduct the statistical and machine learning analyses they've already been performing in SAS or similar environments.
This course is approximately 50% hands-on, combining expert lecture, real-world demonstrations and group discussions with machine-based practical labs and exercises.
Please see the Related Courses tab for specific Pre-Requisite courses, Related Courses or Follow On training options. Our team will be happy to help you with recommendations for next steps in your Learning Journey. Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most.
Spark Overview
DataFrames
Spark SQL
Spark MLlib
Spark Streaming
Streaming with Kafka
Data Flow with NiFi
Cluster Mode
Spark - the Big Picture
Student Materials: Each participant will receive a Student Guide with course notes, code samples, software tutorials, step-by-step written lab instructions, diagrams and related reference materials and resource links. Students will also receive the project files (or code, if applicable) and solutions required for the hands-on work.
Hands-On Setup Made Simple! Our dedicated tech team will work with you to ensure our ‘easy-access’ cloud-based course environment is accessible, fully-tested and verified as ready to go well in advance of the course start date, ensuring a smooth start to class and effective learning experience for all participants. Please inquire for details and options.