This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark.



Introduction to Big Data with Spark and Hadoop
This course is part of multiple programs.



Instructor: Aije Egwaikhide
62,885 already enrolled
(433 reviews)
What you'll learn
Explain the impact of big data, including use cases, tools, and processing methods.
Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
Apply Spark programming basics, including parallel programming for DataFrames, datasets, and Spark SQL.
Use Spark's RDDs and datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark's development and runtime environment options.
Details to know

14 assignments

There are 7 modules in this course
In this module, you'll begin building your Big Data knowledge with a current definition of Big Data. You'll explore the impact of Big Data on everyday personal tasks and business transactions through Big Data use cases. You'll also learn how Big Data relies on parallel processing, scaling, and data parallelism. Going further, you'll explore commonly used Big Data tools and the role of open source in Big Data. Finally, you'll go beyond the hype and explore additional viewpoints on Big Data.
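To make data parallelism concrete, here is a minimal, hypothetical sketch in plain Python with no Hadoop or Spark involved. The input is split into partitions, the same function runs on each partition independently, and the partial results are combined. Threads stand in for cluster nodes, so under CPython's global interpreter lock this shows the partition-and-combine pattern rather than a real speedup. The function names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a list into contiguous, roughly equal chunks (one per worker)."""
    size, remainder = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` chunks take one extra element each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The same operation every worker applies to its own slice of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_partitions=4):
    chunks = partition(data, num_partitions)
    # Threads stand in for cluster nodes; each one processes a single partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    # Combine the per-partition results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 1001))))  # 333833500
```

Adding more partitions changes how the work is divided but not the result, because the combining step (addition) does not depend on how the data was split.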
What's included
8 videos, 1 reading, 2 assignments, 2 plugins
In this module, you'll gain a fundamental understanding of the Apache Hadoop architecture, ecosystem, and practices, and of commonly used applications, including the Hadoop Distributed File System (HDFS), MapReduce, Hive, and HBase. You'll also gain practical skills in hands-on labs where you use Hive to query data you've added, launch a single-node Hadoop cluster using Docker, and run MapReduce jobs.
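The map, shuffle, and reduce phases of a MapReduce job can be sketched with the classic word-count example. This is a single-process Python simulation under simplifying assumptions. It is not Hadoop's Java MapReduce API or Hadoop Streaming. On a real cluster, mappers and reducers run on different nodes and the framework itself performs the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data with Hadoop", "Hadoop and Spark", "big big data"]
    print(word_count(docs))
    # {'and': 1, 'big': 3, 'data': 2, 'hadoop': 2, 'spark': 1, 'with': 1}
```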
What's included
6 videos, 1 reading, 2 assignments, 3 app items, 2 plugins
In this module, you'll turn your attention to the popular Apache Spark platform, where you'll explore the attributes and benefits of Apache Spark and distributed computing. You'll gain key insights into functional programming and lambda functions. You'll also explore Resilient Distributed Datasets (RDDs), parallel programming, and resilience in Apache Spark, and see how RDDs and parallel programming relate within Apache Spark. Then, you'll dive into additional Apache Spark components and learn how Apache Spark scales with Big Data. Working with Big Data also requires queries, including structured queries written in SQL. You'll learn about the functions, parts, and benefits of Spark SQL and DataFrame queries, and discover how DataFrames work with Spark SQL.
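Functional programming with lambda functions can be illustrated with Python's built-in `map` and `filter` and with `functools.reduce`. The readings below are made up. PySpark's RDD API has methods with the same names (`rdd.map`, `rdd.filter`, `rdd.reduce`) that accept lambdas in the same style. This sketch runs in plain Python, so it needs no Spark installation.

```python
from functools import reduce

# Hypothetical sensor readings used only for illustration.
readings = [12, 7, 30, 45, 3, 18]

# Each lambda is a pure function. It reads its input and returns a value
# without modifying shared state, so elements could be processed independently.
doubled = list(map(lambda x: x * 2, readings))
large = list(filter(lambda x: x > 20, doubled))

# The combining function (addition) is associative and commutative, so
# partial sums computed on different partitions can be merged in any order.
total = reduce(lambda a, b: a + b, large, 0)

print(doubled)  # [24, 14, 60, 90, 6, 36]
print(large)    # [24, 60, 90, 36]
print(total)    # 210
```

The associativity and commutativity of the reducing function matter in a distributed setting. Without them, merging partial results from different workers in a different order could produce a different answer.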
What's included
5 videos, 1 reading, 2 assignments, 2 app items, 2 plugins
In this module, you'll learn about Resilient Distributed Datasets (RDDs), their uses in Apache Spark, and RDD transformations and actions. You'll compare the use of Datasets with DataFrames and learn to identify and apply basic DataFrame operations. You'll explore Apache Spark SQL optimization and learn how Spark SQL and memory usage benefit from the Catalyst optimizer and the Tungsten execution engine. Finally, you'll fortify your skills with a guided hands-on lab in which you create a table view and apply data aggregation techniques.
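The difference between transformations and actions comes down to lazy evaluation. Transformations such as `map` and `filter` only describe a new dataset, and an action such as `collect` or `count` triggers the computation. The hypothetical `LazyDataset` class below is a toy model of that behavior in plain Python. It is not Spark's implementation, and it ignores partitioning, caching, and the Catalyst/Tungsten optimizations. Because it records its chain of transformations (its lineage), it can always recompute results from the source data. This loosely mirrors how RDDs recover lost partitions.

```python
class LazyDataset:
    """Hypothetical toy model of RDD-style lazy evaluation (not Spark's actual code)."""

    def __init__(self, source, steps=None):
        self._source = source
        self._steps = steps or []  # recorded transformations, i.e. the lineage

    # Transformations return a new dataset with one more recorded step; nothing runs yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    # Actions replay the whole lineage over the source data and return a result.
    def collect(self):
        result = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step that rejects the item
                    keep = False
                    break
            if keep:
                result.append(item)
        return result

    def count(self):
        return len(self.collect())


if __name__ == "__main__":
    calls = []

    def tracked_square(x):
        calls.append(x)  # record every time the map function actually runs
        return x * x

    ds = LazyDataset([1, 2, 3, 4, 5]).map(tracked_square).filter(lambda x: x % 2 == 1)
    print(calls)         # [] because only transformations have been defined
    print(ds.collect())  # [1, 9, 25]
    print(calls)         # [1, 2, 3, 4, 5]
```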
What's included
5 videos, 1 reading, 2 assignments, 2 app items, 4 plugins
In this module, you'll explore how Spark processes the requests that your application submits and learn how to track work using the Spark Application UI. Because Spark application work happens on the cluster, you need to be able to identify Apache Spark cluster managers, their components, and their benefits. You'll also learn how to connect to each cluster manager and how and when you might want to set up a local, standalone Spark instance. Next, you'll learn about Apache Spark application submission, including the use of Spark's unified interface, "spark-submit," and its options and dependencies. You'll describe and apply options for submitting applications, identify external application dependency management techniques, and list the benefits of the Spark Shell. Finally, you'll look at recommended practices for Spark's static and dynamic configuration options and perform hands-on labs to use Apache Spark on IBM Cloud and run Spark on Kubernetes.
What's included
6 videos, 2 readings, 3 assignments, 2 app items, 4 plugins
Platforms and applications require monitoring and tuning to manage issues that inevitably happen. In this module, you'll learn how to connect to the Apache Spark user interface web server and use that same web server to manage application processes. You'll also identify common Apache Spark application issues and learn how to debug them using the application UI and the related log files. Further, a hands-on lab will give you real-world knowledge of how Spark manages memory and processor resources.
What's included
5 videos, 1 reading, 2 assignments, 1 app item, 3 plugins
In this module, you'll perform a practice lab where you'll explore two critical aspects of data processing using Spark: working with Resilient Distributed Datasets (RDDs) and constructing DataFrames from JSON data. You'll also apply various transformations and actions to both RDDs and DataFrames to gain insights and manipulate the data effectively. Further, you'll apply your knowledge in a final project where you create a DataFrame by loading data from a CSV file and apply transformations and actions using Spark SQL. Finally, you'll be assessed on what you've learned in the course.
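The final-project workflow can be sketched without a cluster: load a CSV file into a table, expose it to SQL, then aggregate it. This stand-in deliberately swaps Spark SQL for Python's built-in `csv` and `sqlite3` modules, and the sample data is invented. In PySpark, the corresponding steps would typically be `spark.read.csv(path, header=True, inferSchema=True)`, `df.createOrReplaceTempView(...)`, and `spark.sql(...)`.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """dept,employee,salary
eng,alice,120
eng,bob,100
ops,carol,90
ops,dan,70
ops,erin,80
"""


def load_csv_as_table(conn, csv_text, table):
    """Parse CSV rows and load them into an in-memory SQL table (the 'view')."""
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.execute(f"CREATE TABLE {table} (dept TEXT, employee TEXT, salary INTEGER)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (:dept, :employee, :salary)",
        # Cast salary explicitly, much as inferSchema would in Spark.
        [{**row, "salary": int(row["salary"])} for row in reader],
    )


def avg_salary_by_dept(conn, table):
    """A SQL aggregation over the loaded table (sqlite3 stands in for Spark SQL)."""
    query = (
        f"SELECT dept, AVG(salary) AS avg_salary, COUNT(*) AS n "
        f"FROM {table} GROUP BY dept ORDER BY dept"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_csv_as_table(conn, CSV_TEXT, "employees")
    print(avg_salary_by_dept(conn, "employees"))  # [('eng', 110.0, 2), ('ops', 80.0, 3)]
```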
What's included
3 readings, 1 assignment, 2 app items, 2 plugins
Offered by IBM
Learner reviews
433 reviews
- 5 stars: 65.35%
- 4 stars: 19.63%
- 3 stars: 8.08%
- 2 stars: 3.23%
- 1 star: 3.69%
Reviewed on Oct 28, 2022
well-structured course with comprehensive content and practical skills
Reviewed on May 2, 2022
hands on lab and quizzes at the end of each session was very helpful
Reviewed on Jan 16, 2024
Great program to explore more about AI and Big Data
Frequently asked questions
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
- The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
- The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don't give refunds, but you can cancel your subscription at any time.
Financial aid is available.