Delta Lake with Apache Spark using Scala
Delta Lake with Apache Spark using Scala on the Databricks platform
Development, Database and Design Development, Apache Spark
Lectures: 53
Resources: 2
Duration: 2 hours
Lifetime Access
30-day Money-Back Guarantee
Course Description
You will learn Delta Lake with Apache Spark using Scala on the Databricks platform.
Learn the latest big data technology, Apache Spark, and learn to use it with one of the most popular programming languages, Scala!
One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task: Apache Spark! Top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, and NASA are all using Spark to solve their big data problems!
Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 3.0 DataFrame API is still relatively new, you can quickly become one of the most knowledgeable people in the job market!
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
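To make this concrete, here is a minimal Scala sketch of writing and reading a Delta table, assuming a cluster where the Delta Lake library is already on the classpath (as it is on Databricks); the table path /tmp/delta/events is hypothetical:

    // A minimal sketch, assuming Delta Lake is available on the cluster.
    // On Databricks, a SparkSession named `spark` is already provided;
    // elsewhere you would build one with the Delta Lake library on the classpath.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("DeltaLakeIntro")
      .getOrCreate()

    // Write a DataFrame in the Delta format: Parquet data files plus a transaction log.
    spark.range(0, 5).toDF("id")
      .write.format("delta").mode("overwrite").save("/tmp/delta/events")

    // Read it back; every read sees a consistent, ACID-committed snapshot of the table.
    val events = spark.read.format("delta").load("/tmp/delta/events")
    events.show()

Note that only the format("delta") call differs from ordinary Parquet I/O, which is what "fully compatible with Apache Spark APIs" means in practice.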
Topics Included in the Course (a short Scala sketch of several of these table operations follows the list):
Introduction to Delta Lake
Introduction to Data Lake
Key Features of Delta Lake
Introduction to Spark
Free Account creation in Databricks
Provisioning a Spark Cluster
Basics about notebooks
Dataframes
Create a table
Write a table
Read a table
Schema validation
Update table schema
Table Metadata
Delete from a table
Update a Table
Vacuum
History
Concurrency Control
Optimistic concurrency control
Migrate Workloads to Delta Lake
Optimize Performance with File Management
Auto Optimize
Optimize Performance with Caching
Delta and Apache Spark caching
Cache a subset of the data
Isolation Levels
Best Practices
Frequently Asked Questions in Interview
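As a taste of the hands-on material, here is a minimal Scala sketch of a few of the operations listed above (delete, update, history, vacuum), reusing the SparkSession and the hypothetical table path /tmp/delta/events from the earlier example:

    import io.delta.tables.DeltaTable
    import org.apache.spark.sql.functions.expr

    // Load an existing Delta table by path.
    val deltaTable = DeltaTable.forPath(spark, "/tmp/delta/events")

    // Delete from a table: remove all rows matching a predicate.
    deltaTable.delete("id = 1")

    // Update a table: rewrite matching rows in place.
    deltaTable.update(expr("id = 2"), Map("id" -> expr("id + 100")))

    // History: inspect the table's commit log, one row per transaction.
    deltaTable.history().show()

    // Vacuum: physically remove data files no longer referenced by the
    // table and older than the retention threshold (7 days by default).
    deltaTable.vacuum()

Each of these calls is a transaction recorded in the Delta log, which is what enables the concurrency control and time-travel topics covered later in the course.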
About Databricks:
Databricks lets you start writing Spark code instantly so you can focus on your data problems.
Goals
- You will learn Delta Lake with Apache Spark in a few hours
- Basic to advanced knowledge of Delta Lake
- Hands-on practice with Delta Lake
- Learn Delta Lake with Apache Spark using Scala on the Databricks platform
- Learn how to leverage the power of Delta Lake in a Spark environment!
- Learn about the Databricks platform!
Prerequisites
- Basic knowledge of Apache Spark, Scala, and SQL is necessary for this course.
Curriculum
Check out the detailed breakdown of what’s inside the course
Introduction
52 Lectures
- Course Introduction 03:21
- Introduction to Delta Lake 01:30
- Introduction to Data Lake 01:09
- Key Features of Delta Lake 04:57
- Elements of Delta Lake 03:18
- Introduction to Spark 04:04
- (Old) Free Account creation in Databricks 01:51
- (New) Free Account creation in Databricks 01:50
- Provisioning a Spark Cluster 02:14
- Basics about notebooks 07:29
- Dataframes 04:47
- Download Code and Files
- (Hands On) Create a table 06:38
- (Hands On) Write a table 14:12
- (Hands On) Read a table 06:52
- Schema validation 02:49
- (Hands On) Update table schema 03:01
- Table Metadata 01:53
- Delete from a table 01:44
- Update a Table 02:10
- Vacuum 01:59
- History 01:34
- Concurrency Control 01:08
- Optimistic concurrency control 02:33
- Migrate Workloads to Delta Lake 05:23
- Optimize Performance with File Management 01:13
- Auto Optimize 02:45
- Optimize Performance with Caching 01:11
- Delta and Apache Spark caching 03:26
- Cache a subset of the data 01:37
- Isolation Levels 01:06
- Best Practices 02:56
- FAQ (Interview Question on Optimization) 1 01:47
- FAQ (Interview Question on Optimization) 2 01:50
- FAQ (Interview Question on Optimization) 3 00:51
- FAQ (Interview Question on Auto Optimize) 4 00:50
- FAQ (Interview Question on Auto Optimize) 5 01:06
- FAQ (Interview Question) 6 01:06
- FAQ (Interview Question) 7 00:37
- FAQ (Interview Question) 8 00:42
- FAQ (Interview Question) 9 00:20
- FAQ (Interview Question) 10 00:25
- FAQ (Interview Question) 11 00:28
- FAQ (Interview Question) 12 00:27
- FAQ (Interview Question) 13 00:43
- FAQ (Interview Question) 14 00:55
- FAQ (Interview Question) 15 01:39
- FAQ (Interview Question) 16 00:31
- FAQ (Interview Question) 17 00:32
- FAQ (Interview Question) 18 01:00
- FAQ (Interview Question) 19 01:25
- Thank you 00:20
Instructor Details
Bigdata Engineer
I am a Solution Architect with 12+ years of experience in the Banking, Telecommunication, and Financial Services industries, across a diverse range of roles in Credit Card, Payments, Data Warehouse, and Data Center programmes.
As a Big Data and Cloud Architect, I work as part of a Big Data team to provide software solutions.
My responsibilities include:
- Support all Hadoop-related issues
- Benchmark existing systems, analyse their challenges/bottlenecks, and propose the right solutions to eliminate them based on various Big Data technologies
- Analyse and define the pros and cons of various technologies and platforms
- Define use cases, solutions, and recommendations
- Define Big Data strategy
- Perform detailed analysis of business problems and technical environments
- Define pragmatic Big Data solutions based on customer requirements analysis
- Define pragmatic Big Data cluster recommendations
- Educate customers on various Big Data technologies to help them understand the pros and cons of Big Data
- Data governance
- Build tools to improve developer productivity and implement standard practices
I am confident that the knowledge in these courses will give you an extra edge to succeed.
All the best!!
Course Certificate
Use your certificate to make a career change or to advance in your current career.