Databricks and Apache Spark
Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company founded by the original creators of Apache Spark.
The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models.
Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads.
In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML’s Generative AI technology to enable customers to better understand and use their own proprietary data.
The company develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases.
AN OPEN AND UNIFIED DATA ANALYTICS PLATFORM FOR DATA ENGINEERING, MACHINE LEARNING, AND ANALYTICS
From the original creators of Apache Spark, Delta Lake, MLflow, and Koalas
Select a platform:
DATABRICKS PLATFORM - FREE TRIAL
For businesses
CHOOSE YOUR CLOUD
For students and educational institutions
https://databricks.com/try-platform
Welcome to Databricks Community Edition!
Databricks Community Edition provides you with access to a free Spark micro-cluster, a cluster manager, and a notebook environment, ideal for developers, data scientists, data engineers, and other IT professionals getting started with Spark.
We need you to verify your email address by clicking on this link. You will then be redirected to Databricks Community Edition!
Get started by visiting: https://community.cloud.databricks.com/login.html
If you have any questions, please contact feedback@databricks.com.
- The Databricks Team
Instructor: Adam Breindel
LinkedIn: https://www.linkedin.com/in/adbreind
Email: adbreind@gmail.com
Twitter: @adbreind
- 20+ years building systems for startups and large enterprises
- 10+ years teaching data, ML, front- and back-end technology
- Fun large-scale data projects…
  - Streaming neural net + decision tree fraud scoring
  - Realtime & offline analytics for banking
  - Music synchronization and licensing for networked jukeboxes
- Industries
  - Finance, Insurance
  - Travel, Media / Entertainment
  - Energy, Government
Create a Databricks account
• Sign up for free Community Edition now at https://databricks.com/try-databricks
• Use Firefox, Chrome, or Safari
Getting Started
These steps are illustrated on subsequent pages; this is the summary:
1. Copy the courseware link or prepare to type it ☺
   https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc
2. Import that file into your Databricks account per the instructions on the following slides.
3. Create a cluster: choose Databricks Runtime 8.2 (illustrated in the following slides)

Setup with Databricks

Log in to Databricks

Import Notebooks for Today…
- Choose URL, then type or paste in today’s notebook URL and click Import:
  https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc
- Click Workspace to find your notebook(s)

Create a Cluster
- Choose Runtime 8.2 (Scala 2.12, Spark 3.1.1)

All set: let's go!
https://learning.oreilly.com/live-events/spark-31-first-steps/0636920371533/0636920061945/
https://on24static.akamaized.net/event/33/46/13/4/rt/1/documents/resourceList1631717069306/setup.pdf
Create Cluster
New Cluster
- 0 Workers: 0 GB Memory, 0 Cores, 0 DBU
- 1 Driver: 15.3 GB Memory, 2 Cores, 1 DBU
- Cluster Name: Buddha
- Databricks Runtime Version: Runtime 8.3 (Scala 2.12, Spark 3.1.1)
  Note: Databricks Runtime 8.x and later use Delta Lake as the default table format.
- Instance: Free 15 GB Memory
As a Community Edition user, your cluster will automatically terminate after an idle period of two hours. For more configuration options, please upgrade your Databricks subscription.
Import Notebooks
Import from: File or URL
https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc
Accepted formats: .dbc, .scala, .py, .sql, .r, .ipynb, .Rmd, .html, .zip
This notebook is not attached to a cluster. Would you like to launch a new Spark cluster to continue working?
Automatically launch and attach to clusters without prompting
First Steps with Apache Spark 3.1
Class Logistics and Operations
Topics
ls /databricks-datasets/amazon/
Output columns: path, name, size
Where is this Spark data source? Currently, Amazon S3
In this version of Databricks, “DBFS” (a wrapper over S3 similar to EMRFS) is the default Spark filesystem.
Other common defaults include HDFS, “local”, another cloud-based object store such as Azure Blob Storage, or a Kubernetes-friendly storage layer like MinIO.
%sql SELECT * FROM parquet.`/databricks-datasets/amazon/data20K`
df: org.apache.spark.sql.DataFrame = [rating: double, review: string]