Getting Started
- check java
- check python
- install pyspark
- install jupyter
java
Check whether Java is installed. If the command is not found, install Java first.
java -version
openjdk version "17.0.7" 2023-04-18 LTS
python
Check whether Python is installed. If the command is not found, install Python first.
python3 --version
Python 3.9.6
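Both checks above can also be scripted. The following is a minimal sketch, not part of the original steps: the helper name tool_version is hypothetical, and it only relies on the standard library. On some systems a placeholder java command exists even without a JDK, so read the printed text rather than trusting that the command was found.

```python
import shutil
import subprocess


def tool_version(command, flag):
    """Return the version text a command prints, or None if it is not on the PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, flag], capture_output=True, text=True)
    # java -version writes to stderr, python3 --version writes to stdout,
    # so both streams are combined here.
    return (result.stdout + result.stderr).strip()


for command, flag in [("java", "-version"), ("python3", "--version")]:
    version = tool_version(command, flag)
    print(command, "->", version if version else "not found, install it first")
```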
pyspark
https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi
pip3 install pyspark
Run
pyspark
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
      /_/
Using Python version 3.9.6 (default, Mar 10 2023 20:16:38)
...
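Besides starting the shell, the install can be confirmed from Python itself. This is a small sketch under the assumption that the package was installed with pip into the same Python that runs the script. The helper name installed_version is hypothetical, and importlib.metadata needs Python 3.8 or newer.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyspark")
print("pyspark:", version if version else "not installed, run pip3 install pyspark")
```

The same helper works for any other pip package name, for example "jupyter".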
jupyter
pip3 install jupyter findspark
Run
jupyter notebook
- A browser window will open with the Jupyter file list
- Create a new notebook and run the following code in a cell
import findspark
findspark.init()
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):
    # p is the sample index and is ignored; each call draws one random point
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()
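It will print a value close to pi, for example 3.14159256. The exact digits change from run to run because the samples are random. This is the Monte Carlo method for estimating the value of pi.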
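The idea behind the estimate can be tried without Spark. Points are drawn uniformly from the unit square, and the fraction that lands inside the quarter circle of radius 1 approaches pi/4, so multiplying that fraction by 4 approximates pi. The sketch below is a plain-Python illustration of that reasoning. The function name estimate_pi and the seed parameter are assumptions added for repeatability and are not part of the notebook code.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi from the share of random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)  # a private generator, so a seed gives repeatable runs
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            inside += 1
    # inside / num_samples approximates pi / 4
    return 4 * inside / num_samples


print(estimate_pi(100_000, seed=42))
```

With 100,000 samples the result is usually accurate to about two decimal places. The Spark version reaches more digits by spreading far more samples across workers.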
troubleshooting
If the jupyter command does not work, <python>/bin may not be on the PATH.
You may have seen a warning about this after installing jupyter.
On a Mac, you can add it to the PATH by opening ~/.zshrc in the Terminal and adding an export line (example):
nano ~/.zshrc
export PATH=/Users/littleworld/Library/Python/3.9/bin:$PATH
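To find out which directory is missing, a short script can compare pip's user script directory with the current PATH. This is a sketch under an assumption: it only covers the case where pip installed jupyter into the per-user location, whose scripts live in <user base>/bin on macOS and Linux. The helper name is_on_path is hypothetical.

```python
import os
import site


def is_on_path(directory, path_value):
    """Return True if directory is one of the entries in a PATH-style string."""
    target = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


# Assumption: a per-user pip install, so scripts are placed in <user base>/bin
user_bin = os.path.join(site.getuserbase(), "bin")
print(user_bin, "on PATH:", is_on_path(user_bin, os.environ.get("PATH", "")))
```

If the script prints False, the printed directory is the one to put in the export line. Open a new Terminal window afterwards so the change takes effect.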