PySpark in Action: Installing Jupyter Notebook

==Installing Jupyter==

pip3 install jupyter -i https://pypi.mirrors.ustc.edu.cn/simple/

[root@localhost ~]# find / -name jupyter

/usr/local/Python3/bin/jupyter
/usr/local/Python3/share/jupyter
/usr/local/Python3/etc/jupyter

cd /usr/local/Python3/bin

./jupyter notebook --allow-root
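--allow-root is required here because the server is being started as root. To confirm that the notebook package installed into this Python 3, a minimal import check (assuming python3 is the /usr/local/Python3 interpreter found above):

# Confirm the Jupyter Notebook package is importable from this Python 3.
import notebook
print(notebook.__version__)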

===Installing findspark===

pip3 install findspark
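findspark.init() locates Spark via the SPARK_HOME variable set in the next step and adds pyspark to sys.path. The path can also be passed explicitly, which helps inside a notebook started before /etc/profile was edited; a minimal sketch, assuming the Spark directory used below:

# A minimal sketch: point findspark at Spark explicitly instead of
# relying on the SPARK_HOME environment variable.
import findspark
findspark.init("/root/wmtools/spark-2.4.8-bin-hadoop2.7")

import pyspark  # should now import without ModuleNotFoundError
print(pyspark.__version__)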

===Setting Environment Variables===

vi /etc/profile

export SPARK_HOME=/root/wmtools/spark-2.4.8-bin-hadoop2.7

source /etc/profile
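A quick check that the variable is now visible to Python (note that an already-running Jupyter server must be restarted to pick it up):

# Quick check that SPARK_HOME reached the current environment.
import os
print(os.environ.get("SPARK_HOME"))  # expect /root/wmtools/spark-2.4.8-bin-hadoop2.7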

===Running Spark Code===

python3 demo20.py

# pip install findspark
# fixes: ModuleNotFoundError: No module named 'pyspark'
import findspark
findspark.init()

#############################
from pyspark import SparkConf, SparkContext

# Create the SparkContext
conf = SparkConf().setAppName("WordCount").setMaster("local[*]")
sc = SparkContext(conf=conf)

rdd = sc.parallelize(["hello world", "hello spark"])
rdd2 = rdd.flatMap(lambda line: line.split(" "))
rdd3 = rdd2.map(lambda word: (word, 1))
rdd5 = rdd3.reduceByKey(lambda a, b: a + b)
# print() is required, otherwise the result is not displayed
# [('spark', 1), ('hello', 2), ('world', 1)]
print(rdd5.collect())
# stop the context to avoid creating multiple SparkContexts
sc.stop()
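The same steps can be run cell by cell in Jupyter. One variant worth noting: wrapping the job in try/finally guarantees sc.stop() even if a transformation fails, which prevents the "Cannot run multiple SparkContexts at once" error on a re-run. A minimal sketch of the same word count:

# A minimal sketch: identical word count, with sc.stop() guaranteed by
# try/finally so a failed run does not leave a live SparkContext behind.
import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("WordCount").setMaster("local[*]")
sc = SparkContext(conf=conf)
try:
    counts = (sc.parallelize(["hello world", "hello spark"])
                .flatMap(lambda line: line.split(" "))
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    print(counts.collect())
finally:
    sc.stop()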