PySpark in Practice: Setting Up a Spark Environment on Linux

From CloudWiki
Revision by Cloud17, 27 June 2021 (Sunday), 08:45

Preparation

PySpark in Practice: Downloading Spark

Lab steps

Extract the package

mkdir /root/wmtools

cd /root/wmtools

tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C ~/wmtools
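
Before moving on, it can help to confirm that the archive unpacked into the layout the later steps rely on. The following is a minimal sketch in Python; the function name missing_entries and the list of paths checked are illustrative assumptions, taken only from the directories this page uses (bin, conf, and the pi.py example).

```python
from pathlib import Path

# Paths that the later steps on this page refer to, relative to the Spark directory.
EXPECTED = ("bin/spark-submit", "conf", "examples/src/main/python/pi.py")


def missing_entries(spark_home):
    """Return the expected relative paths that do not exist under spark_home."""
    home = Path(spark_home)
    return [rel for rel in EXPECTED if not (home / rel).exists()]


if __name__ == "__main__":
    # Assumed install location, as used on this page.
    missing = missing_entries("/root/wmtools/spark-2.4.8-bin-hadoop2.7")
    if missing:
        print("Missing after extraction:", ", ".join(missing))
    else:
        print("Spark directory layout looks complete")
```

An empty list means the files used in the following steps are all present.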

Verify the installation

cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin

./spark-submit ../examples/src/main/python/pi.py

The example estimates the value of pi. The end of the output looks like this:

...
21/06/27 16:34:26 INFO DAGScheduler: Job 0 finished: reduce at /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin/../examples/src/main/python/pi.py:44, took 1.080727 s
Pi is roughly 3.145780
21/06/27 16:34:26 INFO SparkUI: Stopped Spark web UI at http://10.0.0.30:4040
21/06/27 16:34:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/27 16:34:26 INFO MemoryStore: MemoryStore cleared
21/06/27 16:34:26 INFO BlockManager: BlockManager stopped
21/06/27 16:34:26 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/27 16:34:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/27 16:34:26 INFO SparkContext: Successfully stopped SparkContext
21/06/27 16:34:27 INFO ShutdownHookManager: Shutdown hook called
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83/pyspark-52f38583-70e0-4417-b37e-0880b1f319c3
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-3caa03ed-ee35-4b64-9827-493bd631a7a7
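
The printed value is only approximate and changes from run to run, because the example estimates pi by random sampling. The sketch below shows the same Monte Carlo idea in plain Python without Spark. It is not the code of pi.py itself; the function name estimate_pi, the seed parameter, and the sample count are illustrative assumptions.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi from random points in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approximates
    pi / 4, so the estimate is 4 * inside / num_samples.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    # Without a seed, each run prints a slightly different value.
    print("Pi is roughly %f" % estimate_pi(200000))
```

More samples give a tighter estimate, but the error shrinks only with the square root of the sample count, which is why a quick run is accurate to about two decimal places.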

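Modify the configuration file

By default Spark prints a large amount of log output. Adjust the logging configuration to reduce it.

In the conf directory, rename log4j.properties.template to log4j.properties: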

[root@localhost conf]# pwd
/root/wmtools/spark-2.4.8-bin-hadoop2.7/conf
[root@localhost conf]# ls
docker.properties.template  metrics.properties.template   spark-env.sh.template
fairscheduler.xml.template  slaves.template
log4j.properties.template   spark-defaults.conf.template
[root@localhost conf]# mv log4j.properties.template log4j.properties

In log4j.properties, change the line

log4j.rootCategory=INFO, console

to

log4j.rootCategory=ERROR, console
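
The same edit can be scripted instead of made by hand. The following is a minimal sketch, assuming the file contains a single uncommented log4j.rootCategory line as shown above; the helper names set_root_log_level and update_log4j_file are hypothetical.

```python
import re
from pathlib import Path

# Matches the key and "=" of an uncommented rootCategory line, then the level word.
ROOT_CATEGORY = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", re.MULTILINE)


def set_root_log_level(properties_text, level="ERROR"):
    """Return the properties text with the root logger level replaced."""
    return ROOT_CATEGORY.sub(lambda m: m.group(1) + level, properties_text)


def update_log4j_file(path, level="ERROR"):
    """Rewrite the log4j.properties file at path in place."""
    target = Path(path)
    target.write_text(set_root_log_level(target.read_text(), level))
```

For example, update_log4j_file("/root/wmtools/spark-2.4.8-bin-hadoop2.7/conf/log4j.properties") would apply the change shown above. Only the level word is replaced, so the appender name after the comma (console) is left as it is.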

Run the pi.py example again with ./spark-submit. The output is now much shorter:

21/06/27 16:44:14 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.0.30 instead (on interface ens33)
21/06/27 16:44:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/06/27 16:44:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Pi is roughly 3.131460
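
A few WARN lines still appear next to the result. If only the number is needed, for instance in a script that checks the installation, the result line can be picked out of the captured output. This is a sketch under assumptions: the names extract_pi and run_pi_example are hypothetical, and the install path passed to run_pi_example is expected to be the one used on this page.

```python
import re
import subprocess

# The result line as printed in the output above.
PI_LINE = re.compile(r"^Pi is roughly (\d+\.\d+)$", re.MULTILINE)


def extract_pi(output):
    """Return the estimate from a 'Pi is roughly ...' line, or None if absent."""
    match = PI_LINE.search(output)
    return float(match.group(1)) if match else None


def run_pi_example(spark_home):
    """Run the pi.py example with spark-submit and return the parsed estimate."""
    result = subprocess.run(
        [spark_home + "/bin/spark-submit",
         spark_home + "/examples/src/main/python/pi.py"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge log lines and the result into one stream
        universal_newlines=True,
        check=True,
    )
    return extract_pi(result.stdout)
```

Calling run_pi_example("/root/wmtools/spark-2.4.8-bin-hadoop2.7") returns the estimate as a float, or None if no result line was printed.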