PySpark in Practice: Setting Up a Spark Environment on Linux
From CloudWiki
Revision as of 08:45, 27 June 2021 (Sunday)
Prerequisites
PySpark in Practice: Downloading Spark
Lab Steps
Extract the Package
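# Create the tools directory (-p avoids an error if it already exists)
mkdir -p /root/wmtools
cd /root/wmtools
# Run this where the downloaded tarball is located; -C sets the extraction target
tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C /root/wmtools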
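The same extraction step can be sketched in Python with the standard-library tarfile module. This is only an illustration, not part of the original lab: the helper name extract_spark is hypothetical, and the tarball path passed to it is an assumption, since the page does not say where the download was saved.

```python
import tarfile
from pathlib import Path


def extract_spark(tarball, dest):
    """Extract a .tgz archive into dest and return its top-level directory.

    Hypothetical helper, roughly equivalent to `tar -zxf <tarball> -C <dest>`.
    Only use it on archives you trust.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    with tarfile.open(tarball, "r:gz") as archive:
        # Take the first path component of the first non-empty member name,
        # e.g. "spark-2.4.8-bin-hadoop2.7".
        top_level = next(
            Path(name).parts[0]
            for name in archive.getnames()
            if Path(name).parts
        )
        archive.extractall(dest)
    return dest / top_level
```

Assuming the tarball sits in /root/wmtools, a call such as extract_spark("/root/wmtools/spark-2.4.8-bin-hadoop2.7.tgz", "/root/wmtools") would return the path of the unpacked Spark directory.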
Verify the Installation
cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin
./spark-submit ../examples/src/main/python/pi.py
This example estimates the value of pi. The output looks like this:
...
21/06/27 16:34:26 INFO DAGScheduler: Job 0 finished: reduce at /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin/../examples/src/main/python/pi.py:44, took 1.080727 s
Pi is roughly 3.145780
21/06/27 16:34:26 INFO SparkUI: Stopped Spark web UI at http://10.0.0.30:4040
21/06/27 16:34:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/27 16:34:26 INFO MemoryStore: MemoryStore cleared
21/06/27 16:34:26 INFO BlockManager: BlockManager stopped
21/06/27 16:34:26 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/27 16:34:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/27 16:34:26 INFO SparkContext: Successfully stopped SparkContext
21/06/27 16:34:27 INFO ShutdownHookManager: Shutdown hook called
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83/pyspark-52f38583-70e0-4417-b37e-0880b1f319c3
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-3caa03ed-ee35-4b64-9827-493bd631a7a7
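The bundled pi.py example estimates pi by Monte Carlo sampling, which is why the printed value is only approximate and differs from run to run. The following plain-Python sketch shows the same idea without Spark; the function name estimate_pi and the sample count are illustrative choices, not taken from the Spark example.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi by sampling random points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the points that fall inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle covers pi/4 of the square's area, so scale the ratio by 4.
    return 4.0 * inside / samples


print("Pi is roughly %f" % estimate_pi(200_000))
```

Spark runs the same kind of sampling in parallel across partitions and then sums the per-partition counts with a reduce, which corresponds to the "reduce at ... pi.py" line in the log.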
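Modify the Configuration File
By default Spark prints a large number of INFO log messages, so adjust the logging configuration.
In the conf directory, rename log4j.properties.template to log4j.properties: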
[root@localhost conf]# pwd
/root/wmtools/spark-2.4.8-bin-hadoop2.7/conf
[root@localhost conf]# ls
docker.properties.template   metrics.properties.template   spark-env.sh.template
fairscheduler.xml.template   slaves.template
log4j.properties.template    spark-defaults.conf.template
[root@localhost conf]# mv log4j.properties.template log4j.properties
In the log4j.properties file, change
log4j.rootCategory=INFO, console
to
log4j.rootCategory=ERROR, console
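If you prefer to script this edit rather than change the file by hand, the substitution can be sketched in Python. The helper name set_root_log_level is hypothetical, and the sample text below stands in for the real file contents.

```python
import re


def set_root_log_level(properties_text, level="ERROR"):
    """Return properties_text with the log4j.rootCategory level replaced."""
    # Match the key and "=" at the start of a line, then the current level word.
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + level, properties_text)


# Stand-in for the contents of conf/log4j.properties.
sample = (
    "# Set everything to be logged to the console\n"
    "log4j.rootCategory=INFO, console\n"
)
print(set_root_log_level(sample))
```

To apply it to the real file, read the text with pathlib.Path("log4j.properties").read_text(), pass it through the function, and write the result back with write_text(). Only the level word changes; the appender name (console) and all other lines are left untouched.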
Run the pi.py example with spark-submit again. This time far fewer messages are printed:
21/06/27 16:44:14 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.0.30 instead (on interface ens33)
21/06/27 16:44:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/06/27 16:44:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Pi is roughly 3.131460
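With the log noise reduced, the result line is easy to pick out of the captured output programmatically. The sketch below assumes the output has already been captured as a string (for example via subprocess); the helper name extract_pi is hypothetical, and it relies only on the "Pi is roughly" text shown above.

```python
import re


def extract_pi(output):
    """Return the value from the 'Pi is roughly ...' line, or None if absent."""
    match = re.search(r"Pi is roughly (\d+\.\d+)", output)
    return float(match.group(1)) if match else None


captured = (
    "21/06/27 16:44:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind "
    "to another address\n"
    "Pi is roughly 3.131460\n"
)
print(extract_pi(captured))
```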