PySpark实战:Linux搭建Spark环境
==Prerequisites==
*[[PySpark实战:下载Spark]]

==Installing and Running Spark==
===Extracting the Package===
 mkdir /root/wmtools
 cd /root/wmtools
 tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C ~/wmtools

===Verifying the Installation===
 cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin
 ./spark-submit ../examples/src/main/python/pi.py

This job estimates the value of π; the output looks like this:

<nowiki>
...
21/06/27 16:34:26 INFO DAGScheduler: Job 0 finished: reduce at /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin/../examples/src/main/python/pi.py:44, took 1.080727 s
Pi is roughly 3.145780
21/06/27 16:34:26 INFO SparkUI: Stopped Spark web UI at http://10.0.0.30:4040
21/06/27 16:34:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/27 16:34:26 INFO MemoryStore: MemoryStore cleared
21/06/27 16:34:26 INFO BlockManager: BlockManager stopped
21/06/27 16:34:26 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/27 16:34:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/27 16:34:26 INFO SparkContext: Successfully stopped SparkContext
21/06/27 16:34:27 INFO ShutdownHookManager: Shutdown hook called
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-12d3e207-636d-48df-ac07-1906b8ec0b83/pyspark-52f38583-70e0-4417-b37e-0880b1f319c3
21/06/27 16:34:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-3caa03ed-ee35-4b64-9827-493bd631a7a7
</nowiki>

===Editing the Configuration===
The default output is very verbose, so adjust the logging configuration. In the conf directory, rename log4j.properties.template to log4j.properties:

<nowiki>
[root@localhost conf]# pwd
/root/wmtools/spark-2.4.8-bin-hadoop2.7/conf
[root@localhost conf]# ls
docker.properties.template   metrics.properties.template   spark-env.sh.template
fairscheduler.xml.template   slaves.template
log4j.properties.template    spark-defaults.conf.template
[root@localhost conf]# mv log4j.properties.template log4j.properties
</nowiki>

In log4j.properties, change

 log4j.rootCategory=INFO, console

to

 log4j.rootCategory=ERROR, console

Run the same command again, and the output is now much shorter:

<nowiki>
21/06/27 16:44:14 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.0.30 instead (on interface ens33)
21/06/27 16:44:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/06/27 16:44:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Pi is roughly 3.131460
</nowiki>

Note: the warning "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" is caused by Hadoop not being installed and configured. In general it does not affect the use of Spark.

==Installing and Running PySpark==
===Running pyspark===
 cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin
 ./pyspark

<nowiki>
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/

Using Python version 2.7.5 (default, Aug 7 2019 00:51:29)
SparkSession available as 'spark'.
>>>
</nowiki>

To exit:

 >>> exit()

===Switching pyspark to Python 3===
Although Python 3.7 is installed, Spark still loads Python 2.7.5 at startup. To switch the runtime to Python 3.7, some configuration is needed:

 cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin

Edit the pyspark script (vi pyspark) and add:

 export PYSPARK_PYTHON=/usr/local/Python3/bin/python3

Now run pyspark again:

 cd /root/wmtools/spark-2.4.8-bin-hadoop2.7/bin
 ./pyspark

<nowiki>
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/

Using Python version 3.7.5 (default, May 25 2021 14:04:16)
SparkSession available as 'spark'.
>>>
</nowiki>

With this, a Spark environment running Python 3.x is complete.

Note: Spark 2.4.8 is not yet fully compatible with Python 3.8, so installing Python 3.8 is currently not recommended.
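The bundled pi.py used above estimates π by Monte Carlo sampling: it scatters random points in the unit square and counts the fraction that land inside the quarter circle. The core estimation logic, stripped of Spark's parallelism, can be sketched in plain Python (the function name estimate_pi and the sample count are illustrative, not taken from pi.py):

```python
import random

def estimate_pi(n_samples, seed=42):
    # Draw points uniformly in the unit square; the fraction falling
    # inside the quarter circle x^2 + y^2 <= 1 approaches pi / 4.
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100_000))
```

In the Spark example the same count is produced in parallel across partitions (roughly, a parallelized map over random points followed by a reduce that sums the hits), which is why repeated runs print slightly different values such as 3.145780 and 3.131460.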