PySpark in Action: Saving a DataFrame as CSV
Introduction
This example loads the employee records below from a JSON Lines file into a DataFrame, writes them back out as CSV, and then reads the CSV directory to verify the result.
user.json
{"deptId":"01","name":"张三","gender":"男","age":32,"salary":5000}, {"deptId":"01","name":"李四","gender":"男","age":33,"salary":6000}, {"deptId":"01","name":"王五","gender":"女","age":38,"salary":5500}, {"deptId":"02","name":"Jack","gender":"男","age":42,"salary":7000}, {"deptId":"02","name":"Smith","gender":"女","age":27,"salary":6500}, {"deptId":"02","name":"Lily","gender":"女","age":45,"salary":9500}
Code
import findspark
findspark.init()
##############################################
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("RDD Demo") \
    .getOrCreate()
sc = spark.sparkContext
#############################################
# Note: the JSON file must not wrap the records in a [] array;
# Spark expects one JSON object per line.
df = spark.read.format('json') \
    .load('user.json')

# mode: overwrite | append | ...
# user.csv is not a single file but a directory of part files
df.write.csv("user.csv", mode="append")

spark.read.csv("user.csv").show()
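Because Spark writes one part file per partition, user.csv ends up as a directory containing files such as part-00000-*.csv. If a single output file is preferred, the DataFrame can be collapsed to one partition before writing. The following is a minimal sketch reusing the df loaded above; the output path user_single.csv is just an illustrative name, and "overwrite" is used so repeated runs replace the previous result instead of appending to it.

# Collapse the DataFrame into a single partition so that Spark
# writes exactly one part file inside the output directory.
df.coalesce(1) \
  .write \
  .mode("overwrite") \
  .csv("user_single.csv")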
Output
+---+---+---+-----+----+
|_c0|_c1|_c2|  _c3| _c4|
+---+---+---+-----+----+
| 32| 01| 男| 张三|5000|
| 33| 01| 男| 李四|6000|
| 38| 01| 女| 王五|5500|
| 42| 02| 男| Jack|7000|
| 27| 02| 女|Smith|6500|
| 45| 02| 女| Lily|9500|
+---+---+---+-----+----+
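Two details of this output are worth noting. The column order (age, deptId, gender, name, salary) comes from Spark's JSON schema inference, which sorts field names alphabetically. The generic names _c0 to _c4 appear because the CSV was written and read back without a header row, so the original column names are lost. To keep them, add the header option on both the write and the read. A minimal sketch, again reusing df; the path user_with_header.csv is an illustrative name:

# Write the CSV together with a header row so column names survive the round trip.
df.write \
  .mode("overwrite") \
  .option("header", True) \
  .csv("user_with_header.csv")

# Read it back, telling Spark to treat the first row as the header.
spark.read \
    .option("header", True) \
    .csv("user_with_header.csv") \
    .show()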