PySpark in Practice: DataFrame Conditional Selection
From CloudWiki
Introduction
Reading JSON data with Spark
user.json
{"deptId":"01","name":"张三","gender":"男","age":32,"salary":5000},
{"deptId":"01","name":"李四","gender":"男","age":33,"salary":6000},
{"deptId":"01","name":"王五","gender":"女","age":38,"salary":5500},
{"deptId":"02","name":"Jack","gender":"男","age":42,"salary":7000},
{"deptId":"02","name":"Smith","gender":"女","age":27,"salary":6500},
{"deptId":"02","name":"Lily","gender":"女","age":45,"salary":9500}
Code
import findspark
findspark.init()
##############################################
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("RDD Demo") \
    .getOrCreate()
sc = spark.sparkContext
#############################################
# Note: the records must not be wrapped in a JSON array ([]);
# the file holds one JSON object per line
df = spark.read.format('json') \
    .load('user.json')
# Print the column types
print(df.dtypes)
df.show()
# Pivot
df2 = df.groupBy("deptId") \
    .pivot("gender") \
    .sum("salary")
df2.show()
# Conditional selection
df.select("name", df.salary.between(6000, 9500)).show()
df.select("name", "age").where(df.name.like("Smi%")).show()
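- Pivot: df2 = df.groupBy("deptId").pivot("gender").sum("salary")
This first groups the rows by deptId, then turns the distinct values of the gender column (男 = male, 女 = female) into columns, and finally sums salary for each deptId/gender combination to form the pivot table.
- Conditional selection: df.select("name","age").where(df.name.like("Smi%"))
like("Smi%") keeps only the rows whose name starts with "Smi". The between(6000, 9500) call in the code is inclusive at both ends, and because it is used inside select it produces a true/false column instead of filtering rows.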
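To make the pivot step concrete, here is a minimal plain-Python sketch of the same group, spread, and sum computation. It does not use Spark; the helper name `pivot_sum` is hypothetical, and the rows are copied from user.json with only the fields the pivot needs.

```python
from collections import defaultdict

# Rows copied from user.json, reduced to the three fields used by the pivot.
rows = [
    {"deptId": "01", "gender": "男", "salary": 5000},
    {"deptId": "01", "gender": "男", "salary": 6000},
    {"deptId": "01", "gender": "女", "salary": 5500},
    {"deptId": "02", "gender": "男", "salary": 7000},
    {"deptId": "02", "gender": "女", "salary": 6500},
    {"deptId": "02", "gender": "女", "salary": 9500},
]


def pivot_sum(rows, group_key, pivot_key, value_key):
    """Hypothetical helper: one output row per group_key value,
    one column per pivot_key value, each cell the sum of value_key."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[group_key]][row[pivot_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for easy comparison.
    return {group: dict(cols) for group, cols in table.items()}


pivot = pivot_sum(rows, "deptId", "gender", "salary")
for dept in sorted(pivot):
    print(dept, pivot[dept])
# 01 {'男': 11000, '女': 5500}
# 02 {'男': 7000, '女': 16000}
```

The nested dictionary plays the role of the pivot table: the outer key is the deptId row and the inner keys are the gender columns.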
Output
Column types
[('age', 'bigint'), ('deptId', 'string'), ('gender', 'string'), ('name', 'string'), ('salary', 'bigint')]
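The type list above can be approximated without Spark. The sketch below parses two lines in the user.json layout with the standard json module, sorts the field names alphabetically (which matches the column order shown above), and maps Python types to the type names Spark printed. The `TYPE_NAMES` mapping and the trailing-comma stripping are assumptions of this sketch; Spark's real schema inference looks at all rows and handles many more cases.

```python
import json

# Two lines in the same layout as user.json, trailing comma included.
raw = '''{"deptId":"01","name":"张三","gender":"男","age":32,"salary":5000},
{"deptId":"02","name":"Lily","gender":"女","age":45,"salary":9500}'''

# Assumed mapping from Python types to the type names printed by df.dtypes.
TYPE_NAMES = {int: "bigint", str: "string"}

# One JSON object per line; strip the trailing comma before parsing.
records = [
    json.loads(line.strip().rstrip(","))
    for line in raw.splitlines()
    if line.strip()
]

# Collect every field name and sort alphabetically.
columns = sorted({key for record in records for key in record})

# Take the type of each field from the first record (enough for this file).
dtypes = [(col, TYPE_NAMES.get(type(records[0][col]), "unknown")) for col in columns]
print(dtypes)
# [('age', 'bigint'), ('deptId', 'string'), ('gender', 'string'), ('name', 'string'), ('salary', 'bigint')]
```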
All data
+---+------+------+-----+------+
|age|deptId|gender| name|salary|
+---+------+------+-----+------+
| 32|    01|    男| 张三|  5000|
| 33|    01|    男| 李四|  6000|
| 38|    01|    女| 王五|  5500|
| 42|    02|    男| Jack|  7000|
| 27|    02|    女|Smith|  6500|
| 45|    02|    女| Lily|  9500|
+---+------+------+-----+------+
Pivot table
+------+-----+-----+
|deptId|   女|   男|
+------+-----+-----+
|    01| 5500|11000|
|    02|16000| 7000|
+------+-----+-----+
Conditional selection
+-----+---------------------------------------+
| name|((salary >= 6000) AND (salary <= 9500))|
+-----+---------------------------------------+
| 张三|                                  false|
| 李四|                                   true|
| 王五|                                  false|
| Jack|                                   true|
|Smith|                                   true|
| Lily|                                   true|
+-----+---------------------------------------+
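The table above has one row per employee because between was used inside select, so it produced a boolean column rather than a filter. The following plain-Python sketch (no Spark; the `between` function here is a stand-in written for illustration) reproduces the flags and shows what filtering on the same condition would keep.

```python
# Name -> salary, copied from user.json.
salaries = {
    "张三": 5000,
    "李四": 6000,
    "王五": 5500,
    "Jack": 7000,
    "Smith": 6500,
    "Lily": 9500,
}


def between(value, lower, upper):
    """Stand-in for the column test: inclusive at both ends."""
    return lower <= value <= upper


# Equivalent of using the condition in select: one flag per row.
flags = [(name, between(salary, 6000, 9500)) for name, salary in salaries.items()]

# Equivalent of using the condition as a row filter: only matching rows remain.
kept = [name for name, ok in flags if ok]

print(flags)
print(kept)
# ['李四', 'Jack', 'Smith', 'Lily']
```

李四 (6000) and Lily (9500) sit exactly on the bounds and are still flagged true. In PySpark, the filtering variant would pass the same condition to where, for example df.where(df.salary.between(6000, 9500)).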
+-----+---+
| name|age|
+-----+---+
|Smith| 27|
+-----+---+
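The single-row result above follows from how a like pattern is matched: % stands for any run of characters (including none) and _ for exactly one character, and the comparison is case-sensitive. A minimal plain-Python sketch of that matching rule is shown below; `sql_like` is a hypothetical helper built on the re module, and it ignores escape characters for simplicity.

```python
import re


def sql_like(value, pattern):
    """Hypothetical helper: translate a LIKE pattern to a regular expression.
    '%' matches any run of characters, '_' matches exactly one character.
    Escape characters are not handled in this sketch."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # fullmatch: the whole value must fit the pattern, not just a prefix.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# (name, age) pairs copied from user.json.
people = [("张三", 32), ("李四", 33), ("王五", 38), ("Jack", 42), ("Smith", 27), ("Lily", 45)]

matches = [(name, age) for name, age in people if sql_like(name, "Smi%")]
print(matches)
# [('Smith', 27)]
```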