Multi-level Indexing (MultiIndex)
Creating a MultiIndex
import numpy as np
import pandas as pd
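This is not a MultiIndex
# Each index entry here has to be a tuple, not a list
a1=pd.Series([20,10,20,30], index=[('class1','female'),('class1', 'man'),('class2','female'), ('class2', 'man')])
a1  # not a MultiIndex: a flat index whose labels happen to be tuples
(class1, female)    20
(class1, man)       10
(class2, female)    20
(class2, man)       30
dtype: int64
A MultiIndex needs a layered structure like this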
a=pd.Series([20,10,20,30], index=[["class1","class1","class2","class2"],["man","female","man","female"]])
a  # a real MultiIndex: one list of labels per level
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
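One way to confirm the difference between the two Series above is to inspect the index type and its number of levels. This is a minimal sketch; the names `flat`, `nested` and `flat_levels_before` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Index given as a list of tuples: pandas keeps a flat Index whose labels are tuples.
flat = pd.Series(
    [20, 10, 20, 30],
    index=[("class1", "female"), ("class1", "man"),
           ("class2", "female"), ("class2", "man")],
)

# Index given as a list of arrays (one per level): pandas builds a MultiIndex.
nested = pd.Series(
    [20, 10, 20, 30],
    index=[["class1", "class1", "class2", "class2"],
           ["man", "female", "man", "female"]],
)

flat_levels_before = flat.index.nlevels
print(type(flat.index).__name__, flat_levels_before)
print(type(nested.index).__name__, nested.index.nlevels)

# If a flat tuple index already exists, from_tuples can convert it afterwards.
flat.index = pd.MultiIndex.from_tuples(flat.index)
print(type(flat.index).__name__, flat.index.nlevels)
```

After the conversion, `flat` supports the same level-by-level selection as `nested`.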
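Multi-level column indexes are added the same way as row indexes (the example below sets a multi-level row index on a DataFrame)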
b=pd.DataFrame(np.random.randint(10,30,size=(4,2)),index=[["class1","class1","class2","class2"],["man","female","man","female"]],columns=["bendi","waidi"])
b  # columns= accepts nested lists in the same way as index=
               bendi  waidi
class1 man        25     10
       female     11     11
class2 man        14     24
       female     25     21
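The DataFrame above only has a multi-level row index. The sketch below shows the column-side equivalent: passing nested lists to `columns=`. The outer labels `"2019"` and `"2020"`, the name `cols`, and the values are made up for illustration; fixed numbers are used instead of random ones so the result is reproducible.

```python
import pandas as pd

# Hypothetical data: two rows, four columns.
data = [[25, 10, 3, 4],
        [11, 11, 5, 6]]

# Outer column level (a made-up year) and inner column level (bendi / waidi).
cols = pd.DataFrame(
    data,
    index=["class1", "class2"],
    columns=[["2019", "2019", "2020", "2020"],
             ["bendi", "waidi", "bendi", "waidi"]],
)

print(cols.columns.nlevels)   # number of column levels
print(cols["2019"])           # an outer label selects a sub-DataFrame
print(cols["2019", "bendi"])  # outer + inner label selects one column as a Series
```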
Displaying the index and naming its levels
a.index
MultiIndex(levels=[['class1', 'class2'], ['female', 'man']], labels=[[0, 0, 1, 1], [1, 0, 1, 0]])
a.index.names=["class","gender"]  # give each level of the MultiIndex a name
a
class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
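Level names can also be supplied when the index is built, rather than assigned afterwards. The following sketch assumes the same labels as above; `idx`, `s`, `renamed` and the alternative names `"grade"` and `"sex"` are illustrative only.

```python
import pandas as pd

# names= labels the levels at construction time.
idx = pd.MultiIndex.from_arrays(
    [["class1", "class1", "class2", "class2"],
     ["man", "female", "man", "female"]],
    names=["class", "gender"],
)
s = pd.Series([20, 10, 20, 30], index=idx)

print(s.index.names)
print(s.index.get_level_values("gender"))  # labels of one level, looked up by name

# rename_axis returns a renamed copy and leaves s itself untouched.
renamed = s.rename_axis(["grade", "sex"])
print(renamed.index.names)
```

Named levels make later calls such as `unstack(level="gender")` easier to read than positional level numbers.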
Creating the index on its own
from_arrays
>>> in1=pd.MultiIndex.from_arrays([['class3','class3','class4', 'class4'], ['female', 'man','female', 'man']])
>>> c=pd.Series([20,30,35,40],index=in1)
>>> c
class3  female    20
        man       30
class4  female    35
        man       40
dtype: int64
from_tuples
>>> in2=pd.MultiIndex.from_tuples([('class3','female'),('class3','male'), ('class4', 'male'),('class4', 'female')])
>>> c=pd.Series([20,30,35,40],index=in2)
>>> c
class3  female    20
        male      30
class4  male      35
        female    40
dtype: int64
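When every combination of the level values is wanted, `pd.MultiIndex.from_product` is a third constructor that avoids typing the repeated labels. A minimal sketch, reusing the class3/class4 labels from above; `in3` and `d` are illustrative names.

```python
import pandas as pd

# from_product builds every (class, gender) combination, outer level first.
in3 = pd.MultiIndex.from_product(
    [["class3", "class4"], ["female", "man"]],
    names=["class", "gender"],
)
d = pd.Series([20, 30, 35, 40], index=in3)
print(d)
```

The resulting index has the same four entries as the `from_arrays` example.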
Reshaping multi-level index structures
Converting a Series to a DataFrame
a
class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
a.unstack()  # Series -> DataFrame: by default the innermost level becomes the columns
gender  female  man
class
class1      10   20
class2      30   20
a.unstack(level=0)  # Series -> DataFrame: level= chooses which level becomes the columns
class   class1  class2
gender
female      10      30
man         20      20
Converting a DataFrame back to a Series
a.unstack().stack()  # the inverse operation: DataFrame -> Series
class   gender
class1  female    10
        man       20
class2  female    30
        man       20
dtype: int64
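In the output above, the round trip returns the same values as `a`, but the rows come back in sorted label order (female before man) rather than in the original order. The sketch below checks that the data itself is unchanged by comparing label-to-value mappings; `roundtrip` is an illustrative name.

```python
import pandas as pd

a = pd.Series(
    [20, 10, 20, 30],
    index=pd.MultiIndex.from_arrays(
        [["class1", "class1", "class2", "class2"],
         ["man", "female", "man", "female"]],
        names=["class", "gender"],
    ),
)

roundtrip = a.unstack().stack()

# Compare as {label tuple: value} mappings so that row order does not matter.
print(roundtrip.to_dict() == a.to_dict())

# The level names survive the round trip as well.
print(list(roundtrip.index.names))
```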
A more thorough conversion to a DataFrame
a2=a.reset_index()  # Series -> DataFrame: turns every index level into an ordinary column in one step
a2
    class  gender   0
0  class1     man  20
1  class1  female  10
2  class2     man  20
3  class2  female  30
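Moving columns back into a multi-level index
a2.set_index(["class","gender"])  # several columns can be moved into a multi-level row index at once; the result is still a DataFrame with one column (0)
                0
class  gender
class1 man     20
       female  10
class2 man     20
       female  30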
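Because `set_index` returns a DataFrame, getting a Series back takes one more step: selecting the value column. The sketch below also passes `name=` to `reset_index` so the value column gets a readable label instead of `0`; the label `"count"` and the names `flat` and `back` are assumptions made for illustration.

```python
import pandas as pd

a = pd.Series(
    [20, 10, 20, 30],
    index=pd.MultiIndex.from_arrays(
        [["class1", "class1", "class2", "class2"],
         ["man", "female", "man", "female"]],
        names=["class", "gender"],
    ),
)

# name= labels the value column ("count" is a hypothetical name).
flat = a.reset_index(name="count")
print(list(flat.columns))

# set_index gives a DataFrame; picking the single value column yields a Series.
back = flat.set_index(["class", "gender"])["count"]
print(type(back).__name__)
print(back.to_dict() == a.to_dict())
```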
Selecting and slicing data with a MultiIndex
a
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
a["class1"]
man       20
female    10
dtype: int64
a["class1","man"]
20
Selection can only drill down from the outermost level
a["man"]  # raises KeyError: a plain [] lookup has to start at the outermost level
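To select by an inner level without naming the outer one, pandas offers `xs` (cross-section), and `.loc` accepts a full slice for the outer level. A minimal sketch on the same Series; `men` and `men_loc` are illustrative names.

```python
import pandas as pd

a = pd.Series(
    [20, 10, 20, 30],
    index=pd.MultiIndex.from_arrays(
        [["class1", "class1", "class2", "class2"],
         ["man", "female", "man", "female"]],
        names=["class", "gender"],
    ),
)

# xs takes a cross-section at the named level and drops that level from the result.
men = a.xs("man", level="gender")
print(men)

# slice(None) means "everything" on the outer level; "man" then filters the inner level.
men_loc = a.loc[(slice(None), "man")]
print(men_loc)
```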
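Range slicing only works when the index is sorted
a["class1":"class2"]  # selects a range of labels; both the start and the end label are included
class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
Note that range selection only works when the index is sorted, for example ["a","b","c"]. If the index is out of order, for example ["b","a","c"], range selection fails; sort it with sort_index() first.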
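The sorting requirement can be seen on a small Series whose outer level is deliberately out of order, as in the ["b","a","c"] case from the note. This is a sketch with made-up labels and values; `s` and `s_sorted` are illustrative names.

```python
import pandas as pd

# Outer level in the order b, a, c: not sorted.
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=[["b", "b", "a", "a", "c", "c"],
           ["x", "y", "x", "y", "x", "y"]],
)

print(s.index.is_monotonic_increasing)

try:
    s.loc["a":"b"]
except KeyError as err:
    # pandas raises UnsortedIndexError here, which is a KeyError subclass.
    print("slice failed:", type(err).__name__)

# After sorting, the same range selection works and includes both endpoints.
s_sorted = s.sort_index()
print(s_sorted.loc["a":"b"])
```

`sort_index()` returns a new object, so the sorted result has to be assigned before slicing.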
a[["class1","class2"]]  # select several keys at once
class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
b
               bendi  waidi
class1 man        25     10
       female     11     11
class2 man        14     24
       female     25     21
b["bendi"]
class1  man       25
        female    11
class2  man       14
        female    25
Name: bendi, dtype: int32
b.loc["class1"]
        bendi  waidi
man        25     10
female     11     11
b.loc["class1","man"]
bendi    25
waidi    10
Name: (class1, man), dtype: int32
b.loc["class1","man"]["bendi"]
25
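The chained lookup above can also be written as a single `.loc` call, with the row key as a tuple and the column label as the second argument. The sketch below rebuilds `b` from the fixed values shown in the table rather than from a new random draw, which is an assumption made so that the result is reproducible.

```python
import pandas as pd

# Same layout as b, with the values from the table above hard-coded.
b = pd.DataFrame(
    [[25, 10], [11, 11], [14, 24], [25, 21]],
    index=[["class1", "class1", "class2", "class2"],
           ["man", "female", "man", "female"]],
    columns=["bendi", "waidi"],
)

# Row key as a tuple, column key second: one lookup instead of two chained ones.
value = b.loc[("class1", "man"), "bendi"]
print(value)

# The single-call form can also be used for assignment.
b.loc[("class1", "man"), "bendi"] = 26
print(b.loc[("class1", "man"), "bendi"])
```

Chained indexing such as `b.loc[...][...] = x` may assign to a temporary copy, so the single-call form is the safer choice when writing values.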
Positional indexing ignores the levels and only uses row order
b.iloc[2]  # iloc does not care about index levels, only about position
bendi    14
waidi    24
Name: (class2, man), dtype: int32
b.iloc[2,1]
24
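To see which labels a position corresponds to, the index and columns can be indexed by position themselves. A small sketch, again using the fixed table values as an assumption; `row_label` and `col_label` are illustrative names.

```python
import pandas as pd

b = pd.DataFrame(
    [[25, 10], [11, 11], [14, 24], [25, 21]],
    index=[["class1", "class1", "class2", "class2"],
           ["man", "female", "man", "female"]],
    columns=["bendi", "waidi"],
)

row_label = b.index[2]    # the tuple label stored at row position 2
col_label = b.columns[1]  # the column label stored at column position 1
print(row_label, col_label)

# Positional and label-based lookups address the same cell.
print(b.iloc[2, 1] == b.loc[row_label, col_label])
```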