Multi-level indexing

From CloudWiki


Creating a multi-level index

import numpy as np

import pandas as pd

This is not a multi-level index

a1=pd.Series([20,10,20,30], index=[('class1','female'),('class1', 'man'),('class2','female'), ('class2', 'man')])  # each label must be a tuple here; a list of lists would be treated as level arrays instead

a1  # NOT a multi-level index: each label is just one tuple

(class1, female)    20
(class1, man)       10
(class2, female)    20
(class2, man)       30
dtype: int64
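A quick way to see the difference is to inspect the index object itself. The sketch below rebuilds a1 as above, confirms that its index has only one level, and converts the tuples into a real two-level MultiIndex with pd.MultiIndex.from_tuples; the name a1_multi is only for illustration.

```python
import pandas as pd

# Rebuild a1 from the example above: each label is one tuple
a1 = pd.Series([20, 10, 20, 30],
               index=[('class1', 'female'), ('class1', 'man'),
                      ('class2', 'female'), ('class2', 'man')])
print(type(a1.index).__name__, a1.index.nlevels)  # a flat index with a single level

# Turn the tuples into a proper two-level MultiIndex (a1_multi is a hypothetical name)
a1_multi = a1.copy()
a1_multi.index = pd.MultiIndex.from_tuples(list(a1.index))
print(type(a1_multi.index).__name__, a1_multi.index.nlevels)
print(a1_multi)
```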

A real multi-level index must have explicit levels, like this:

a=pd.Series([20,10,20,30], index=[["class1","class1","class2","class2"],["man","female","man","female"]])

a  # a true multi-level index: one array of labels per level

class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
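The nested lists become a pandas MultiIndex with one level per list. As a minimal check, assuming the same data as a above, nlevels and get_level_values() expose the two levels:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=[["class1", "class1", "class2", "class2"],
                     ["man", "female", "man", "female"]])

print(isinstance(a.index, pd.MultiIndex))   # True
print(a.index.nlevels)                      # 2
print(list(a.index.get_level_values(0)))    # outer label of every row
print(list(a.index.get_level_values(1)))    # inner label of every row
```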

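A DataFrame takes a multi-level row index in the same way, and a multi-level column index is created exactly like a row index (nested lists passed to columns):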

b=pd.DataFrame(np.random.randint(10,30,size=(4,2)),index=[["class1","class1","class2","class2"],["man","female","man","female"]],columns=["bendi","waidi"])

b  # multi-level row index; the values are random, so they change on every run


               bendi  waidi
class1 man        25     10
       female     11     11
class2 man        14     24
       female     25     21
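The same nested-list form works for columns=. A minimal sketch with two column levels; the second-level labels ("2019", "2020") are hypothetical, and np.arange replaces the random values so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Hypothetical second column level ("2019", "2020"); fixed values instead of random ones
df = pd.DataFrame(
    np.arange(8).reshape(2, 4),
    index=["class1", "class2"],
    columns=[["bendi", "bendi", "waidi", "waidi"],
             ["2019", "2020", "2019", "2020"]],
)
print(df)
print(isinstance(df.columns, pd.MultiIndex))  # True
print(df["bendi"])  # selecting an outer column label keeps the inner level as columns
```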

Showing the index and naming its levels

a.index

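MultiIndex(levels=[['class1', 'class2'], ['female', 'man']],
           labels=[[0, 0, 1, 1], [1, 0, 1, 0]])

(This output comes from an older pandas version. Since pandas 0.24 the labels attribute is named codes, and newer versions print the index as a list of tuples instead.)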
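levels holds the distinct labels of each level in sorted order, and codes holds, for every row, the position of that row's label within the corresponding level. A short sketch, assuming pandas 0.24 or later (older versions call codes labels):

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=[["class1", "class1", "class2", "class2"],
                     ["man", "female", "man", "female"]])

# Distinct labels of each level, stored in sorted order
print([list(level) for level in a.index.levels])
# For each row, the position of its label inside the matching level
print([list(codes) for codes in a.index.codes])

# Decoding level 1 by hand gives back the original inner labels
inner = [a.index.levels[1][c] for c in a.index.codes[1]]
print(inner)
```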

a.index.names=["class","gender"]  # give each index level a name

a

class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
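Level names can also be set without modifying a in place. A minimal sketch using rename_axis(), which returns a new Series (named is just an illustrative variable); once the levels have names, they can be used instead of level numbers, e.g. in get_level_values():

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=[["class1", "class1", "class2", "class2"],
                     ["man", "female", "man", "female"]])

named = a.rename_axis(["class", "gender"])  # returns a copy; a keeps unnamed levels
print(list(named.index.names))              # ['class', 'gender']
print(list(a.index.names))                  # [None, None]

# Named levels can be referenced by name instead of by position
print(list(named.index.get_level_values("gender")))
```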

Creating an index on its own

from_arrays

>>> in1=pd.MultiIndex.from_arrays([['class3','class3','class4', 'class4'], ['female', 'man','female', 'man']])

>>> c=pd.Series([20,30,35,40],index=in1)

>>> c

class3  female    20
        man       30
class4  female    35
        man       40
dtype: int64
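from_arrays() also accepts a names= argument, so the level names can be given while the index is built. When every combination of the level labels is present, pd.MultiIndex.from_product() builds the same index from the distinct labels alone. A minimal sketch (in3 is only an illustrative name):

```python
import pandas as pd

in1 = pd.MultiIndex.from_arrays(
    [['class3', 'class3', 'class4', 'class4'], ['female', 'man', 'female', 'man']],
    names=["class", "gender"],
)

# Every (class, gender) pair appears, so a Cartesian product yields the same labels
in3 = pd.MultiIndex.from_product([['class3', 'class4'], ['female', 'man']],
                                 names=["class", "gender"])

print(in1.equals(in3))  # True
c = pd.Series([20, 30, 35, 40], index=in3)
print(c)
```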

from_tuples

>>> in2=pd.MultiIndex.from_tuples([('class3','female'),('class3','male'), ('class4', 'male'),('class4', 'female')])

>>> c=pd.Series([20,30,35,40],index=in2)

>>> c

class3  female    20
        male      30
class4  male      35
        female    40
dtype: int64
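from_tuples() is the row-by-row view of the same structure: zipping the level arrays together yields one tuple per row, so the two constructors produce equal indexes. A short sketch:

```python
import pandas as pd

classes = ['class3', 'class3', 'class4', 'class4']
genders = ['female', 'male', 'male', 'female']

by_arrays = pd.MultiIndex.from_arrays([classes, genders])
# zip() pairs the i-th label of each level into one tuple per row
by_tuples = pd.MultiIndex.from_tuples(list(zip(classes, genders)))

print(by_arrays.equals(by_tuples))  # True
print(list(by_tuples))
```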

Converting the structure of a multi-level index

From Series to DataFrame

a

class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64

a.unstack()  # Series -> DataFrame: one index level becomes the column index; by default the innermost level

gender  female  man
class
class1      10   20
class2      30   20

a.unstack(level=0)  # the level argument chooses which level becomes the column index

class   class1  class2
gender
female      10      30
man         20      20
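Because the levels of a were named earlier, unstack() also accepts a level name instead of a number. A minimal sketch, rebuilding a together with its level names:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=pd.MultiIndex.from_arrays(
                  [["class1", "class1", "class2", "class2"],
                   ["man", "female", "man", "female"]],
                  names=["class", "gender"]))

by_number = a.unstack(level=0)
by_name = a.unstack("class")      # same as level=0
print(by_number.equals(by_name))  # True
print(by_name)
```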

From DataFrame back to Series

a.unstack().stack()  # inverse operation: DataFrame -> Series

class   gender
class1  female    10
        man       20
class2  female    30
        man       20
dtype: int64


A more complete conversion to a DataFrame

a2=a.reset_index()  # Series -> DataFrame: moves all index levels into ordinary columns at once; the values go into a column named 0

a2

    class  gender   0
0  class1     man  20
1  class1  female  10
2  class2     man  20
3  class2  female  30
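The value column is named 0 because a itself has no name. Series.reset_index() takes a name= argument to label that column, and level= to move only some of the levels. A short sketch; the column name "count" is only an example:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=pd.MultiIndex.from_arrays(
                  [["class1", "class1", "class2", "class2"],
                   ["man", "female", "man", "female"]],
                  names=["class", "gender"]))

a2 = a.reset_index(name="count")         # all levels become columns
print(a2)

partial = a.reset_index(level="gender")  # only "gender" leaves the index
print(partial)
```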

From columns back to a multi-level index

a2.set_index(["class","gender"])  # turns one or more columns into (multi-level) row index levels; the result is still a DataFrame with the single column 0

                0
class  gender
class1 man     20
       female  10
class2 man     20
       female  30
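To get an actual Series back, select the single column after set_index(). A minimal sketch (df and s are illustrative names); the result has the same labels, level names and values as a, and is itself named 0:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=pd.MultiIndex.from_arrays(
                  [["class1", "class1", "class2", "class2"],
                   ["man", "female", "man", "female"]],
                  names=["class", "gender"]))
a2 = a.reset_index()

df = a2.set_index(["class", "gender"])  # still a DataFrame with one column, 0
s = df[0]                               # selecting that column gives a Series
print(type(df).__name__, type(s).__name__)  # DataFrame Series
print(s.tolist() == a.tolist())             # True
print(list(s.index) == list(a.index))       # True
```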

Selecting and slicing data with a multi-level index

a

class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64

a["class1"]

man       20
female    10
dtype: int64

a["class1","man"]

20

Selection must start at the outermost level and work inward

a["man"]  # KeyError: "man" is not a label of the outermost level
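To select by an inner level directly, xs() takes a level= argument (a number or a level name) and by default drops that level from the result. A minimal sketch; the try/except only demonstrates the KeyError from a["man"]:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=pd.MultiIndex.from_arrays(
                  [["class1", "class1", "class2", "class2"],
                   ["man", "female", "man", "female"]],
                  names=["class", "gender"]))

try:
    a["man"]
    failed = False
except KeyError:
    failed = True      # "man" only exists in the inner level
print(failed)          # True

men = a.xs("man", level="gender")  # rows whose gender label is "man"
print(men)
```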

Range slicing only works when the index is sorted

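a["class1":"class2"]  # selects by label range; both endpoints are included

class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64

Note that range selection only works when the index is sorted, e.g. ["a","b","c"]. If the labels are out of order, e.g. ["b","a","c"], the range cannot be taken; sort the index with sort_index() first.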
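The effect of sorting can be seen with a Series whose outer level is deliberately out of order (s below is hypothetical). Slicing it by label range fails; pandas reports this as UnsortedIndexError, a subclass of KeyError. After sort_index() the same slice works:

```python
import pandas as pd

# Same kind of data as a, but the outer level is not in sorted order
s = pd.Series([20, 30, 10, 20],
              index=[["class2", "class1", "class1", "class2"],
                     ["man", "female", "man", "female"]])

try:
    s.loc["class1":"class2"]
    sliced_unsorted = True
except KeyError:  # UnsortedIndexError is a KeyError subclass
    sliced_unsorted = False
print(sliced_unsorted)  # False

s_sorted = s.sort_index()
print(s_sorted.loc["class1":"class2"])
```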

a[["class1","class2"]]  # several keys at once: pass them as a list

class   gender
class1  man       20
        female    10
class2  man       20
        female    30
dtype: int64
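The list may also hold full keys. With .loc, a list of tuples picks individual rows, each tuple giving one label per level. A minimal sketch:

```python
import pandas as pd

a = pd.Series([20, 10, 20, 30],
              index=pd.MultiIndex.from_arrays(
                  [["class1", "class1", "class2", "class2"],
                   ["man", "female", "man", "female"]],
                  names=["class", "gender"]))

# Each tuple is a complete (class, gender) key
picked = a.loc[[("class1", "man"), ("class2", "female")]]
print(picked)
```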

b

               bendi  waidi
class1 man        25     10
       female     11     11
class2 man        14     24
       female     25     21

b["bendi"]

class1  man       25
        female    11
class2  man       14
        female    25
Name: bendi, dtype: int32

b.loc["class1"]

        bendi  waidi
man        25     10
female     11     11

b.loc["class1","man"]

bendi    25
waidi    10
Name: (class1, man), dtype: int32

b.loc["class1","man"]["bendi"]

25
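b.loc["class1","man"]["bendi"] performs two lookups in a row (chained indexing). A single .loc call, with the full row key as a tuple and the column label as the second argument, does the same in one step. A minimal sketch that rebuilds b with the values shown above instead of random numbers:

```python
import pandas as pd

# b from above, with fixed values instead of np.random.randint
b = pd.DataFrame([[25, 10], [11, 11], [14, 24], [25, 21]],
                 index=[["class1", "class1", "class2", "class2"],
                        ["man", "female", "man", "female"]],
                 columns=["bendi", "waidi"])

chained = b.loc["class1", "man"]["bendi"]    # two lookups
one_step = b.loc[("class1", "man"), "bendi"]  # row key as a tuple, then the column
print(chained, one_step)                      # 25 25
```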

Positional (implicit) indexing ignores the levels; only the row order matters

b.iloc[2]  # positional indexing ignores the levels and uses only the row order

bendi    14
waidi    24
Name: (class2, man), dtype: int32

b.iloc[2,1]  # row 2, column 1 (waidi)

24
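Since iloc only counts positions, a positional slice can even cross the boundary between level groups. A minimal sketch using the fixed values shown for b, comparing a positional lookup with the equivalent label lookup:

```python
import pandas as pd

b = pd.DataFrame([[25, 10], [11, 11], [14, 24], [25, 21]],
                 index=[["class1", "class1", "class2", "class2"],
                        ["man", "female", "man", "female"]],
                 columns=["bendi", "waidi"])

by_position = b.iloc[2, 1]                    # third row, second column
by_label = b.loc[("class2", "man"), "waidi"]  # the same cell addressed by labels
print(by_position, by_label)                  # 24 24

middle = b.iloc[1:3]  # rows 1 and 2 straddle class1 and class2
print(middle)
```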