Python Data Analysis: Getting Familiar with the Data

From CloudWiki

Previewing the First Few Rows

import pandas as pd

# Load the sample CSV file into a DataFrame
df = pd.read_csv(r'test1.csv')

# Show only the first 2 rows
print(df.head(2))

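Output (the column names mean 编号 = ID, 年龄 = age, 性别 = gender, 注册时间 = registration time; the values appear shifted one column relative to these headers):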

    编号  年龄    性别      注册时间  Unnamed: 4
0  0.0  A1  54.0  2021/8/1         NaN
1  1.0  A2  16.0  2021/8/2         NaN
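
The same preview can be tried without test1.csv. The following is a minimal sketch on a small in-memory DataFrame; the column names and values are made up for illustration and are not the contents of the file. It also shows that head() defaults to 5 rows and that tail(n) returns the last n rows.

```python
import pandas as pd

# Hypothetical sample data, NOT the contents of test1.csv
df = pd.DataFrame({
    "id": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "age": [54, 16, 47, 41, 33, 29],
})

first_two = df.head(2)     # first 2 rows
default_head = df.head()   # first 5 rows (n defaults to 5)
last_two = df.tail(2)      # last 2 rows

print(first_two)
print(last_two)
```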

Getting the Table Size

import pandas as pd

# Load the sample CSV file into a DataFrame
df = pd.read_csv(r'test1.csv')

# shape is a (rows, columns) tuple
print(df.shape)

Output:

(21, 5)
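
Because shape is an ordinary tuple, it can be unpacked into separate row and column counts. A minimal sketch, assuming a made-up three-row DataFrame rather than test1.csv:

```python
import pandas as pd

# Hypothetical sample data, NOT the contents of test1.csv
df = pd.DataFrame({
    "id": ["A1", "A2", "A3"],
    "age": [54, 16, 47],
})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
row_count = len(df)         # len() gives the row count only

print(f"{n_rows} rows x {n_cols} columns")
```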

Getting the Data Types

import pandas as pd

# Load the sample CSV file into a DataFrame
df = pd.read_csv(r'test1.csv')

# info() prints the summary itself and returns None, so print() is not needed
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   编号          4 non-null      float64
 1   年龄          4 non-null      object 
 2   性别          4 non-null      float64
 3   注册时间        4 non-null      object 
 4   Unnamed: 4  0 non-null      float64
dtypes: float64(3), object(2)
memory usage: 968.0+ bytes
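
The summary shows 21 entries but only 4 non-null values per column, so most rows are empty. When the types and missing-value counts are needed as values rather than printed text, df.dtypes and df.isna().sum() are one way to get them. A minimal sketch on made-up data (the column names here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, NOT the contents of test1.csv
df = pd.DataFrame({
    "id": ["A1", "A2", None, None],
    "age": [54.0, 16.0, np.nan, np.nan],
    "extra": [np.nan, np.nan, np.nan, np.nan],  # a completely empty column
})

col_types = df.dtypes       # Series: column name -> dtype
missing = df.isna().sum()   # Series: column name -> number of missing values

print(col_types)
print(missing)
```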

Getting the Numeric Distribution

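import pandas as pd

# Load the sample CSV file into a DataFrame
df = pd.read_csv(r'test1.csv')

# Summary statistics (count, mean, std, quartiles) for the numeric columns
print(df.describe())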

Output:

             编号         性别  Unnamed: 4
count  4.000000   4.000000         0.0
mean   1.500000  39.500000         NaN
std    1.290994  16.542874         NaN
min    0.000000  16.000000         NaN
25%    0.750000  34.750000         NaN
50%    1.500000  44.000000         NaN
75%    2.250000  48.750000         NaN
max    3.000000  54.000000         NaN
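
By default describe() covers only numeric columns, which is why 年龄 and 注册时间 (both of dtype object) are absent from the output above. A minimal sketch, again on made-up data rather than test1.csv, of passing include="all" so that non-numeric columns are summarized too:

```python
import pandas as pd

# Hypothetical sample data, NOT the contents of test1.csv
df = pd.DataFrame({
    "id": ["A1", "A2", "A3", "A4"],
    "age": [54.0, 16.0, 47.0, 41.0],
})

numeric_summary = df.describe()             # numeric columns only
full_summary = df.describe(include="all")   # also reports unique/top/freq for text columns

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example mean for a text column) are filled with NaN.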