Python Data Analysis: Getting Familiar with the Data
From CloudWiki
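Preview the first few rows
import pandas as pd

# Load the CSV file and show its first two rows
df = pd.read_csv(r'test1.csv')
print(df.head(2))
Output:
    编号  年龄    性别      注册时间  Unnamed: 4
0  0.0  A1  54.0  2021/8/1         NaN
1  1.0  A2  16.0  2021/8/2         NaN
The column names mean ID (编号), age (年龄), gender (性别), and registration date (注册时间); Unnamed: 4 is the name pandas assigns to a column that has no header. In this file the values appear shifted relative to the headers: 年龄 holds codes such as A1 and 性别 holds numbers such as 54.0.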
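head takes the number of rows to return (5 if omitted), and tail does the same from the end of the table. The following is a minimal sketch of both; since test1.csv is not reproduced here, it reads a small in-memory CSV instead, and every row after the first two is a hypothetical value made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for test1.csv: only the first two data rows
# mirror the output shown above; the remaining rows are invented.
csv_text = """编号,年龄,性别,注册时间
0,A1,54,2021/8/1
1,A2,16,2021/8/2
2,A3,41,2021/8/3
3,A4,47,2021/8/4
"""
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)     # first 2 rows
last_one = df.tail(1)      # last row only
default_head = df.head()   # default n=5; this table has only 4 rows

print(first_two)
print(last_one)
print(len(default_head))
```

Asking for more rows than the table holds is not an error; head simply returns what is available.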
Get the size of the table
import pandas as pd

# shape is a (rows, columns) tuple
df = pd.read_csv(r'test1.csv')
print(df.shape)
Output:
(21, 5)
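shape is a (rows, columns) tuple, so (21, 5) means 21 rows and 5 columns. As a small sketch, the tuple can be unpacked directly; the table below is a hypothetical example, not the contents of test1.csv.

```python
import pandas as pd

# Hypothetical table used only to illustrate shape; not test1.csv.
df = pd.DataFrame({"id": [0, 1, 2], "age": [54, 16, 41]})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
print(len(df))              # row count only
print(df.size)              # total number of cells (rows * columns)
```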
Get the data types
import pandas as pd

df = pd.read_csv(r'test1.csv')
# info() prints the summary itself and returns None,
# so wrapping it in print() would add a stray "None" line
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   编号          4 non-null      float64
 1   年龄          4 non-null      object
 2   性别          4 non-null      float64
 3   注册时间        4 non-null      object
 4   Unnamed: 4  0 non-null      float64
dtypes: float64(3), object(2)
memory usage: 968.0+ bytes
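The summary reports 21 entries but only 4 non-null values in each named column, and none at all in Unnamed: 4, so most of the table is empty. One possible way to inspect and trim such a table is sketched below; the frame is a hypothetical imitation of that pattern (column names id, code, and Unnamed: 2 are invented), and whether empty rows should actually be dropped depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame imitating the pattern above: a few filled rows,
# some all-empty rows, and one all-empty column.
df = pd.DataFrame({
    "id": [0.0, 1.0, np.nan, np.nan],
    "code": ["A1", "A2", None, None],
    "Unnamed: 2": [np.nan] * 4,
})

print(df.dtypes)           # data type of each column
missing = df.isna().sum()  # number of missing values per column
print(missing)

# Drop columns, then rows, that contain no data at all.
cleaned = df.dropna(axis=1, how="all").dropna(axis=0, how="all")
print(cleaned.shape)
```

Using how="all" removes only rows and columns that are completely empty, so partially filled rows are kept.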
Get the distribution of numeric values
import pandas as pd

# describe() summarizes the numeric columns only
df = pd.read_csv(r'test1.csv')
print(df.describe())
Output:
             编号         性别  Unnamed: 4
count  4.000000   4.000000         0.0
mean   1.500000  39.500000         NaN
std    1.290994  16.542874         NaN
min    0.000000  16.000000         NaN
25%    0.750000  34.750000         NaN
50%    1.500000  44.000000         NaN
75%    2.250000  48.750000         NaN
max    3.000000  54.000000         NaN
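By default describe covers only numeric columns, which is why the two object columns (年龄 and 注册时间) do not appear in the output. Passing include="all" adds the non-numeric columns as well. The sketch below uses a hypothetical two-column table (names code and value, and the values themselves, are made up for illustration).

```python
import pandas as pd

# Hypothetical table: one text column and one numeric column.
df = pd.DataFrame({
    "code": ["A1", "A2", "A3", "A4"],
    "value": [54.0, 16.0, 41.0, 47.0],
})

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # adds count/unique/top/freq for text

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example mean for a text column) are shown as NaN.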