数据分析案例:借贷平台

来自CloudWiki
58.56.20.138讨论2020年11月11日 (三) 03:02的版本 数据分析
跳转至: 导航搜索

数据分析

[[1]]

    • 2.上传数据**

创建指定文件夹,将数据上传至hdfs指定目录/college/下

58.56.20.138 hadoop fs -mkdir -p /college hadoop fs -put /root/college/bike.csv /college/ hadoop fs -ls /college/ 58.56.20.138

3.建库建表

58.56.20.138 create database hive; 58.56.20.138

58.56.20.138 create table bike( duration int, startdate timestamp, enddate timestamp, startnum int, startstation string, endnum int, endstation string, bikenum string, type string) row format delimited fields terminated by ','; 58.56.20.138

58.56.20.138 create table bike( tid int , author string, score string, grade string, title string, content string) row format delimited fields terminated by ','; 58.56.20.138

58.56.20.138 create table bike( duration int, startdate string, enddate string, startnum int, startstation string, endnum string, endstation string, bikenum timestamp, type timestamp,

      string,
      string,
      timestamp,
      timestamp,
      string)

row format delimited fields terminated by ' '; 58.56.20.138

58.56.20.138 create table train_format( user_id int, age_range int, gender int, merchant_id int, label int, activity_log string) row format delimited fields terminated by ','; 58.56.20.138

    • 4.数据导入**

本地数据导入 local意思是本地数据“/root/college/bike.csv”

58.56.20.138 LOAD DATA LOCAL INPATH '数据的路径' INTO TABLE 表名; 58.56.20.138

统计表数据下入本地目录中,local意思是本地数据“

58.56.20.138 INSERT OVERWRITE LOCAL DIRECTORY '路径’ ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t’ SELECT 字段1, 字段2, 字段3 FROM 表名 ; 58.56.20.138

![1595473261254](C:\Users\DELL\AppData\Roaming\Typora\typora-user-images\1595473261254.png)

58.56.20.138 INSERT OVERWRITE LOCAL DIRECTORY'/root/college001/01/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' SELECT count(distinct bikenum) FROM bike; 58.56.20.138

![1595473333500](C:\Users\DELL\AppData\Roaming\Typora\typora-user-images\1595473333500.png)

58.56.20.138 INSERT OVERWRITE LOCAL DIRECTORY 'root/college001/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' SELECT count(distinct bikenum) FROM bike; 58.56.20.138

![1594389021724](C:\Users\DELL\AppData\Roaming\Typora\typora-user-images\1594389021724.png)

用join 或者左右,连接

58.56.20.138 select startstation,count(*) as total from bike where endstation = group by total order by total desc limit 1; 58.56.20.138