Hive离线选品推荐

来自CloudWiki
Heart讨论 | 贡献2020年8月25日 (二) 14:28的版本 Hive离线选品推荐
跳转至: 导航搜索

Hive离线选品推荐

1.我们先展示一条源数据,以逗号分割
字段意思:商品页数,一级分类,二级分类,三级分类,商品图片,商品名称,商品价格,商品价格范围,商品单位,公司名称,公司网址,地址,"几年老店",标签 字段名:productId,productPage,categoryOne,categoryTwo,categoryThr,productImg,productName,productPrice,productPrices,productUnit,companyName,companyUrl,address,year,tags

1,1,Apparel_Textiles & Accessories,Apparel,Apparel Design Services,//s.alicdn.com/@sc01/kf/H50cfdfc8a6c4445ab6f3a926a2542c00t.jpg_300x300.jpg,embroidery 
digitizing 24h on line,1,$1.00-$2.00,Unit,Shen Zhen Happitoo Textile Co._ Ltd.,//happitoo.en.alibaba.com/company_profile.html#top-nav-bar,China,1,Apparel 
Design Apparel_Textessories Services;

首先在hive创建表名称为"data"

create table data(
    productId string,
    productPage int,
    categoryOne string,
    categoryTwo string,
    categoryThr string,
    productImg string,	
    productName string,
    productPrice string,
    productPrices string,
    productUnit string,
    companyName string,
    companyUrl string,
    address string,
    year int,
    tags string)
    row format delimited fields terminated by ',';

2020-08-25 213446.png

然后我们将本地数据导入到hive中:

load data local inpath '/root/sj.txt' into table data;

查看导入的数据:

select * from data;

创建一个Grade表,用于等级分类调用:

create table Grade()
insert into Grade values("low","middlw","high");

接下来写shell脚本对数据价格进行分类:

#!/usr/bin/env bash
HIVE_HOME=/root/software/apache-hive-2.1.1-bin
#test1=`hive -S -e "select count(*) from data;"`
#echo ${test1}
test3=`hive -S -e "create table aaa_data as select *,rank() over(order by productPrice asc) as pm from data;"`
echo ${test3}
test4=`hive -S -e "select count(*)*0.3 from aaa_data;"`
test5=`hive -S -e "select count(*)*0.6 from aaa_data;"`
testlow=`hive -S -e "select low from Grade;"`
testmiddlw=`hive -S -e "select middlw from Grade;"`
testhigh=`hive -S -e "select high from Grade;"`
echo ${testlow}
echo ${testmiddlw}
echo ${testhigh}
echo ${test1}
echo ${test4}
echo ${test5}
#test6=`hive -S -e "select a,b,h,case
#    when pm<=${test4} then ${testlow}
#    when pm>=${test4} and pm<${test5} then ${testmiddlw}
#    when pm>=${test5} then ${testhigh}
#    else null end as Praaa from aaa_info,Grade;"`
#echo ${test6}
output=`hive -S -e "create table data_output as select 
productId,productPage,categoryOne,categoryTwo,categoryThr,productImg,productName,productPrice,productPrices,productUnit,companyName,companyUrl,address,year,tags,case
   when pm<=${test4} then ${testlow}
   when pm>=${test4} and pm<${test5} then ${testmiddlw}
   when pm>=${test5} then ${testhigh}
   else null end as Praaa from aaa_data,Grade;"`
echo ${output}