drop table if exists words;
create table words(textline string);
load data local inpath '/opt/test.txt' into table words;
set hive.cli.print.header=true;
select word, count(*) as wordcount
from (select explode(split(textline, " ")) as word from words) tmp
group by word;
hive -f wordcount.hql
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
OK
Time taken: 0.945 seconds
OK
Time taken: 1.252 seconds
Loading data to table default.words
Table default.words stats: [numFiles=1, totalSize=48]
OK
Time taken: 1.058 seconds
Query ID = root_20200627101421_cb24470f-8105-4dd2-8fc5-bb8e32c014e1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1593205940394_0003, Tracking URL = http://master:8088/proxy/application_1593205940394_0003/
Kill Command = /usr/local/hadoop-2.6.5/bin/hadoop job -kill job_1593205940394_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-06-27 10:14:33,980 Stage-1 map = 0%, reduce = 0%
2020-06-27 10:14:43,523 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2020-06-27 10:14:51,954 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.07 sec
MapReduce Total cumulative CPU time: 4 seconds 70 msec
Ended Job = job_1593205940394_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.07 sec HDFS Read: 7404 HDFS Write: 52 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 70 msec
OK
word	wordcount
I	3
MapReduce	1
a	1
am	1
hadoop	1
learn	2
student	1
Time taken: 31.612 seconds, Fetched: 7 row(s)
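To make the query's logic easier to follow, here is a minimal Python sketch of what the select statement computes: split(textline, " ") breaks each line into tokens, explode() turns each token into its own row, and group by word with count(*) tallies them. The contents of /opt/test.txt are not shown above, so the sample lines below are an assumption, chosen only to be consistent with the reported counts; the function name word_count is likewise illustrative and not part of Hive.

```python
from collections import Counter

# Hypothetical sample input: the real /opt/test.txt is not shown in the
# source, so these lines are an assumption consistent with the counts above.
SAMPLE_LINES = ["I am a student", "I learn hadoop", "I learn MapReduce"]


def word_count(lines):
    """Mimic: select word, count(*) from (explode(split(textline, " "))) group by word."""
    counts = Counter()
    for textline in lines:
        # split(textline, " ") yields the tokens; explode() emits one row per token
        for word in textline.split(" "):
            # group by word + count(*): tally each distinct token
            counts[word] += 1
    return counts


if __name__ == "__main__":
    # Sort by word so the listing is deterministic
    for word, n in sorted(word_count(SAMPLE_LINES).items()):
        print(f"{word}\t{n}")
```

Unlike this single-process loop, Hive compiles the query into a MapReduce job (one mapper and one reducer in the log above), which accounts for most of the 31.6 seconds spent on such a small input.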