Difference between revisions of "ElasticSearch-based crawler engine"
From CloudWiki
Revision as of 13:45, 9 September 2020 (Wed)
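ElasticSearch-based crawler engine
Uploading a CSV file to Elasticsearch
1. Install Logstash
Extract the package: tar -zxvf logstash-7.3.2.tar.gz
Move the extracted directory to /usr/local/: mv logstash-7.3.2 /usr/local/
Change to that directory: cd /usr/local/
Rename it: mv logstash-7.3.2 logstash
Go to the configuration directory: /usr/local/logstash/config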
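The steps above can be collected into one small shell function. This is only a sketch: the name install_logstash and its two arguments (tarball path and install prefix) are illustrative and do not come from the original page, and it assumes the archive unpacks into a single directory named after the file, as logstash-7.3.2.tar.gz does. It extracts directly into the prefix with tar -C instead of extracting first and moving afterwards.

```shell
# Sketch: unpack a Logstash tarball and install it as <prefix>/logstash.
# install_logstash is a hypothetical helper name, not a Logstash command.
# The body runs in a subshell ( ... ) so its variables stay local.
install_logstash() (
    tarball=$1                     # e.g. logstash-7.3.2.tar.gz
    prefix=${2:-/usr/local}        # install location used on this page
    # Directory inside the archive: the file name without ".tar.gz"
    dir=$(basename "$tarball" .tar.gz)

    # Refuse to overwrite an earlier installation.
    if [ -e "$prefix/logstash" ]; then
        echo "$prefix/logstash already exists" >&2
        exit 1                     # ends only this subshell function
    fi

    tar -zxf "$tarball" -C "$prefix" &&          # extract into the prefix
        mv "$prefix/$dir" "$prefix/logstash" &&  # drop the version from the name
        echo "$prefix/logstash/config"           # print the config directory
)
```

For example, install_logstash logstash-7.3.2.tar.gz unpacks the archive into /usr/local and prints /usr/local/logstash/config, the directory where the configuration file is written.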
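Writing the Logstash configuration file:
Crawler format (page 1):
Source, Location, Machine configuration, Number of records, Number of error logs
Format: ("source","Location","Email","Machine","configuration","Number of records","Number of error logs")
Write the Logstash configuration: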
input {
  file {
    path => ["/root/abc.txt"]
    # read a newly discovered file from its first line (the default is "end")
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # field names are assigned to the comma-separated values in this order
    columns => ["productId","productPage","categoryOne","categoryTwo","categoryThr","productImg","productName","productPrice","productPrices","productUnit","companyName","companyUrl","address","year","tags"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    # name of the target index
    index => "jg"
  }
}
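Because the csv filter assigns the names in columns to the values by position, it can help to confirm beforehand that every line of the input file has the expected number of fields. The check below is a rough sketch and not part of the original page: bad_lines is a hypothetical helper, and it splits on every comma, so it will miscount lines that contain quoted commas.

```shell
# Sketch: print the numbers of the lines whose comma-separated field count
# differs from the expected number of columns (15 in the configuration above).
# Naive split on ",": quoted fields that contain commas are not handled.
# bad_lines is a hypothetical helper name.
bad_lines() (
    file=$1                 # e.g. /root/abc.txt
    expected=${2:-15}       # number of names in the columns list
    awk -F',' -v n="$expected" 'NF != n { print NR }' "$file"
)
```

Running bad_lines /root/abc.txt prints nothing when every line has 15 fields. The pipeline can then be started from /usr/local/logstash with bin/logstash -f followed by the path of the configuration file, and the number of indexed documents can be checked with curl 'http://master:9200/jg/_count'.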
Crawler format (page 2):
Crawler format (page 3):