Difference between revisions of "ElasticSearch-Based Crawler Engine"
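Revision as of 02:03, 20 September 2020 (Sunday)
ElasticSearch-Based Crawler Engine
Installation and deployment
Uploading CSV files to Elasticsearch
1. Install Logstash
Extract the package: tar -zxvf logstash-7.3.2.tar.gz
Move the extracted directory to /usr/local/: mv logstash-7.3.2 /usr/local/
cd /usr/local/
Rename it: mv logstash-7.3.2 logstash
Go to the configuration directory: cd /usr/local/logstash/config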
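The installation steps above can be collected into one small shell function. This is only a sketch: `install_logstash` is a hypothetical helper name, it assumes the archive unpacks to a directory named after the archive file (as logstash-7.3.2.tar.gz does), and it assumes no `logstash` directory already exists under the target prefix.

```shell
#!/bin/sh
# Sketch only: install_logstash is a hypothetical helper, not part of Logstash.
# Usage: install_logstash <archive.tar.gz> [prefix]
install_logstash() {
    archive=$1
    prefix=${2:-/usr/local}
    # Assumption: logstash-7.3.2.tar.gz unpacks to a directory logstash-7.3.2
    version_dir=$(basename "$archive" .tar.gz)
    # Extract directly into the prefix instead of extracting and then moving
    tar -zxf "$archive" -C "$prefix" || return 1
    # Rename the versioned directory to plain "logstash", as in the steps above
    mv "$prefix/$version_dir" "$prefix/logstash" || return 1
    # Print the configuration directory used in the following sections
    echo "$prefix/logstash/config"
}

# Example call (commented out so that sourcing this file changes nothing):
# install_logstash logstash-7.3.2.tar.gz /usr/local
```

Extracting with `-C` replaces the separate `mv` into /usr/local/ from the manual steps; the end result is the same directory layout.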
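Writing the Logstash configuration files:
Crawler format (page one)
Source
Location
Machine configuration
Number of records
Number of error logs
Format: ("source","Location","Email","Machine configuration","Number of records","Number of error logs")
Write the Logstash configuration abc-pc.conf: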
input {
  file {
    path => ["/root/ccc.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["source","Location","Email","Machineconfiguration","Numberofrecords","Numberoferrorlogs"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pachong"
  }
}
Run Logstash: ./bin/logstash -f config/abc-pc.conf --path.data=/root/log
Result screenshot: 2020-09-09 215741.png
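Because the csv filter maps fields to column names by position, it can help to confirm that every line of the input file has the expected number of fields before running Logstash. The following is a minimal sketch: `check_columns` is a hypothetical helper, and it splits on every comma, so it is only valid for files whose fields contain no quoted commas.

```shell
#!/bin/sh
# Sketch only: check_columns is a hypothetical helper.
# It reports every line whose comma-separated field count differs from
# the expected count, and returns non-zero if any such line exists.
# Usage: check_columns <file> <expected_column_count>
check_columns() {
    awk -F',' -v want="$2" '
        NF != want { printf "line %d: %d columns\n", NR, NF; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example call for the page-one file, which is mapped to six columns:
# check_columns /root/ccc.txt 6
```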
Crawler format (page two)
In progress:
Completed:
To be completed:
Total resource progress:
Task name:
Task status:
Format: ("In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:")
Write the Logstash configuration abc-pcwcjd.conf:
input {
  file {
    path => ["/root/ooo.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pachongzhuangtai"
  }
}
Run Logstash: ./bin/logstash -f config/abc-pcwcjd.conf --path.data=/root/pcjd
Result screenshot: 2020-09-09 231350.png
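After Logstash has run, one way to confirm that documents reached the index is to query the Elasticsearch count API. The sketch below assumes that `curl` is installed and that the host `master:9200` from the configuration is reachable over plain HTTP without authentication; `count_url` and `index_count` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: count_url and index_count are hypothetical helpers.

# Build the URL of the count endpoint for one index.
# Usage: count_url <host:port> <index>
count_url() {
    printf 'http://%s/%s/_count\n' "$1" "$2"
}

# Ask Elasticsearch how many documents the index holds.
# The response is a JSON object that includes a "count" field.
# Usage: index_count <host:port> <index>
index_count() {
    curl -s "$(count_url "$1" "$2")"
}

# Example call for the page-two index:
# index_count master:9200 pachongzhuangtai
```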
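Crawler format (page three)
Alibaba error distribution:
Made-in-China error distribution:
Dunhuang error distribution:
Crawler task distribution:
Crawler status distribution:
Format: ("AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error")
Write the Logstash configuration abc-cw.conf: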
input {
  file {
    path => ["/root/bbb.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pc-cw"
  }
}
Result screenshot: 2020-09-10 000338.png
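The three configurations on this page differ only in the input path, the column list, and the index name, so they could be generated from one template. This is a sketch under that assumption: `make_conf` is a hypothetical helper, the host `master:9200` is copied from the configurations above, and column names must not contain double quotes.

```shell
#!/bin/sh
# Sketch only: make_conf is a hypothetical helper that prints a Logstash
# pipeline with the same structure as the configurations on this page.
# Usage: make_conf <input_file> <index> <column>...
make_conf() {
    path=$1
    index=$2
    shift 2
    # Join the remaining arguments into a quoted, comma-separated list
    columns=
    for col in "$@"; do
        columns="${columns:+$columns,}\"$col\""
    done
    cat <<EOF
input {
  file {
    path => ["$path"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => [$columns]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "$index"
  }
}
EOF
}

# Example: regenerate the page-three configuration
# make_conf /root/bbb.txt pc-cw "AlibabaError" "made in chinaError" \
#     "Global in progressError" "DunhuangError" "Task error" "Status error" \
#     > config/abc-cw.conf
```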