Crawler Engine Based on Elasticsearch

From CloudWiki
Revision as of 02:03, 20 September 2020 (Sunday) by Cloud17 (talk | contribs)
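Installation and Deployment

Uploading CSV files to Elasticsearch
1. Install Logstash
Extract the package: tar -zxvf logstash-7.3.2.tar.gz
Move the extracted directory to /usr/local/: mv logstash-7.3.2 /usr/local/
Change to that directory: cd /usr/local/
Rename it: mv logstash-7.3.2 logstash
Enter the configuration directory: cd /usr/local/logstash/config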

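Writing the Logstash configuration files:

Crawler data format (page 1)

Source

Location

Machine configuration

Number of records

Number of error logs

Format: ("source","Location","Email","Machine configuration","Number of records","Number of error logs")
Write the Logstash configuration abc-pc.conf: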

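input {
  file {
    # Input file containing the comma-separated crawler records
    path => ["/root/ccc.txt"]
    # On first read, start from the beginning of the file instead of tailing it
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # Field names assigned to the comma-separated values, in order
    columns => ["source","Location","Email","Machineconfiguration","Numberofrecords","Numberoferrorlogs"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    # Target index for the page 1 data
    index => "pachong"
  }
}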
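To make the column mapping concrete, the following is a minimal Python sketch of what the csv filter does with one input line: it splits the line on the separator and pairs each value with the column name in the same position. It only imitates the filter for illustration and is not Logstash code. The function name `parse_line` and the sample record are hypothetical; the column names are copied from abc-pc.conf.

```python
import csv
import io

# Column names copied from the csv filter in abc-pc.conf.
COLUMNS = ["source", "Location", "Email", "Machineconfiguration",
           "Numberofrecords", "Numberoferrorlogs"]


def parse_line(line, columns=COLUMNS, separator=","):
    """Split one CSV line and pair each value with its column name.

    Simplified imitation of the Logstash csv filter: values stay strings,
    and values beyond the listed columns are dropped.
    """
    values = next(csv.reader(io.StringIO(line), delimiter=separator))
    return dict(zip(columns, values))
```

For a hypothetical record such as `alibaba,Jinan,ops@example.com,4C8G,1200,3`, `parse_line` returns a dictionary with six string values keyed by the column names above. In Logstash the values are also strings unless the csv filter's `convert` option is set.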


Run Logstash: ./bin/logstash -f config/abc-pc.conf --path.data=/root/log
Result screenshot: 2020-09-09 215741.png
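One way to check that the records arrived is to ask Elasticsearch how many documents the index holds through its `_count` endpoint. The sketch below assumes the cluster is reachable over plain HTTP at `master:9200` (the host used in the output section) with no authentication; the function names are hypothetical.

```python
import json
import urllib.request


def count_url(index, host="master:9200"):
    """Build the URL of the _count endpoint for one index."""
    return f"http://{host}/{index}/_count"


def fetch_count(index, host="master:9200"):
    """Return the number of documents in the index (requires network access)."""
    with urllib.request.urlopen(count_url(index, host), timeout=10) as resp:
        return json.load(resp)["count"]
```

Calling `fetch_count("pachong")` after the Logstash run should return the number of lines that were ingested from the input file, provided the index contained nothing beforehand.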

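Crawler data format (page 2)

Number in progress:

Number completed:

Number to be completed:

Total progress of resources:

Task name:

Task status:

Format: ("In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:")
Write the Logstash configuration abc-pcwcjd.conf: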

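input {
  file {
    # Input file containing the comma-separated task progress records
    path => ["/root/ooo.txt"]
    # On first read, start from the beginning of the file instead of tailing it
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # Field names assigned to the comma-separated values, in order
    columns => ["In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    # Target index for the page 2 data
    index => "pachongzhuangtai"
  }
}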
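Because the csv filter assigns names by position, a line with a missing or extra value ends up with shifted or unnamed fields. As an optional pre-check, the sketch below reports rows whose field count differs from the six columns listed in abc-pcwcjd.conf. The helper name `find_bad_lines` is hypothetical and is not part of Logstash.

```python
import csv


def find_bad_lines(lines, expected_fields=6, separator=","):
    """Return (row_number, field_count) for every row with the wrong field count.

    `lines` is any iterable of strings, for example an open text file.
    Empty lines are reported with a field count of 0.
    """
    bad = []
    for number, row in enumerate(csv.reader(lines, delimiter=separator), start=1):
        if len(row) != expected_fields:
            bad.append((number, len(row)))
    return bad
```

To check the real input, open `/root/ooo.txt` with `open(path, newline="")` and pass the file object to `find_bad_lines`; an empty list means every row has six fields.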

Run Logstash: ./bin/logstash -f config/abc-pcwcjd.conf --path.data=/root/pcjd
Result screenshot: 2020-09-09 231350.png

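Crawler data format (page 3)

Alibaba error distribution:

Made-in-China error distribution:

Dunhuang error distribution:

Crawler task distribution:

Crawler status distribution:

Format: ("AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error")
Write the Logstash configuration abc-cw.conf: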

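input {
  file {
    # Input file containing the comma-separated error distribution records
    path => ["/root/bbb.txt"]
    # On first read, start from the beginning of the file instead of tailing it
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # Field names assigned to the comma-separated values, in order
    columns => ["AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    # Target index for the page 3 data
    index => "pc-cw"
  }
}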
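The page gives no run command for abc-cw.conf. Following the pattern of the two earlier commands, the sketch below builds an equivalent command line. The data directory `/root/pc-cw` is a hypothetical choice, as is the helper name `logstash_command`. Logstash instances that run at the same time cannot share a data directory, which is presumably why each earlier command passes its own `--path.data`.

```python
import shlex


def logstash_command(conf_name, data_dir):
    """Build the argument list for one Logstash run.

    The command is meant to be started from /usr/local/logstash.
    """
    return ["./bin/logstash", "-f", f"config/{conf_name}", f"--path.data={data_dir}"]


# /root/pc-cw is a hypothetical data directory; the page names none for this config.
command = logstash_command("abc-cw.conf", "/root/pc-cw")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, cwd="/usr/local/logstash")`.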

Result screenshot:
2020-09-10 000338.png