== ElasticSearch-based crawler engine ==
Uploading CSV files to Elasticsearch<br>
1. Install Logstash<br>
Extract the package: tar -zxvf logstash-7.3.2.tar.gz<br>
Move the extracted directory: mv logstash-7.3.2 /usr/local/<br>
cd /usr/local/<br>
Rename it: mv logstash-7.3.2 logstash<br>
Go to the configuration directory: /usr/local/logstash/config<br>

'''Writing the Logstash configuration files:'''

Crawler format (page one): source, location, machine configuration, number of records, number of error logs<br>
Format: ("source","Location","Email","Machineconfiguration","Numberofrecords","Numberoferrorlogs")<br>
Write the Logstash configuration abc-pc.conf:<br>
<pre>
input {
  file {
    path => ["/root/ccc.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["source","Location","Email","Machineconfiguration","Numberofrecords","Numberoferrorlogs"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pachong"
  }
}
</pre>
Run Logstash from /usr/local/logstash: ./bin/logstash -f config/abc-pc.conf --path.data=/root/log<br>
Result screenshot: [[文件:2020-09-09 215741.png]]

Crawler format (page two): in-progress count, completed count, pending count, overall resource progress, task name, task status<br>
Format: ("In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:")<br>
Write the Logstash configuration abc-pcwcjd.conf:<br>
<pre>
input {
  file {
    path => ["/root/ooo.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["In progress:","Completed number:","Number to be completed:","Total progress of resources:","Task name","Task status:"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pachongzhuangtai"
  }
}
</pre>
Run Logstash from /usr/local/logstash: ./bin/logstash -f config/abc-pcwcjd.conf --path.data=/root/pcjd<br>
Result screenshot: [[文件:2020-09-09 231350.png]]

Crawler format (page three): Alibaba error distribution, Made-in-China error distribution, Dunhuang error distribution, crawler task distribution, crawler status distribution<br>
Format: ("AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error")<br>
Write the Logstash configuration abc-cw.conf:<br>
<pre>
input {
  file {
    path => ["/root/bbb.txt"]
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["AlibabaError","made in chinaError","Global in progressError","DunhuangError","Task error","Status error"]
  }
}
output {
  elasticsearch {
    hosts => ["master:9200"]
    index => "pc-cw"
  }
}
</pre>
Result screenshot:<br>
[[文件:2020-09-10 000338.png]]
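
To make the role of the csv filter in these configurations easier to follow, here is a small Python sketch of the mapping it performs: each input line is split on the separator and the values are paired, in order, with the configured column names. This is an illustration only, not Logstash code. The function name <code>map_csv_line</code> and the sample line are hypothetical, and the naming of surplus values ("column7" and so on) is an approximation of the filter's default behaviour.

```python
import csv
import io

# Column names copied from the csv filter of the page-one configuration.
COLUMNS = ["source", "Location", "Email", "Machineconfiguration",
           "Numberofrecords", "Numberoferrorlogs"]


def map_csv_line(line, columns=COLUMNS, separator=","):
    """Split one line of text and pair each value with a column name.

    Values beyond the configured columns receive a generated name
    ("column7", ...); this approximates the filter's default behaviour.
    """
    # csv.reader handles quoted values that contain the separator.
    row = next(csv.reader(io.StringIO(line), delimiter=separator), [])
    event = {}
    for position, value in enumerate(row, start=1):
        if position <= len(columns):
            name = columns[position - 1]
        else:
            name = f"column{position}"
        event[name] = value
    return event


if __name__ == "__main__":
    # Hypothetical sample line; the real contents of /root/ccc.txt
    # are not shown on this page.
    sample = "alibaba,Beijing,ops@example.com,8C16G,1200,3"
    print(map_csv_line(sample))
```

Running a few lines of the input file through such a helper before starting Logstash is one way to check that the number and order of values match the <code>columns</code> list, since a mismatch would otherwise only show up as misnamed fields in the index.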