Difference between revisions of "Crawler Engine Based on ElasticSearch"

From CloudWiki

Revision as of 13:45, 9 September 2020 (Wed)

Crawler Engine Based on ElasticSearch

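Uploading CSV files to Elasticsearch
1. Install Logstash
Extract the package: tar -zxvf logstash-7.3.2.tar.gz
Move the extracted directory to /usr/local/: mv logstash-7.3.2 /usr/local/
Change to that directory: cd /usr/local/
Rename it: mv logstash-7.3.2 logstash
Go to the configuration directory: /usr/local/logstash/config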
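
The installation steps above can be collected into a single shell function. This is only a minimal sketch, not the author's script: the function name install_logstash and its two parameters are hypothetical, it extracts straight into the target prefix with tar -C instead of moving the directory afterwards, and it assumes the archive unpacks into a directory named after the tarball (logstash-7.3.2.tar.gz into logstash-7.3.2, as in the steps above).

```shell
# install_logstash TARBALL [PREFIX]
# Hypothetical helper: unpack a Logstash tarball under PREFIX (default /usr/local)
# and rename the unpacked directory to "logstash".
install_logstash() {
    tarball=$1
    prefix=${2:-/usr/local}
    # Assumption: the archive's top-level directory is the tarball name without .tar.gz
    name=$(basename "$tarball" .tar.gz)

    # Refuse to touch an existing installation
    if [ -e "$prefix/logstash" ]; then
        echo "error: $prefix/logstash already exists" >&2
        return 1
    fi

    # Extract into PREFIX, then rename logstash-<version> to logstash
    tar -zxf "$tarball" -C "$prefix" || return 1
    mv "$prefix/$name" "$prefix/logstash" || return 1

    # Print the configuration directory so the caller can cd into it
    echo "$prefix/logstash/config"
}
```

Calling install_logstash logstash-7.3.2.tar.gz with no second argument would install under /usr/local, matching the manual steps; the function prints the configuration directory on success.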

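Writing the Logstash configuration file:

Crawler format (page one):

Source
Location
Machine configuration
Number of records
Number of error logs

Format: ("source","Location","Email","Machine","configuration","Number of records","Number of error logs")

Write the Logstash configuration: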

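input {
  file {
    # CSV file to ingest
    path => ["/root/abc.txt"]
    # Read a newly discovered file from its first line instead of tailing it
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # Field names assigned to the comma-separated values, in order
    columns => ["productId","productPage","categoryOne","categoryTwo","categoryThr","productImg","productName","productPrice","productPrices","productUnit","companyName","companyUrl","address","year","tags"]
  }
}
output {
  elasticsearch {
    # Elasticsearch node and target index
    hosts => ["master:9200"]
    index => "jg"
  }
}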
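
The csv filter in this configuration names 15 columns, so each line of /root/abc.txt is expected to split into 15 comma-separated fields. A quick check before starting Logstash could look like the sketch below. The function name count_bad_rows is hypothetical, and the check is deliberately naive: it assumes no field contains a quoted comma, which the csv filter itself would handle but this awk one-liner would miscount.

```shell
# count_bad_rows FILE EXPECTED
# Hypothetical helper: print how many non-empty lines of FILE do not have
# exactly EXPECTED comma-separated fields.
# Assumption: fields never contain quoted commas.
count_bad_rows() {
    awk -F',' -v n="$2" '
        NF > 0 && NF != n { bad++ }   # skip blank lines, count mismatches
        END { print bad + 0 }         # print 0 when every row matches
    ' "$1"
}
```

For the configuration above, count_bad_rows /root/abc.txt 15 would print 0 when every row lines up with the column list. Logstash can also validate the configuration syntax on its own by running bin/logstash -f with the path of the config file and the --config.test_and_exit flag.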


Crawler format (page two):

Crawler format (page three):