Running the Program

From CloudWiki

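WebMagic's core components are PageProcessor and Pipeline, and after the preceding steps readers should be able to customize both. The two components are invoked through the Spider class. Here CrawlJob is the custom PageProcessor and PipelineJob is the custom Pipeline. A simple example: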

Spider.create(new CrawlJob())
     .addUrl(URL_START)
     .addPipeline(new PipelineJob())
     .thread(5)
     .run();

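The Java program will be exported as a jar file and must support two ways of running: with no argument (results go to the default save path) and with one argument (results go to the specified save path). The main method is therefore written as follows: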

public static void main(String[] args) throws Exception {
     // Start URL: the 51job search result list to crawl
     String URL_START = "http://search.51job.com/list/000000%252C00,000000,0000,00,9,99,%25E4%25BA%2591%25E8%25AE%25A1%25E7%25AE%2597,2,1.html?lang=c&degreefrom=99&stype=1&workyear=99&cotype=99&jobterm=99&companysize=99&radius=-1&address=&lonlat=&postchannel=&list_type=&ord_field=&curr_page=&dibiaoid=0&landmark=&welfare=";

     if (args.length == 0) {
          // No argument: PipelineJob uses its default save path
          Spider.create(new CrawlJob())
               .addUrl(URL_START)
               .addPipeline(new PipelineJob())
               .thread(5)
               .run();
     } else {
          // One argument: args[0] is the save path passed to PipelineJob
          Spider.create(new CrawlJob())
               .addUrl(URL_START)
               .addPipeline(new PipelineJob(args[0]))
               .thread(5)
               .run();
     }
}
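
The choice between the default and the specified save path can also be isolated in a small helper, so that it can be checked without starting a crawl. The following is only a minimal sketch using the standard library. SavePathResolver, DEFAULT_SAVE_PATH, and the value "output" are hypothetical names chosen for illustration; the text above does not say what default path PipelineJob actually uses.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of WebMagic or the original project.
final class SavePathResolver {

    // Assumed default; the real default lives inside PipelineJob and is not shown in the text.
    static final String DEFAULT_SAVE_PATH = "output";

    // Returns the path given as the first argument, or the default
    // when no argument (or only a blank one) was supplied.
    static Path resolve(String[] args) {
        if (args == null || args.length == 0 || args[0].trim().isEmpty()) {
            return Paths.get(DEFAULT_SAVE_PATH);
        }
        return Paths.get(args[0].trim());
    }

    public static void main(String[] args) {
        Path savePath = resolve(args);
        System.out.println("Saving results to: " + savePath.toAbsolutePath());
        // In the real program the resolved path would be handed to the pipeline,
        // for example new PipelineJob(savePath.toString()), assuming the
        // one-argument constructor used above takes a path string.
    }
}
```

With a helper like this, the two branches in main would differ only in the value passed to the pipeline, and the Spider call chain would need to be written once. Whether that suits the project depends on how PipelineJob handles its own default.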