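I want to use spring-boot-jpa in my spider application. I already have the Maven dependencies, the models, the model repositories and application.properties in place. But when I use the @Autowired annotation to get at the repositories, I get a NullPointerException. How can I use them inside the spider? Here is my spider.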
@Component
public class Crawler implements PageProcessor {
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
@Autowired
ArticleRepository articleRepository;
@Autowired
CategoryRepository categoryRepository;
@Autowired
NewsRepository newsRepository;
@Autowired
SourceRepository sourceRepository;
public static String content;
private Site site = Site.me().setRetryTimes(0).setSleepTime(100000000);
public void process(Page page) {
Pattern pattern = Pattern.compile("artiList\\((.*)\\)");
Matcher matcher = pattern.matcher(page.getRawText());
String json = null;
if (matcher.find()) {
json = matcher.group(1);
}
ObjectMapper mapper = new ObjectMapper();
JsonNode rootNode;
try {
rootNode = mapper.readTree(json);
JsonNode mainJson = rootNode.path("BAI6RHDKwangning");
Iterator<JsonNode> iterator = mainJson.elements();
String cur = null;
JsonNode mark = null;
List<Map<String, Object>> list = new ArrayList<>();
while (iterator.hasNext()) {
mark = iterator.next();
cur = mark.toString();
Map<String, Object> map = mapper.readValue(cur, Map.class);
list.add(map);
}
dbService(list);
} catch (IOException e) {
System.out.println(e.getMessage());
e.printStackTrace();
}
}
public void dbService(List<Map<String, Object>> list) {
try {
for (Map<String, Object> map : list) {
for (String s : map.keySet()) {
System.out.println(s + "=" + map.get(s));
System.out.println();
}
Article article = new Article();
article.setUrl(map.get("url").toString());
if (categoryRepository == null) {
System.out.println("================================================asdasasdsad");
}
Category category = categoryRepository.findByCategoryName("game");
News news = new News();
Source source = sourceRepository.findBySourceName(map.get("source").toString());
if (source == null) {
source = new Source();
source.setSourceName(map.get("source").toString());
} else {
source.setPublishCount(source.getPublishCount() + 1);
}
news.setDocid(map.get("docid").toString());
news.setCommentCount(Integer.valueOf(map.get("commentCount").toString()));
news.setDigest(map.get("digest").toString());
news.setHasImg(Integer.valueOf(map.get("hasImg").toString()));
news.setImgsrc(map.get("imgsrc").toString());
news.setPriority(Integer.valueOf(map.get("priority").toString()));
news.setPtime(sdf.parse(map.get("ptime").toString()));
news.setTitle(map.get("title").toString());
news.setArticleId(article);
news.setCategoryCode(category);
news.setArticleId(article);
news.setSourceId(source);
articleRepository.save(article);
sourceRepository.save(source);
newsRepository.save(news);
}
} catch (ParseException e) {
System.out.println(e.getMessage() );
e.printStackTrace();
}
}
public Site getSite() {
return site;
}
public void runSpider() {
Spider.create(new Crawler())
.addUrl("http://3g.163.com/touch/reconstruct/article/list/BAI6RHDKwangning/0-1.html")
.thread(5)
.run();
}
}
Here is my MainApplication.
@SpringBootApplication
public class NethardApplication implements CommandLineRunner{
@Autowired
Crawler crawler;
public static void main(String[] args) {
SpringApplication.run(NethardApplication.class, args);
}
@Override
public void run(String... args) {
crawler.runSpider();
}
}
10:05:34.602 [pool-1-thread-1] ERROR us.codecraft.webmagic.Spider - process request Request{url='http://3g.163.com/touch/reconstruct/article/list/BAI6RHDKwangning/0-1.html', method='null', extras=null, priority=0, headers={}, cookies={}} error
java.lang.NullPointerException: null
at com.cmh.Crawler.dbService(Crawler.java:90)
at com.cmh.Crawler.process(Crawler.java:70)
at us.codecraft.webmagic.Spider.onDownloadSuccess(Spider.java:414)
at us.codecraft.webmagic.Spider.processRequest(Spider.java:406)
at us.codecraft.webmagic.Spider.access$000(Spider.java:61)
at us.codecraft.webmagic.Spider$1.run(Spider.java:320)
at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
10:05:34.603 [main] INFO us.codecraft.webmagic.Spider - Spider 3g.163.com closed! 1 pages downloaded.
Answer 0 (score: 3)
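When you create an object with new, Spring does not manage that object, so autowiring never happens and its @Autowired fields stay null.
In the runSpider method, however, you are already inside a Crawler bean that Spring manages, so you can pass that instance instead, for example: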
Spider.create(this).addUrl("http://3g.163.com/touch/reconstruct/article/list/BAI6RHDKwangning/0-1.html")
.thread(5)
.run();
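To make the difference between the two instances concrete, here is a minimal sketch in plain Java with no Spring on the classpath. Everything in it is hypothetical and for illustration only: `Inject` stands in for `@Autowired`, `ToyContainer` for the Spring container, and `FakeRepository` for one of the repositories. It is not how Spring is implemented, only the same idea in miniature: fields are filled in by whoever creates the object, so an object created with `new` is left untouched.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

// Hypothetical stand-in for @Autowired (illustration only, not a Spring type).
@Retention(RetentionPolicy.RUNTIME)
@interface Inject {}

// Hypothetical stand-in for one of the JPA repositories.
class FakeRepository {}

class ToyCrawler {
    @Inject FakeRepository repository;

    boolean hasRepository() {
        return repository != null;
    }

    // Mirrors Spider.create(new Crawler()): a fresh object the container never saw.
    ToyCrawler workerFromNew() {
        return new ToyCrawler();
    }

    // Mirrors Spider.create(this): the instance the container already populated.
    ToyCrawler workerFromThis() {
        return this;
    }
}

// Hypothetical miniature container: creates an object and fills its @Inject fields.
class ToyContainer {
    static <T> T createManaged(Class<T> type) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isAnnotationPresent(Inject.class)) {
                    field.setAccessible(true);
                    Object dependency = field.getType().getDeclaredConstructor().newInstance();
                    field.set(instance, dependency);
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create " + type.getName(), e);
        }
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        ToyCrawler managed = ToyContainer.createManaged(ToyCrawler.class);
        System.out.println(managed.workerFromThis().hasRepository()); // prints true
        System.out.println(managed.workerFromNew().hasRepository());  // prints false
    }
}
```

The second line printing false corresponds to the NullPointerException in the question: the object handed to the spider was never seen by the container.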
Answer 1 (score: 1)
Create a class that exposes the ApplicationContext through a static getApplicationContext method:
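import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;
// Must itself be a Spring bean, otherwise setApplicationContext is never called.
@Component
public class ApplicationContextProvider implements ApplicationContextAware {
private static ApplicationContext context;
// Returns null until Spring has finished initialising this bean.
public static ApplicationContext getApplicationContext(){
return context;
}
@Override
public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
context = applicationContext;
}
}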
Then, in your spider application, look the repositories up from that context:
ArticleRepository articleRepository= ApplicationContextProvider.getApplicationContext().getBean(ArticleRepository.class);
CategoryRepository categoryRepository= ApplicationContextProvider.getApplicationContext().getBean(CategoryRepository.class);
NewsRepository newsRepository= ApplicationContextProvider.getApplicationContext().getBean(NewsRepository.class);
SourceRepository sourceRepository= ApplicationContextProvider.getApplicationContext().getBean(SourceRepository.class);
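One thing to watch with this static-holder approach is timing: the lookup only works once the context has been stored. The sketch below shows that in plain Java, assuming a toy registry in place of the real ApplicationContext. `ToyContext`, `ToyContextHolder` and `ToyArticleRepository` are hypothetical names, not Spring types. Unlike the provider above, this holder throws instead of returning null, which is one possible way to make an early lookup fail loudly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an application context: a type-to-instance registry.
class ToyContext {
    private final Map<Class<?>, Object> beans = new HashMap<>();

    <T> void register(Class<T> type, T bean) {
        beans.put(type, bean);
    }

    <T> T getBean(Class<T> type) {
        Object bean = beans.get(type);
        if (bean == null) {
            throw new IllegalStateException("No bean of type " + type.getName());
        }
        return type.cast(bean);
    }
}

// Static holder with the same shape as ApplicationContextProvider, plus a guard.
class ToyContextHolder {
    private static ToyContext context;

    static void setContext(ToyContext newContext) {
        context = newContext;
    }

    static ToyContext getContext() {
        if (context == null) {
            throw new IllegalStateException("Context has not been initialised yet");
        }
        return context;
    }
}

// Hypothetical stand-in for ArticleRepository.
class ToyArticleRepository {}

class HolderDemo {
    public static void main(String[] args) {
        // Looking up before the context is stored fails.
        try {
            ToyContextHolder.getContext();
        } catch (IllegalStateException e) {
            System.out.println("too early: " + e.getMessage());
        }

        // After startup has stored the context, the lookup succeeds.
        ToyContext context = new ToyContext();
        context.register(ToyArticleRepository.class, new ToyArticleRepository());
        ToyContextHolder.setContext(context);

        ToyArticleRepository repository =
                ToyContextHolder.getContext().getBean(ToyArticleRepository.class);
        System.out.println("found repository: " + (repository != null));
    }
}
```

In the spider this suggests doing the getBean calls inside process or dbService, which run after startup, rather than in field initialisers of an object that might be constructed before the context is ready.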