Question

I'm doing some web scraping.

Here is the code I'm using.

I've added some comments in the code.

library(httr)
library(rvest)
library(stringr)


# Bulletin board url
List.of.questions.url<- 'http://kin.naver.com/qna/list.nhn?m=noanswer&dirId=70108'

# Vector to store title and body
answers <- c()

#  get the posts from page 1 to page 2.
for(i in 1:2){
  url <- modify_url(List.of.questions.url, query=list(page=i))  
  list <- read_html(url, encoding = 'utf-8')  # I set the encoding here, but I still get an error.


  # Gets the url of the post.
  # TLS = title.links, CLS = content.links 
  TLS <- html_nodes(list, '.basic1 dt a') 
  CLS <- html_attr(TLS, 'href')
  CLS <- paste0("http://kin.naver.com",CLS) 

  #Gets the required properties.
  for(link in CLS){
    h <- read_html(link)  

    # answer    
    answer <- html_text(html_nodes(h, '#contents_layer_1'))
    answer <- str_trim(repair_encoding(answer))  # I try to repair the encoding here, but I still get an error.
    answers<-c(answers,answer)

    print(link)

  }
}

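However, the following error occurs while scraping.

It may be related to encoding.

(But as I noted in the code comments, I believe I handled the encoding correctly.)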

[1] "http://kin.naver.com/qna/detail.nhn?d1id=7&dirId=70111&docId=280474910"
Error: No guess has more than 50% confidence
In addition: There were 43 warnings (use warnings() to see them)  
> warnings()

1: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
5: In stringi::stri_conv(x, from = from) :  
#All the same contents, so omitted

Is this an encoding error? How can I fix it?

Thanks for any advice.

0 answers: