How do I fix the encoding when I get these warning messages?

Asked: 2017-07-16 18:41:24

Tags: r string unicode encoding utf-8

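I am using R to scrape the elements I need from a website.

I thought I had handled the encoding correctly, but I get warnings that appear to be encoding errors.

My code and the warnings it produces are posted below.

Can you tell me what is going wrong?

Please tell me how to fix it.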

library(httr)
library(rvest)
library(stringr)

list.url <- "https://section.blog.naver.com/sub/SearchBlog.nhn?type=post&option.keyword=%EA%B3%B5%EB%B6%80&term=period&option.startDate=2017-07-15&option.endDate=2017-07-16&option.page.currentPage=1&option.orderBy=date"
titles = c()

for (i in 1:30) {
  # Build the URL for result page i and parse it as UTF-8
  url = modify_url(list.url, query = list(option.page.currentPage = i))
  h.list = read_html(url, encoding = 'utf-8')

  # Collect the post links from the search results
  title.links = html_nodes(h.list, 'h5 a')
  article.links = html_attr(title.links, 'href')

  article.links = unique(article.links)
  article.links = article.links[grep('http://blog.naver.com', article.links)]

  for (link in article.links) {
    # Open the post, then load the page referenced by its #mainFrame element
    h = read_html(link, encoding = 'CP949')
    h = read_html(paste0('http://blog.naver.com', html_attr(html_nodes(h, '#mainFrame'), 'src')), encoding = 'CP949')

    # Extract the post title
    title = str_trim(repair_encoding(html_text(html_nodes(h, '.pcol1.itemSubjectBoldfont')), from = "UTF-8"))
    titles = c(titles, title)

    print(link)
  }
}

Warning messages:

1: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U00008bb2 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U00007231 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U00002b50 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U0000fe0f cannot be converted to destination encoding

Result of running guess_encoding(titles):
      encoding language confidence
1        UTF-8                1.00
2     UTF-16BE                0.10
3     UTF-16LE                0.10
4 windows-1255       he       0.08
5 windows-1255       he       0.06
6   IBM420_ltr       ar       0.03
7   IBM420_rtl       ar       0.02

Result of running guess_encoding(title):
       encoding language confidence
1        UTF-8                1.00
2 windows-1255       he       0.11
3     UTF-16BE                0.10
4     UTF-16LE                0.10
5 windows-1250       hu       0.09
6 windows-1252       fr       0.07
7 windows-1255       he       0.03

0 answers:

No answers yet