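I am using R to scrape the elements I need from a website.
I think I wrote the code correctly, but it produces warnings that look like encoding errors.
My code and the warnings are included below.
Can you tell me what is going wrong and how to fix it?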
library(httr)
library(rvest)
library(stringr)
list.url <- "https://section.blog.naver.com/sub/SearchBlog.nhn?type=post&option.keyword=%EA%B3%B5%EB%B6%80&term=period&option.startDate=2017-07-15&option.endDate=2017-07-16&option.page.currentPage=1&option.orderBy=date"
titles = c()
for (i in 1:30) {
  # Build the URL for list page i
  url = modify_url(list.url, query = list(option.page.currentPage = i))
  h.list = read_html(url, encoding = 'utf-8')  # read the list page as UTF-8
  title.links = html_nodes(h.list, 'h5 a')
  article.links = html_attr(title.links, 'href')
  article.links = unique(article.links)
  # Keep only links that point to blog.naver.com posts
  article.links = article.links[grep('http://blog.naver.com', article.links)]
  for (link in article.links) {
    h = read_html(link, encoding = 'CP949')
    # Follow the src of the #mainFrame element to reach the post itself
    h = read_html(paste0('http://blog.naver.com', html_attr(html_nodes(h, '#mainFrame'), 'src')), encoding = 'CP949')
    # Open the post and extract its title
    title = str_trim(repair_encoding(html_text(html_nodes(h, '.pcol1.itemSubjectBoldfont')), from = "UTF-8"))
    titles = c(titles, title)
    print(link)
  }
}
Warning messages:
1: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U00008bb2 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U00007231 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U00002b50 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U0000fe0f cannot be converted to destination encoding
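To see which characters the warnings refer to, the four code points can be decoded with base R. This is only a minimal sketch: the names `cps` and `chars` are made up for illustration, and it assumes the "destination encoding" in the warning is the session's native encoding (for example CP949 on a Korean Windows setup), which the output above does not state.

```r
# Code points copied from the four warnings above
cps <- c(0x8bb2, 0x7231, 0x2b50, 0xfe0f)

# Turn each code point into a one-character UTF-8 string
chars <- intToUtf8(cps, multiple = TRUE)

# Show each code point next to its character
# (U+FE0F is an invisible variation selector, so its cell looks empty)
print(data.frame(codepoint = sprintf("U+%04X", as.integer(cps)), char = chars))

# Each character is valid UTF-8 on its own
print(validUTF8(chars))
```

U+8BB2 and U+7231 are CJK ideographs, U+2B50 is a star symbol, and U+FE0F is a variation selector that usually follows an emoji. If the native encoding cannot represent them, a conversion from UTF-8 to that encoding would plausibly raise exactly this kind of warning.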
Result of typing guess_encoding(titles):
      encoding language confidence
1        UTF-8                1.00
2     UTF-16BE                0.10
3     UTF-16LE                0.10
4 windows-1255       he       0.08
5 windows-1255       he       0.06
6   IBM420_ltr       ar       0.03
7   IBM420_rtl       ar       0.02
Result of typing guess_encoding(title):
      encoding language confidence
1        UTF-8                1.00
2 windows-1255       he       0.11
3     UTF-16BE                0.10
4     UTF-16LE                0.10
5 windows-1250       hu       0.09
6 windows-1252       fr       0.07
7 windows-1255       he       0.03
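Both guesses rank UTF-8 at confidence 1.00, which suggests the scraped titles may already be valid UTF-8. Under that assumption, one option worth trying is to skip the conversion step and keep the strings in UTF-8. The sketch below uses a hypothetical helper, `clean_title`, and swaps in base R's `trimws` for `str_trim` so that it runs without extra packages; it is not a confirmed fix for the code above.

```r
# Hypothetical helper: keep a title in UTF-8 instead of converting it
# to the session's native encoding.
clean_title <- function(x) {
  x <- enc2utf8(x)               # mark/convert the input as UTF-8
  stopifnot(all(validUTF8(x)))   # fail early if the bytes are not valid UTF-8
  trimws(x)                      # strip leading and trailing whitespace
}

# A title mixing Hangul with a star and a variation selector,
# written with \u escapes so the script itself stays ASCII
raw.title <- "  \uacf5\ubd80 \u2b50\ufe0f  "
title <- clean_title(raw.title)
print(Encoding(title))
```

Inside the loop this would replace the `str_trim(repair_encoding(...))` call with `clean_title(html_text(...))`. Whether the titles then print correctly on the console still depends on the locale and font in use.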