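I'm running into a strange problem when replacing values in my data. I want to convert strings to dates. I do it in two ways, because the data comes in two formats.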
library(rvest)
library(stringi)
urlOnetWybory <- "http://wiadomosci.onet.pl/wybory-prezydenckie/xcnpc"
htmlOnetWybory <- read_html(urlOnetWybory)
nodes <- html_nodes(htmlOnetWybory, ".datePublished , .itemTitle")
text <- html_text(nodes)
dataRaw <- text[seq(1, length(text), 2)]
#"dzisiaj" = "today", "wczoraj"="yesterday"
data <- sapply(dataRaw, function(x){
  #converting strings of the first type to dates;
  #keep the result of the first replacement so it is not discarded
  x <- stri_replace_all_fixed(x, "dzisiaj", as.character(Sys.Date()))
  stri_replace_all_fixed(x, "wczoraj", as.character(Sys.Date() - 1))
})
#indexes in dataRaw where there's a word "dzisiaj" or "wczoraj"
indeksDzis <- stri_detect_regex(dataRaw, "dzisiaj [0-9]{2}:[0-9]{2}")
indeksWczo <- stri_detect_regex(dataRaw, "wczoraj [0-9]{2}:[0-9]{2}")
#indexes for cells where date is in the second format.
indDoZmiany <- !(indeksDzis | indeksWczo)
#I get a message here. Why? The lengths are the same.
data[indDoZmiany] <- strptime(data[indDoZmiany], "%d %b, %H:%M")
Does anyone know how to fix this? And why do I end up with a list?
Answer 0 (score: 0)
POSIXlt has some pitfalls, because it is a list. Try this minimal example:
x <- strptime(c("12 mar, 15:40", "11 mar, 12:58"), "%d %b, %H:%M")
unclass(x)
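To make the list structure a bit more concrete, here is a small sketch along the same lines. The input strings are made up to match the "%d %b, %H:%M" format from the question, and the month abbreviation is parsed according to the current locale, so treat the values as illustrative only.

```r
# Made-up timestamps in the question's "%d %b, %H:%M" format
x <- strptime(c("12 mar, 15:40", "11 mar, 12:58"), "%d %b, %H:%M")

# A POSIXlt object is a list of components (sec, min, hour, mday, ...),
# each component holding one value per timestamp
fields <- unclass(x)
is.list(fields)      # TRUE
names(fields)
fields$hour          # the hour component of both timestamps

# A POSIXct object is a plain numeric vector (seconds since the epoch),
# with one element per timestamp
ct <- as.POSIXct(x)
is.list(unclass(ct)) # FALSE
length(unclass(ct))  # 2
```

This seems to be why the assignment in the question ends up producing a list: the right-hand side is a list of components rather than one value per date.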
Replace the last line of your code with:
data[indDoZmiany] <- as.POSIXct( strptime(data[indDoZmiany], "%d %b, %H:%M"))
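One thing worth checking after this change: `data` is still a character vector, so as far as I can tell the POSIXct values assigned into it get coerced to text (the number of seconds as a string) rather than staying dates. If the goal is a real date-time vector, a possible alternative is to parse both formats into POSIXct and combine them. The sketch below uses base R only and a made-up `dataRaw` in place of the scraped one; the names `isToday`, `isYesterday`, `isOther` and `parsed` are hypothetical, and it assumes the second format has no year, in which case `strptime` fills in the current one.

```r
# Made-up sample standing in for the scraped dataRaw
dataRaw <- c("dzisiaj 15:40", "wczoraj 12:58", "12 mar, 09:30")

today <- Sys.Date()
isToday     <- grepl("dzisiaj [0-9]{2}:[0-9]{2}", dataRaw)
isYesterday <- grepl("wczoraj [0-9]{2}:[0-9]{2}", dataRaw)
isOther     <- !(isToday | isYesterday)

# First format: swap the relative word for an ISO date, then parse
iso <- sub("dzisiaj", as.character(today), dataRaw, fixed = TRUE)
iso <- sub("wczoraj", as.character(today - 1), iso, fixed = TRUE)
parsed <- as.POSIXct(strptime(iso, "%Y-%m-%d %H:%M"))

# Second format: parse separately (month names depend on the locale)
parsedOther <- as.POSIXct(strptime(dataRaw, "%d %b, %H:%M"))

# Both sides are POSIXct, so the class survives the assignment
parsed[isOther] <- parsedOther[isOther]

class(parsed)
parsed
```

Because `parsed` is POSIXct from the start, the subset assignment keeps it a date-time vector instead of turning it into a list or into text.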