Question

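I've run into a strange problem when replacing values in my data. I want to convert strings to dates, and I need two approaches because the data comes in two formats.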

library(rvest)
library(stringi)

urlOnetWybory <- "http://wiadomosci.onet.pl/wybory-prezydenckie/xcnpc"
htmlOnetWybory <- read_html(urlOnetWybory)  # html() is deprecated in current rvest; read_html() replaces it
nodes <- html_nodes(htmlOnetWybory, ".datePublished , .itemTitle")
text <- html_text(nodes)

dataRaw <- text[seq(1, length(text), 2)]

#"dzisiaj" = "today", "wczoraj"="yesterday"
data <- sapply(dataRaw, function(x){
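  # convert strings of the first type to dates; the result of the first
  # replacement is kept in x, otherwise sapply only returns the last expression
  x <- stri_replace_all_fixed(x, "dzisiaj", as.character(Sys.Date()))
  stri_replace_all_fixed(x, "wczoraj", as.character(Sys.Date() - 1))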
})

# positions in dataRaw that contain "dzisiaj" or "wczoraj" followed by a time
indeksDzis <- stri_detect_regex(dataRaw, "dzisiaj [0-9]{2}:[0-9]{2}")
indeksWczo <- stri_detect_regex(dataRaw, "wczoraj [0-9]{2}:[0-9]{2}")
#indexes for cells where date is in the second format.
indDoZmiany <- !(indeksDzis | indeksWczo)

# I get a warning here. Why? The lengths are the same.
data[indDoZmiany] <- strptime(data[indDoZmiany], "%d %b, %H:%M")

Does anyone know how to fix this? And why do I end up with lists?

Answer 1

POSIXlt has some pitfalls because, under the hood, it is a list. Try this minimal example:

x <- strptime(c("12 mar, 15:40", "11 mar, 12:58"), "%d %b, %H:%M")
unclass(x)
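To make the mismatch concrete, the sketch below compares the number of dates with the number of fields stored inside the POSIXlt object. It assumes R 3.5.0 or later (earlier versions made length() of a POSIXlt always return 9), month abbreviations that %b accepts in the current locale, and UTC only so the result is reproducible.

```r
# Two timestamps in the second format used on the page ("%d %b, %H:%M").
x <- strptime(c("12 Mar, 15:40", "11 Mar, 12:58"), "%d %b, %H:%M", tz = "UTC")

# length() has a POSIXlt method (R >= 3.5.0), so it counts the dates...
n_dates <- length(x)

# ...but the object itself is a list with one component per date-time field.
fields   <- names(unclass(x))
n_fields <- length(unclass(x))

print(n_dates)
print(fields)   # starts with "sec", "min", "hour", "mday", "mon", "year"
print(x$hour)   # a single component holds the hours of every date
```

Any code that looks at the raw list, such as subassignment into an ordinary vector, sees n_fields elements rather than n_dates.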

Replace the last line of your code (the strptime assignment) with:

data[indDoZmiany] <- as.POSIXct( strptime(data[indDoZmiany], "%d %b, %H:%M"))
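One caveat with this fix: data is the character vector returned by sapply, and writing a POSIXct value into a character vector drops the class, so those cells end up holding the number of seconds since 1970 as text. A minimal sketch that keeps real date-times instead builds a separate POSIXct vector and fills it per format. It swaps base R's sub() and grepl() in for stringi, uses hypothetical sample strings in place of the scraped page, and assumes UTC as the time zone.

```r
# Hypothetical sample of the scraped strings; the real page content will differ.
dataRaw <- c("dzisiaj 10:15", "wczoraj 22:05", "12 Mar, 15:40")

# The pitfall: a POSIXct value written into a character vector loses its class.
chr <- dataRaw
chr[3] <- as.POSIXct(strptime(dataRaw[3], "%d %b, %H:%M", tz = "UTC"))
print(chr[3])  # seconds since 1970 as a string, not a date

# Which entries use the relative "today"/"yesterday" format?
relative <- grepl("^(dzisiaj|wczoraj) [0-9]{2}:[0-9]{2}$", dataRaw)

# Replace the relative words with ISO dates (sub() is vectorised, no sapply needed).
iso <- sub("^dzisiaj", format(Sys.Date()), dataRaw)
iso <- sub("^wczoraj", format(Sys.Date() - 1), iso)

# An all-NA POSIXct vector keeps its class when parts of it are replaced.
out <- .POSIXct(rep(NA_real_, length(dataRaw)), tz = "UTC")
out[relative]  <- as.POSIXct(iso[relative], format = "%Y-%m-%d %H:%M", tz = "UTC")
out[!relative] <- as.POSIXct(strptime(iso[!relative], "%d %b, %H:%M", tz = "UTC"))

print(out)
```

Because out is already POSIXct, its own subassignment method converts each replacement value and preserves the class, so both formats end up in one date-time vector.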

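Converting to POSIXct removes the warning "number of items to replace is not a multiple of replacement length": a POSIXct vector has one element per date, whereas the underlying POSIXlt list has one element per field (seconds, minutes, hours, and so on).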
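The warning, and the lists from the question, can be reproduced without scraping anything. In this sketch (hypothetical strings, UTC assumed), a POSIXlt value is written into two cells of a character vector: R uses the value's underlying list of fields, promotes the whole vector to a list, and fills the two cells with the first two fields.

```r
# Hypothetical strings: the first uses the relative format, the rest "%d %b, %H:%M".
x   <- c("dzisiaj 10:15", "12 Mar, 15:40", "11 Mar, 12:58")
idx <- c(FALSE, TRUE, TRUE)

lt <- strptime(x[idx], "%d %b, %H:%M", tz = "UTC")

res  <- x
warn <- NULL
withCallingHandlers(
  res[idx] <- lt,                   # same pattern as data[indDoZmiany] <- strptime(...)
  warning = function(w) {
    warn <<- conditionMessage(w)    # remember the warning message...
    invokeRestart("muffleWarning")  # ...and let the assignment finish
  }
)

print(warn)        # the warning quoted above
print(class(res))  # the character vector has become a list
str(res[[2]])      # the seconds field of both dates, not a date
```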
