Replacing values in a vector: number of items to replace is not a multiple of replacement length

Posted: 2015-03-14 00:10:09

Tags: r date vector

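I'm running into a strange problem when replacing values in a vector. I want to convert strings to dates, and I do it in two ways because the data come in two formats.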

library(rvest)
library(stringi)

urlOnetWybory <- "http://wiadomosci.onet.pl/wybory-prezydenckie/xcnpc"
htmlOnetWybory <- read_html(urlOnetWybory)  # html() is deprecated in newer rvest
nodes <- html_nodes(htmlOnetWybory, ".datePublished , .itemTitle")
text <- html_text(nodes)

# every other node holds a date
dataRaw <- text[seq(1, length(text), 2)]

# "dzisiaj" = "today", "wczoraj" = "yesterday"
# Convert strings of the first type to dates. stri_replace_all_fixed() is
# vectorized, so sapply() is not needed. Chaining the two calls also keeps the
# first replacement: inside a function only the last expression is returned.
data <- stri_replace_all_fixed(dataRaw, "dzisiaj", as.character(Sys.Date()))
data <- stri_replace_all_fixed(data, "wczoraj", as.character(Sys.Date() - 1))

# indices of entries in dataRaw containing "dzisiaj" or "wczoraj" followed by a time
indeksDzis <- stri_detect_regex(dataRaw, "dzisiaj [0-9]{2}:[0-9]{2}")
indeksWczo <- stri_detect_regex(dataRaw, "wczoraj [0-9]{2}:[0-9]{2}")
# indices of entries whose date is in the second format
indDoZmiany <- !(indeksDzis | indeksWczo)

# I get a warning here. Why? The lengths are the same.
data[indDoZmiany] <- strptime(data[indDoZmiany], "%d %b, %H:%M")

Does anyone know how to fix this? And why do I end up with some kind of list?

1 answer:

Answer 0 (score: 0)

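POSIXlt has some pitfalls because it is a list under the hood: one element per date-time component (sec, min, hour, mday, mon, year, ...), each holding the values for all dates. When you assign it into a plain vector with data[indDoZmiany] <- ..., R uses that underlying list, so the replacement length is the number of components, not the number of dates. That is where the warning comes from, and it is also why data turns into a list. Try this minimal example: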

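# two date strings in the "%d %b, %H:%M" format
x <- strptime(c("12 mar, 15:40", "11 mar, 12:58"), "%d %b, %H:%M")
unclass(x)  # a list of components, not a vector of two dates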
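To reproduce the warning without scraping the page, the sketch below uses three made-up strings (an assumption; the real ones come from the Onet page) and assigns the parsed POSIXlt into a character vector, recording the warning instead of printing it.

```r
# Hypothetical sample: one "today" entry and two in the "%d %b, %H:%M" format
data <- c("dzisiaj 10:15", "12 mar, 15:40", "11 mar, 12:58")
ind <- c(FALSE, TRUE, TRUE)

parsed <- strptime(data[ind], "%d %b, %H:%M")
print(class(parsed))            # "POSIXlt" "POSIXt"
print(length(unclass(parsed)))  # number of components (sec, min, hour, ...), not 2

# Record the warning message instead of letting it print
caught <- NULL
res <- withCallingHandlers(
  {
    data[ind] <- parsed  # the underlying list is used as the replacement value
    data
  },
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(caught)        # the "not a multiple of replacement length" warning
print(is.list(res))  # TRUE: the character vector was coerced to a list
```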

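To fix it, convert the result to POSIXct, an atomic vector with one value per date, before assigning. Replace the last line with: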

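data[indDoZmiany] <- as.POSIXct(strptime(data[indDoZmiany], "%d %b, %H:%M"))

Because data is still a character vector, the assigned values are stored as text holding the number of seconds since 1970-01-01, not as formatted dates.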
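If you need real date values for both formats, one option is to parse each format separately and combine the results in a POSIXct vector of its own, so nothing is coerced back to character or to a list. This is a minimal sketch, assuming made-up sample strings and using base R sub() and grepl() in place of the stringi calls from the question. Note that %b matches month abbreviations of the current locale.

```r
# Hypothetical sample strings in the two formats from the question:
# "dzisiaj HH:MM" (today), "wczoraj HH:MM" (yesterday), and "DD mon, HH:MM"
dataRaw <- c("dzisiaj 10:15", "wczoraj 22:05", "12 mar, 15:40")

# Which entries use the relative "today"/"yesterday" format
isRelative <- grepl("^(dzisiaj|wczoraj) [0-9]{2}:[0-9]{2}$", dataRaw)

# Replace the words with ISO dates (base R sub() instead of stringi)
withDates <- sub("dzisiaj", as.character(Sys.Date()), dataRaw, fixed = TRUE)
withDates <- sub("wczoraj", as.character(Sys.Date() - 1), withDates, fixed = TRUE)

# Parse each format separately; entries in the other format become NA
relative <- as.POSIXct(strptime(withDates, "%Y-%m-%d %H:%M"))
absolute <- as.POSIXct(strptime(withDates, "%d %b, %H:%M"))

# Combine into one POSIXct vector ([<- on POSIXct keeps the class)
dates <- relative
dates[!isRelative] <- absolute[!isRelative]
dates
```

Since POSIXct stores one number per date, subsetting and assignment behave like an ordinary numeric vector, and the missing year in the second format defaults to the current year.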