Insert rows before and after a condition into a data.frame

Time: 2018-06-02 05:43:48

Tags: r dataframe data.table

I have data like this:

df <- data.frame(V1=c("stuff", "2nd June 2018", "otherstuff1", "baseball","", "142", "otherstuff2", "football","", "150", "4th June 2018", "otherstuff99", "hockey","", "160", "otherstuff100", "baseball", "", "190", "otherstuff5", "lacrosse", "200", "9th June 2018"), stringsAsFactors = F)

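I want to conditionally insert rows so that a new cell "date" bookends every date value, with one "date" row directly before it and one directly after it. There is a random number of other cells between the dates: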

df.desired <- data.frame(V1=c("stuff","date", "2nd June 2018","date" ,"otherstuff1", "baseball","", "142", "otherstuff2", "football","", "150","date", "4th June 2018","date", "otherstuff99", "hockey","", "160", "otherstuff100", "baseball", "", "190", "otherstuff5", "lacrosse", "200", "date", "9th June 2018","date"), stringsAsFactors=F)                 

2 answers:

Answer 0 (score: 3)

You need to perform three steps:

  • Find the positions of the dates (using grep)
  • Create a new data.frame with space for the date markers
  • Add date to the new data.frame

Code:

# Find position of `month year`
foo <- grep(paste(month.name, "\\d+$", collapse = "|"), df$V1)
# Expand original data.frame with space for the date markers
dfDesired <- data.frame(x = df$V1[sort(c(1:nrow(df), foo, foo))], stringsAsFactors = FALSE)
# Find position for date in expanded data.frame
bar <- foo + seq(by = 2, length.out = length(foo))
# Add date
dfDesired$x[c(bar - 1, bar + 1)] <- "date"

Note:

The full pattern that grep receives from paste(month.name, "\\d+$", collapse = "|") is:

"January \\d+$|February \\d+$|March \\d+$|April \\d+$|May \\d+$|June \\d+$|July \\d+$|August \\d+$|September \\d+$|October \\d+$|November \\d+$|December \\d+$"
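To see which values this pattern picks up, here is a quick check with grepl() on a few strings taken from the question's data (the sample vector x is just an illustrative subset of df$V1). Note that month.name is a base R constant holding the English month names, and each alternative only requires "<Month> <digits>" at the end of the string:

```r
# Same alternation pattern as in the answer:
# "January \\d+$|February \\d+$|...|December \\d+$"
pattern <- paste(month.name, "\\d+$", collapse = "|")

# A few values from the question's data, plus a bare month name
x <- c("stuff", "2nd June 2018", "142", "4th June 2018", "June", "9th June 2018")

# TRUE where the string ends in a month name, a space and digits
is_date <- grepl(pattern, x)
print(is_date)
```

A bare "June" does not match, because every alternative requires a space followed by at least one digit at the end of the string.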

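We need the bar positions because the rows in the new data.frame are shifted: every earlier date adds two extra rows, so the middle copy of each date lands 1, 3, 5, and so on, rows after its original index.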
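Putting the three steps together, here is a sketch of answer 0's approach wrapped in a reusable function. The name bookend_dates() and its marker argument are hypothetical, introduced here for illustration. It writes the shift as 2 * seq_along(foo) - 1, which yields the same 1, 3, 5, ... offsets as seq(by = 2, length.out = length(foo)), and it checks the result against the question's df.desired:

```r
# Hypothetical helper (not from any package) wrapping answer 0's three steps
bookend_dates <- function(v, marker = "date") {
  # Step 1: find the dates, i.e. strings ending in "<Month> <digits>"
  pattern <- paste(month.name, "\\d+$", collapse = "|")
  foo <- grep(pattern, v)
  # Step 2: repeat every date row three times to make room for the markers
  out <- v[sort(c(seq_along(v), foo, foo))]
  # The middle copy of the k-th date is shifted by 2k - 1 rows: 1, 3, 5, ...
  bar <- foo + (2 * seq_along(foo) - 1)
  # Step 3: overwrite the copies just before and after each date with the marker
  out[c(bar - 1, bar + 1)] <- marker
  out
}

df <- data.frame(V1 = c("stuff", "2nd June 2018", "otherstuff1", "baseball", "", "142",
                        "otherstuff2", "football", "", "150", "4th June 2018",
                        "otherstuff99", "hockey", "", "160", "otherstuff100", "baseball",
                        "", "190", "otherstuff5", "lacrosse", "200", "9th June 2018"),
                 stringsAsFactors = FALSE)
df.desired <- data.frame(V1 = c("stuff", "date", "2nd June 2018", "date", "otherstuff1",
                                "baseball", "", "142", "otherstuff2", "football", "", "150",
                                "date", "4th June 2018", "date", "otherstuff99", "hockey", "",
                                "160", "otherstuff100", "baseball", "", "190", "otherstuff5",
                                "lacrosse", "200", "date", "9th June 2018", "date"),
                         stringsAsFactors = FALSE)

result <- data.frame(V1 = bookend_dates(df$V1), stringsAsFactors = FALSE)
print(identical(result$V1, df.desired$V1))
```

Because each date is tripled before the markers are written, two adjacent dates still get their own pair of bookends each.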

Answer 1 (score: 1)

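This is how I did it. The dmy function from the lubridate package appears to recognize every date format in your example, but that may not always hold if your data contains a wider variety of date strings: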

library(lubridate)

# lubridate parses your dates with the dmy() function;
# strings it cannot parse become NA (lubridate warns about them)
df$date_try <- dmy(df$V1)
# the ones that are not NA must be dates
ind <- which(!is.na(df$date_try))
# insert bookends at the index locations just before and after your dates
new_ind <- c(seq_along(df$date_try), ind + 0.5, ind - 0.5)
new_V1 <- c(df$V1, rep("date", length(ind) * 2))

# currently the bookends are at the end of the vector,
# so we re-order them to insert them at the proper locations
# and create the desired output data frame
df.new <- data.frame(V1 = new_V1[order(new_ind)], stringsAsFactors = FALSE)

> head(df.new)
             V1
1         stuff
2          date
3 2nd June 2018
4          date
5   otherstuff1
6      baseball
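The ±0.5 ordering trick in this answer does not depend on lubridate itself: any logical "is this a date?" vector works. The following minimal sketch swaps in the month-name regex from answer 0 as the detector to avoid the lubridate dependency. That substitution is an assumption made here, and the helper name bookend_by_order() is hypothetical:

```r
# Hypothetical helper: insert a marker before and after every TRUE in is_date
bookend_by_order <- function(v, is_date, marker = "date") {
  ind <- which(is_date)
  # Original rows keep integer keys; markers get keys just before and after each date
  keys <- c(seq_along(v), ind - 0.5, ind + 0.5)
  vals <- c(v, rep(marker, 2 * length(ind)))
  # Sorting by key slots every marker into place next to its date
  vals[order(keys)]
}

v <- c("stuff", "2nd June 2018", "otherstuff1", "4th June 2018", "hockey")
pattern <- paste(month.name, "\\d+$", collapse = "|")
out <- bookend_by_order(v, grepl(pattern, v))
print(out)
```

When two dates are adjacent, the +0.5 key of the first and the -0.5 key of the second coincide, but both carry the same marker value, so the tie does not change the output.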