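I have a data.frame, named sell_tv, and I want to delete rows whose dates are consecutive. Below is its head(). In this particular case I want to delete row 5, because rows 5 and 6 have consecutive dates (2015-01-23 and 2015-01-22).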
Date Open High Low Close Sell.TV Buy.TV
1 2015-04-08 2207 2204 2165 2166 4.038113 3.083603
2 2015-03-16 2214 2215 2172 2198 4.041986 3.087017
3 2015-03-05 2343 2364 2320 2324 4.023689 3.081034
4 2015-01-27 2171 2182 2151 2178 4.021998 3.070200
5 2015-01-23 2234 2244 2222 2230 4.032086 3.061206
6 2015-01-22 2278 2282 2242 2246 4.037248 3.095450
I wrote the following code for this, but got:
"Error in if (sell_tv$Date[i] == sell_tv$Date[i + 1] + 1) { : missing value where TRUE/FALSE needed"
Code:
for( i in 1:nrow(sell_tv))
{
  if (sell_tv$Date[i] == sell_tv$Date[i+1] + 1 )
  {
    new_sell<- sell_tv[-i,]
  }
  else
  {
    new_sell<- sell_tv[,]
  }
  i= i+1
}
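Any help is appreciated!
Answer 0 (score: 1)
As I said in the comments, you can keep the loop and store the row numbers to delete in a variable. Alternatively, you can get all the row numbers at once: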
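to_delete <- which(sell_tv$Date[-nrow(sell_tv)]==sell_tv$Date[-1]+1) #5
# guard: sell_tv[-integer(0), ] would return zero rows if nothing matched
new_sell <- if (length(to_delete) > 0) sell_tv[-to_delete, ] else sell_tv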
new_sell
# Date Open High Low Close Sell.TV Buy.TV
# 1 2015-04-08 2207 2204 2165 2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172 2198 4.041986 3.087017
# 3 2015-03-05 2343 2364 2320 2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151 2178 4.021998 3.070200
# 6 2015-01-22 2278 2282 2242 2246 4.037248 3.095450
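The first option mentioned above keeps the loop and records the rows to delete. A sketch of it follows. It also explains the error in the question. On the last iteration sell_tv$Date[i + 1] is NA, so if() has nothing to test. The original loop also overwrote new_sell on every pass, so only the last iteration's result survived.

For brevity, the sketch rebuilds only the Date and Open columns of the question's data. It assumes the dates stay in descending order, as in the question.

```r
# Date and Open columns from the question's table (other columns omitted for brevity)
sell_tv <- data.frame(
  Date = as.Date(c("2015-04-08", "2015-03-16", "2015-03-05",
                   "2015-01-27", "2015-01-23", "2015-01-22")),
  Open = c(2207L, 2214L, 2343L, 2171L, 2234L, 2278L)
)

to_delete <- integer(0)  # row numbers to drop, collected across iterations
# Stop one row early: on the last row Date[i + 1] is NA, and if (NA) is an error
for (i in seq_len(nrow(sell_tv) - 1)) {
  # Dates are in descending order, so row i is consecutive with row i + 1
  # when it falls exactly one day later
  if (sell_tv$Date[i] == sell_tv$Date[i + 1] + 1) {
    to_delete <- c(to_delete, i)
  }
}

# Subset once after the loop; sell_tv[-integer(0), ] would return zero rows
new_sell <- if (length(to_delete) > 0) sell_tv[-to_delete, ] else sell_tv
print(new_sell)
```

Printing new_sell shows rows 1, 2, 3, 4 and 6, the same rows the vectorized which() version keeps.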
Data:
sell_tv <- structure(list(Date = structure(c(16533, 16510, 16499, 16462,
16458, 16457), class = "Date"), Open = c(2207L, 2214L, 2343L,
2171L, 2234L, 2278L), High = c(2204L, 2215L, 2364L, 2182L, 2244L,
2282L), Low = c(2165L, 2172L, 2320L, 2151L, 2222L, 2242L), Close = c(2166L,
2198L, 2324L, 2178L, 2230L, 2246L), Sell.TV = c(4.038113, 4.041986,
4.023689, 4.021998, 4.032086, 4.037248), Buy.TV = c(3.083603,
3.087017, 3.081034, 3.0702, 3.061206, 3.09545)), .Names = c("Date",
"Open", "High", "Low", "Close", "Sell.TV", "Buy.TV"), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
Answer 1 (score: 0)
This solution works for both unique and duplicate dates in the Date column of the sell_tv data frame:
sell_tv = read.table("myfile.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
print(sell_tv)
# Date Open High Low Close Sell.TV Buy.TV
# 1 2015-04-08 2207 2204 2165 2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172 2198 4.041986 3.087017
# 3 2015-03-05 2343 2364 2320 2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151 2178 4.021998 3.070200
# 5 2015-01-23 2234 2244 2222 2230 4.032086 3.061206
# 6 2015-01-22 2278 2282 2242 2246 4.037248 3.095450
#add duplicate date
sell_tv[3,1] = "2015-01-23"
print(sell_tv)
# Date Open High Low Close Sell.TV Buy.TV
# 1 2015-04-08 2207 2204 2165 2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172 2198 4.041986 3.087017
# 3 2015-01-23 2343 2364 2320 2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151 2178 4.021998 3.070200
# 5 2015-01-23 2234 2244 2222 2230 4.032086 3.061206
# 6 2015-01-22 2278 2282 2242 2246 4.037248 3.095450
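date_str = sell_tv$Date
to_delete = c()
for(i in date_str){
  # rows whose date is exactly one day after date i
  # (tz = "UTC" and units = "days" avoid DST shifts and automatic unit selection)
  a1 = which(unlist(lapply(date_str, function(x) as.numeric(difftime(x, i, tz = "UTC", units = "days"))))== 1)
  if(length(a1) > 0){
    to_delete = c(to_delete, a1)
  } else
    next
}
# guard: sell_tv[-c(), ] is an error if no row matched
if(length(to_delete) > 0) sell_tv = sell_tv[-to_delete,]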
Output:
print(sell_tv)
Date Open High Low Close Sell.TV Buy.TV
1 2015-04-08 2207 2204 2165 2166 4.038113 3.083603
2 2015-03-16 2214 2215 2172 2198 4.041986 3.087017
4 2015-01-27 2171 2182 2151 2178 4.021998 3.070200
6 2015-01-22 2278 2282 2242 2246 4.037248 3.095450
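The loop above deletes every row whose date is exactly one day after some other date in the column, wherever that other date appears. The same rule can be written without a loop using %in%.

This is a sketch with a few assumptions:
- The data is rebuilt as character dates, as read.table() with stringsAsFactors = FALSE returns them.
- It keeps the duplicate "2015-01-23" added in the answer.
- It is trimmed to the Date and Open columns.

```r
# Dates as character strings, including the duplicate "2015-01-23" in row 3
sell_tv <- data.frame(
  Date = c("2015-04-08", "2015-03-16", "2015-01-23",
           "2015-01-27", "2015-01-23", "2015-01-22"),
  Open = c(2207L, 2214L, 2343L, 2171L, 2234L, 2278L),
  stringsAsFactors = FALSE
)

d <- as.Date(sell_tv$Date)
# Flag a row when the day before its date occurs anywhere in the column
to_delete <- which((d - 1) %in% d)
new_sell <- if (length(to_delete) > 0) sell_tv[-to_delete, ] else sell_tv
print(new_sell)
```

This flags rows 3 and 5 and keeps rows 1, 2, 4 and 6, matching the output above. Converting to Date first means the comparison is in whole days, with no time-zone handling involved.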
Answer 2 (score: 0)
Use logical indexing with diff() on the Date column:
sell_tv[ c(9999,diff(sell_tv$Date)) != -1, ]
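We prepend a sentinel value (9999) to diff(...) so the logical index has one entry per row. The sentinel never equals -1, so the first row is always kept.

With the sentinel at the front, each entry compares a row with the row above it. This therefore drops row 6 rather than row 5. To drop row 5 as asked in the question, append the sentinel instead: c(diff(sell_tv$Date), 9999).
If you want to exclude both the day before and the day after, negate the result of the %in% operator: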
sell_tv[ ! (c(9999,diff(sell_tv$Date)) %in% c(-1,+1)), ]
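Where the sentinel goes decides which row of a consecutive pair gets flagged. A quick way to check is to print the day differences for the question's dates. This sketch rebuilds only the Date column from the question's table.

```r
# Date column from the question's table, in its original (descending) order
dates <- as.Date(c("2015-04-08", "2015-03-16", "2015-03-05",
                   "2015-01-27", "2015-01-23", "2015-01-22"))

# diff()[i] is dates[i + 1] - dates[i], in days
d <- as.numeric(diff(dates))
print(d)

# Sentinel first: entry i compares row i with the row above it
drop_front <- which(c(9999, d) == -1)
# Sentinel last: entry i compares row i with the row below it
drop_end <- which(c(d, 9999) == -1)
cat("sentinel first drops row", drop_front, "\n")
cat("sentinel last drops row", drop_end, "\n")
```

The differences are -23, -11, -37, -4 and -1. Only the final pair is one day apart, so the sentinel-first index flags row 6 and the sentinel-last index flags row 5.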