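I am currently working with a set of time series data. The data frame has 4 columns (date, value, month and size). The month column just indicates which month the date falls in. For each month, I would like to record the first instance, within the first three days of the month, where the value exceeds 0.5, and then the next time within that same month (at least two days before the end of the month) that the value reverts to negative.
If the first occurrence of value > 0.5 happens after the first three days, ignore it and move on to the next month. If the value never goes back to negative by two days before the end of the month, just record the last value.
The operation done for each month is a bit complicated, so I think a for loop is unavoidable. Any suggestions would be much appreciated.
Thanks!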
Answer 0: (score: 1)
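One way to get the first result you want is to use `dplyr` with custom functions:

1. `first.over`: detects whether `Value > 0.5` occurs within the first three days of a month.
2. `reversion`: detects the next date (after the date detected by `first.over`, if there is one) on which `Value` reverts to negative, provided this happens two or more days before the end of the month.

The data munging chain is:

1. `group_by` `Month`.
2. Use `mutate` with the `first.over` function to create a new column `over_0.5` holding, for each `Month`, the date within the first three days on which `Value` exceeds `0.5`. If no such date is detected, `first.over` returns `NA` (and `over_0.5` will be `NA`). Note that `first.over` is used as a window function, so `over_0.5` is a vector that repeats the single value (date) returned by the logic inside `first.over`. This preserves the data in the `Value` column for the later call to `reversion`.
3. `filter` to remove the months for which `over_0.5` is `NA`. This takes care of ignoring those months, as you wanted.
4. `summarise` the result. `first` summarises the `over_0.5` column by picking out its first value, since the column holds the same value throughout a `Month`, and the `reversion` function computes `reversion_date` for each `Month`.

The code is as follows:

```r
library(dplyr)

first.over <- function(v, d) {
  # get index to first date for which the Value > 0.5
  # this will be NA if there is no date where Value > 0.5
  i <- first(which(v > 0.5))
  # if that date is in the first three days, return the date
  # otherwise return NA
  if (!is.na(i) && i < 4) {
    return(d[i])
  } else {
    return(NA)
  }
}

reversion <- function(v, fo, d) {
  # if there is no first over 0.5 date, return NA
  if (any(is.na(fo)) || length(fo) == 0) return(NA)
  # get indices i for all negative Values
  i <- which(v < 0.0)
  # get the first index j from i for which the date[i]
  # is greater than the first over 0.5 date. Again,
  # this will be NA if there are no negative Values
  # or if there are no dates with negative values that
  # are greater than the first over 0.5 date.
  j <- i[first(which(d[i] > fo[1]))]
  # if that date is two or more days before the last day
  # of the month, return that date; otherwise, return
  # the last day.
  if (!is.na(j) && j < (length(v) - 1)) {
    return(d[j])
  } else {
    return(d[length(v)])
  }
}

# each %>% must end its line; a line that starts with %>% is a syntax error
result <- df %>%
  group_by(Month) %>%
  mutate(over_0.5 = first.over(Value, Dates)) %>%
  filter(!is.na(over_0.5)) %>%
  summarise(first(over_0.5), reversion_date = reversion(Value, over_0.5, Dates))
```

This code only generates the first type of output you requested. To generate the other one, you would need to define the data for the `Size` column.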
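The pipeline above expects a `Month` column and rows in ascending date order. As a minimal sketch of preparing such a data frame in base R, assuming the columns are named `Dates` and `Value` as in the answer's code (the three rows below are made-up toy values, not data from the question):

```r
# Toy data, deliberately out of order (hypothetical values for illustration)
df <- data.frame(
  Dates = as.Date(c("2016-02-01", "2016-01-02", "2016-01-01")),
  Value = c(-0.2, 0.6, 0.4)
)

# Sort ascending by date, which first.over and reversion rely on
df <- df[order(df$Dates), ]

# Derive the month number (1-12) from each date
df$Month <- as.integer(format(df$Dates, "%m"))
```

If the series spans more than one year, a bare month number would lump together the same month from different years. Grouping on something like `format(df$Dates, "%Y-%m")` is probably safer in that case.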
Notes:

1. `first.over` and `reversion` both assume that the `Dates` column is sorted as a time series, in ascending date order.
2. `first.over` assumes there is a row for every date. However, the logic can easily be modified to cope with missing data (rows) in a month by comparing the day of the month in the date rather than the row index `i`.
3. `reversion` can handle missing data within the month, but data for the last day of the month must be present to meet your requirement. Otherwise, `reversion` returns the last `Date` available for the `Month`.

Using the data you posted (augmented with the `Month` column per your specification), the result is:

```r
print(result)
### A tibble: 3 x 3
##  Month first(over_0.5) reversion_date
##  <int>          <date>         <date>
##1     1      2016-01-02     2016-01-05
```

Using a larger dataset (far from a complete test):

```r
df <- structure(list(Dates = structure(c(16801, 16802, 16803, 16804,
16805, 16806, 16807, 16808, 16809, 16810, 16811, 16812, 16813,
16814, 16815, 16816, 16831, 16832, 16833, 16834, 16835, 16860,
16861, 16862, 16863, 16864, 16865, 16866, 16867, 16868, 16869,
16870, 16871, 16872, 16873, 16874, 16875, 16876, 16877, 16878,
16879, 16880, 16881, 16882, 16883, 16884, 16885, 16886, 16887,
16888, 16889, 16890, 16891, 16892, 16893, 16894, 16895, 16921,
16922, 16923, 16924, 16925, 16951, 16952), class = "Date"), Value = c(0.360588739,
0.595765265, 0.448855962, 0.295765265, -0.24470058, -0.169958947,
-0.216953024, -0.287801531, -0.328458361, -0.468009532, -0.368107924,
-0.500611564, -0.506701117, -0.564366906, -0.737858078, -0.764897486,
-0.864897486, -0.764897486, -0.764897486, -0.764897486, -0.764897486,
-0.764897486, -0.360588739, -0.460588739, 0.564897486, 0.664897486,
0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486,
0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486,
0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486,
0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486,
0.664897486, 0.664897486, 0.664897486, 0.664897486, -0.664897486,
-0.664897486, -0.664897486, 0.764897486, 0.764897486, 0.764897486,
0.764897486, 0.764897486, 0.264897486, 0.264897486, 0.264897486,
0.264897486, -0.264897486, -0.264897486), Month = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L)), .Names = c("Dates",
"Value", "Month"), row.names = c(NA, -64L), class = "data.frame")
```

The result is:

```r
print(result)
### A tibble: 3 x 3
##  Month first(over_0.5) reversion_date
##  <int>          <date>         <date>
##1     1      2016-01-02     2016-01-05
##2     3      2016-03-03     2016-03-29
##3     4      2016-04-01     2016-04-30
```
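The notes mention that `first.over` could compare the day of the month instead of the row index so that months with missing rows are handled. One possible way to do that is sketched below. The name `first.over.by.day` is hypothetical (it is not part of the answer's code), the sketch uses only base R, and it returns `as.Date(NA)` rather than a bare `NA` so both branches have the same type:

```r
# Hypothetical variant of first.over that tolerates missing rows:
# it checks the calendar day of the first Value > 0.5, not its row position.
first.over.by.day <- function(v, d) {
  # day of the month for every row, e.g. 2016-03-05 -> 5
  day <- as.integer(format(d, "%d"))
  # index of the first row whose Value exceeds 0.5 (NA if there is none)
  i <- which(v > 0.5)[1]
  # keep the date only if it falls on day 1, 2 or 3 of the month
  if (!is.na(i) && day[i] <= 3) d[i] else as.Date(NA)
}

# Toy month with rows missing for March 1, 3 and 4 (made-up values)
d <- as.Date(c("2016-03-02", "2016-03-05", "2016-03-06"))
v <- c(0.1, 0.7, -0.2)
late <- first.over.by.day(v, d)                  # first Value > 0.5 is on day 5
early <- first.over.by.day(c(0.6, 0.7, -0.2), d) # first Value > 0.5 is on day 2
```

With this toy input the first value above 0.5 sits in row 2 but on March 5, so a row-index test (`i < 4`) would accept it while the day-based test rejects it. Whether that matches your intent for months with gaps is something to check against your real data.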
Hope this helps.