Question

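I am currently working with a time series data set. The data frame has 4 columns (Dates, Value, Month and Size). The Month column only indicates which month the date falls in. For each month, I want to record the first instance, within the first three days of the month, at which the Value exceeds 0.5, and the next time (no later than two days before the end of the month) that the Value reverts to negative within that month.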

For example:


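If the first occurrence of Value > 0.5 happens after the first three days, ignore it and move on to the next month. If the Value never reverts to negative before the last two days of the month, just record the last value.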

The operation done for each month is a bit complicated, so I think a for loop is unavoidable. Any suggestions would be much appreciated.

Thanks!

Answer 1

One way to get the first result you want is to use dplyr together with two custom functions:

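- first.over: detects whether there is a Value > 0.5 within the first three days of the month.
- reversion: detects the next date (after the date detected by first.over, if one exists) on which the Value reverts to negative, at least two days before the end of the month.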

The processing chain first groups by Month (group_by), then:

- mutate: uses the first.over function to create a new column over_0.5, which holds the date within the first three days of the Month on which Value first exceeds 0.5. If no such date is found, first.over returns NA (and over_0.5 will be NA). Note that first.over is used as a window function, so over_0.5 is a vector repeating the same value (the date) returned by the logic inside first.over. This keeps the data in the over_0.5 column available when we later call reversion.
- filter: drops the months for which over_0.5 is NA. Ignoring these months does exactly what you asked for.
- summarise: summarises the results:
  - summarise the over_0.5 column simply by taking its first value, since the column holds the same value throughout the Month;
  - compute reversion_date with the reversion function.

The code is as follows:

library(dplyr)

first.over <- function(v, d) {
  # get index to first date for which the Value > 0.5
  # this will be NA if there is no date where Value > 0.5
  i <- first(which(v > 0.5))
  # if that date is in the first three days, return the date
  # otherwise return NA
  if (!is.na(i) && i < 4) {
    return(d[i])
  } else {
    return(NA)
  }
}

reversion <- function(v, fo, d) {
  # if there is no first over 0.5 date, return NA
  if (any(is.na(fo)) || length(fo) == 0) return(NA)
  # get indices i for all negative Values
  i <- which(v < 0.0)
  # get the first index j from i for which the date[i]
  # is greater than the first over 0.5 date. Again,
  # this will be NA if there are no negative Values
  # or if there are no dates with negative values that
  # are greater than the first over 0.5 date.
  j <- i[first(which(d[i] > fo[1]))]
  # if that date is two or more days before the last day
  # of the month, return that date; otherwise, return
  # the last day.
  if (!is.na(j) && j < (length(v) - 1)) {
    return(d[j])
  } else {
    return(d[length(v)])
  }
}

result <- df %>%
  group_by(Month) %>%
  mutate(over_0.5 = first.over(Value, Dates)) %>%
  filter(!is.na(over_0.5)) %>%
  summarise(first(over_0.5), reversion_date = reversion(Value, over_0.5, Dates))
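Since the question assumed a for loop was unavoidable, here is a sketch of the same per-month rules in base R without dplyr: split() breaks the data frame into one piece per Month and lapply() applies a function to each piece, so no explicit loop is written. per_month and monthly_events are hypothetical names, and the sketch makes the same assumptions as the dplyr code (rows sorted by Dates, one row per calendar day).

```r
# Hypothetical base-R sketch of the same per-month rules (no dplyr).
# Assumes df has columns Dates (class Date), Value and Month, is sorted by
# Dates, and has one row per calendar day, as the dplyr code does.
per_month <- function(g) {
  n <- nrow(g)
  # rows where Value exceeds 0.5
  over <- which(g$Value > 0.5)
  # skip the month unless the first such row is one of the first three
  if (length(over) == 0 || over[1] > 3) return(NULL)
  i <- over[1]
  # first negative Value after row i
  neg <- which(g$Value < 0 & seq_len(n) > i)
  # accept it only if it is at least two rows before the last row;
  # otherwise fall back to the last row of the month
  j <- if (length(neg) > 0 && neg[1] < n - 1) neg[1] else n
  data.frame(Month = g$Month[1],
             over_0.5 = g$Dates[i],
             reversion_date = g$Dates[j])
}

monthly_events <- function(df) {
  # one data frame per Month, processed independently
  pieces <- lapply(split(df, df$Month), per_month)
  # NULL pieces (skipped months) are dropped by rbind
  out <- do.call(rbind, pieces)
  if (is.null(out)) return(out)
  rownames(out) <- NULL
  out
}
```

Returning NULL for a month plays the role of the filter step. On the posted df this sketch should give the same months (1, 3 and 4) and dates as the dplyr pipeline, with the column named over_0.5 instead of first(over_0.5).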

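This code only produces the first type of output you requested. To produce the other one, you would need to define the data for the Size column.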

Notes:

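- Both first.over and reversion assume that df is ordered as a time series by Dates (ascending) within each Month.
- In addition, reversion (like the i < 4 check in first.over) assumes there is one row per date. However, the logic can easily be modified to handle missing rows within a month by comparing the dates in Dates instead of the row indices i.
- Even if reversion is adapted to handle missing rows within the month, data for the last day of the month must still be present to meet your requirement. Otherwise, reversion returns the last available date in Dates for that Month rather than the last calendar day.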
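The date-based modification mentioned in the notes could look roughly like the following sketch. It assumes "first three days" means calendar days 1 to 3, and that a reversion counts only if it falls two or more calendar days before the last calendar day of the month. first_over_by_date, reversion_by_date and month_end are hypothetical names; the two main functions take the same arguments as first.over and reversion, so they could be swapped into the pipeline.

```r
# Hypothetical date-based variants of first.over / reversion: they compare
# calendar dates instead of row positions, so missing rows are tolerated.
first_over_by_date <- function(v, d) {
  # rows whose Value exceeds 0.5
  hits <- which(v > 0.5)
  if (length(hits) == 0) return(as.Date(NA))
  first_hit <- d[hits[1]]
  # keep it only if it falls on calendar day 1, 2 or 3 of its month
  if (as.integer(format(first_hit, "%d")) <= 3) first_hit else as.Date(NA)
}

month_end <- function(day) {
  # first day of this month, then first day of next month, minus one day
  start <- as.Date(format(day, "%Y-%m-01"))
  seq(start, by = "month", length.out = 2)[2] - 1
}

reversion_by_date <- function(v, fo, d) {
  if (length(fo) == 0 || is.na(fo[1])) return(as.Date(NA))
  last_day <- month_end(fo[1])
  # negative Values strictly after the first-over date
  neg <- which(v < 0 & d > fo[1])
  if (length(neg) > 0 && d[neg[1]] <= last_day - 2) {
    d[neg[1]]
  } else {
    # no qualifying reversion: last available row, which (as noted)
    # may not be the month's last calendar day
    d[length(d)]
  }
}
```

The fallback deliberately keeps the original behaviour of returning the last row's date; returning month_end(fo[1]) instead would be the alternative if a date is wanted even when the last day is missing.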

使用您发布的数据（根据您的规范增加df <- structure(list(Dates = structure(c(16801, 16802, 16803, 16804, 16805, 16806, 16807, 16808, 16809, 16810, 16811, 16812, 16813, 16814, 16815, 16816, 16831, 16832, 16833, 16834, 16835, 16860, 16861, 16862, 16863, 16864, 16865, 16866, 16867, 16868, 16869, 16870, 16871, 16872, 16873, 16874, 16875, 16876, 16877, 16878, 16879, 16880, 16881, 16882, 16883, 16884, 16885, 16886, 16887, 16888, 16889, 16890, 16891, 16892, 16893, 16894, 16895, 16921, 16922, 16923, 16924, 16925, 16951, 16952), class = "Date"), Value = c(0.360588739, 0.595765265, 0.448855962, 0.295765265, -0.24470058, -0.169958947, -0.216953024, -0.287801531, -0.328458361, -0.468009532, -0.368107924, -0.500611564, -0.506701117, -0.564366906, -0.737858078, -0.764897486, -0.864897486, -0.764897486, -0.764897486, -0.764897486, -0.764897486, -0.764897486, -0.360588739, -0.460588739, 0.564897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, 0.664897486, -0.664897486, -0.664897486, -0.664897486, 0.764897486, 0.764897486, 0.764897486, 0.764897486, 0.764897486, 0.264897486, 0.264897486, 0.264897486, 0.264897486, -0.264897486, -0.264897486), Month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L)), .Names = c("Dates", "Value", "Month"), row.names = c(NA, -64L), class = "data.frame")列）的结果如下：

print(result)
### A tibble: 3 x 3
##  Month first(over_0.5) reversion_date
##  <int>          <date>         <date>
##1     1      2016-01-02     2016-01-05
##2     3      2016-03-03     2016-03-29
##3     4      2016-04-01     2016-04-30

Using a larger data set (still far from a complete test):


The result is:


Hope this helps.

R: dplyr group_by, looping over each group
