My data frame has several date columns.
Index Measurement Date Measure.1 Date.1 Measure.2 Date.2 Measure.3 Date.3
1 1 56.0 2018-03-16 2 2018-03-23 12 2018-03-29 22.0 2018-04-05
2 2 56.0 2018-03-16 78 2018-03-23 41234 2018-03-29 12.0 2018-04-05
3 12 65.0 <NA> 54 2018-03-23 35 <NA> 323.0 2018-04-05
4 15 129.1 2018-03-16 78 2018-03-23 12 2018-03-29 2.0 2018-04-05
5 22 56.0 2018-03-16 786 2018-03-23 234 2018-03-29 21.0 <NA>
6 567 NA 2018-03-16 34 2018-03-23 4 2018-03-29 545.0 2018-04-21
7 75 5.0 2018-03-16 52 2018-03-23 3 2018-03-29 5.0 2018-04-05
8 563 12.0 2018-03-16 43 2018-03-23 34 2018-03-29 5.0 2018-04-05
9 436 12.0 2018-03-16 3 2018-03-23 123 2018-03-29 213.0 2018-04-05
10 34533 56.0 2018-03-16 43 2018-03-23 32 2018-03-29 5.0 2018-04-25
11 234234 76.0 2018-03-16 234 2018-03-31 324 2018-05-06 5.0 2018-04-05
12 6643 76.0 2018-03-16 23 2018-03-23 123 2018-03-29 0.2 2018-04-11
Here is the code to load my data (a small sample):
df <- structure(list(Index = c(1L, 2L, 12L, 15L, 22L, 567L, 75L, 563L,
436L, 34533L, 234234L, 6643L), Measurement = c(56, 56, 65, 129.1,
56, NA, 5, 12, 12, 56, 76, 76), Date = structure(c(17606, 17606,
NA, 17606, 17606, 17606, 17606, 17606, 17606, 17606, 17606, 17606
), class = "Date"), Measure.1 = c(2L, 78L, 54L, 78L, 786L, 34L,
52L, 43L, 3L, 43L, 234L, 23L), Date.1 = structure(c(17613, 17613,
17613, 17613, 17613, 17613, 17613, 17613, 17613, 17613, 17621,
17613), class = "Date"), Measure.2 = c(12L, 41234L, 35L, 12L,
234L, 4L, 3L, 34L, 123L, 32L, 324L, 123L), Date.2 = structure(c(17619,
17619, NA, 17619, 17619, 17619, 17619, 17619, 17619, 17619, 17657,
17619), class = "Date"), Measure.3 = c(22, 12, 323, 2, 21, 545,
5, 5, 213, 5, 5, 0.2), Date.3 = structure(c(17626, 17626, 17626,
17626, NA, 17642, 17626, 17626, 17626, 17646, 17626, 17632), class = "Date")), .Names = c("Index",
"Measurement", "Date", "Measure.1", "Date.1", "Measure.2", "Date.2",
"Measure.3", "Date.3"), row.names = c(NA, -12L), class = "data.frame")
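I need to look row-wise across adjacent Date columns: the difference between each pair of adjacent date cells should be no more than 9 days and no less than 3 days.
I can compute the differences like this: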
diffdate_table <- df[ , grep( "Date" , names( df ) ) ] %>% rowwise() %>% diff.Date
The output of the code above is:
> diffdate_table
Date.1 Date.2 Date.3
1 7 days 6 days 7 days
2 7 days 6 days 7 days
3 NA days NA days NA days
4 7 days 6 days 7 days
5 7 days 6 days NA days
6 7 days 6 days 23 days
7 7 days 6 days 7 days
8 7 days 6 days 7 days
9 7 days 6 days 7 days
10 7 days 6 days 27 days
11 15 days 36 days -31 days
12 7 days 6 days 13 days
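Question
How can I find the Index (a column in the dataset above) of the rows in diffdate_table that have at least one difference of more than 9 days or fewer than 3 days?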
Answer 0 (score: 0)
Interesting problem, plus I'd never seen diff.Date before. Here are two dplyr approaches. Both give the same result; the only choice is whether you want to work with long- or wide-shaped data when taking the differences.
The first version follows the way you set things up, but I had to take some odd steps to make sure the indices weren't dropped. There may be a better way to do that.
The second reshapes to long format from the start using gather, then uses lag and simply subtracts the dates.
Both then keep the differences that meet your criteria and count them per Index. Hope that helps!
library(tidyverse)

# Version 1: wide data, keeping Index as row names so diff.Date() does not drop it
diffs1 <- df %>%
  column_to_rownames("Index") %>%
  select_at(vars(starts_with("Date"))) %>%
  diff.Date() %>%
  rownames_to_column("Index") %>%
  mutate(Index = as.integer(Index)) %>%
  gather(key = date, value = diff, -Index) %>%
  filter(diff %>% between(3, 9)) %>%
  count(Index) %>%
  ungroup() %>%
  arrange(Index)
diffs1
#> # A tibble: 10 x 2
#>    Index     n
#>    <int> <int>
#>  1     1     3
#>  2     2     3
#>  3    15     3
#>  4    22     2
#>  5    75     3
#>  6   436     3
#>  7   563     3
#>  8   567     2
#>  9  6643     2
#> 10 34533     2
diffs1$Index
#> [1]     1     2    15    22    75   436   563   567  6643 34533

# Version 2: long data, subtracting each date from the previous one per Index
diffs2 <- df %>%
  select_at(vars(Index, starts_with("Date"))) %>%
  gather(key = obs, value = date, -Index) %>%
  group_by(Index) %>%
  mutate(prev_date = lag(date)) %>%
  mutate(diff = date - prev_date) %>%
  filter(!is.na(diff)) %>%
  filter(diff %>% between(3, 9)) %>%
  summarise(n = n())
Created on 2018-04-25 by the reprex package (v0.2.0).
Edit: I just realized you may not need the count of in-range differences per index, only the indices that meet the condition. In that case, you can stop after filter(diff %>% between(3, 9)) and take the unique Index values.
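For comparison, the same in-range check can be sketched in base R without any reshaping. This is only an illustration under stated assumptions: the four-row df below is a hypothetical reduced sample built from rows of the question's data, and the names days, diffs, in_range, idx_in_range and idx_out_of_range are made up for this sketch. It also shows the complementary case the question asks about (at least one difference outside 3 to 9 days), with NA differences ignored.

```r
# Hypothetical reduced sample: four rows taken from the question's data.
df <- data.frame(
  Index  = c(1L, 12L, 567L, 234234L),
  Date   = as.Date(c("2018-03-16", NA, "2018-03-16", "2018-03-16")),
  Date.1 = as.Date(c("2018-03-23", "2018-03-23", "2018-03-23", "2018-03-31")),
  Date.2 = as.Date(c("2018-03-29", NA, "2018-03-29", "2018-05-06")),
  Date.3 = as.Date(c("2018-04-05", "2018-04-05", "2018-04-21", "2018-04-05"))
)

# Turn the Date columns into a numeric matrix (days since 1970-01-01).
date_cols <- grep("^Date", names(df), value = TRUE)
days <- sapply(df[date_cols], as.numeric)

# Difference between each column and the column to its left, row by row.
diffs <- days[, -1, drop = FALSE] - days[, -ncol(days), drop = FALSE]

# TRUE where a difference lies within 3 to 9 days; NA where a date is missing.
in_range <- diffs >= 3 & diffs <= 9

# Indices with at least one in-range difference (NA differences ignored).
idx_in_range <- df$Index[rowSums(in_range, na.rm = TRUE) > 0]
idx_in_range
#> [1]   1 567

# Indices with at least one difference outside the range (NA differences ignored).
idx_out_of_range <- df$Index[rowSums(!in_range, na.rm = TRUE) > 0]
idx_out_of_range
#> [1]    567 234234
```

Index 567 appears in both vectors because two of its differences are in range (7 and 6 days) and one is not (23 days). Index 12 appears in neither, since all of its differences are NA; whether missing dates should count as violations is a decision the sketch leaves open.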