My data frame looks like this:
Count_ID Stats Date
123 A 10-01-2017
123 A 12-01-2017
123 B 15-01-2017
456 B 18-01-2017
456 C 17-01-2017
789 A 20-01-2017
486 A 25-01-2017
486 A 28-01-2017
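I want to add Status and Count columns to the data frame, which should give the statuses described below.
Match on the date of the earliest Stats "A" within a Count_ID: if any row with the same Count_ID (e.g. 123) has a Date later than the earlier Stats "A" of that Count_ID, show "False" in the Status column.
For rows with the same Count_ID (e.g. 123), check Stats "A" against rows whose Stats is other than "A", or whose Stats is "A" with a Date later than that Stats "A"; show Status as "False".
If a Count_ID (e.g. 123) has Stats "A" and the date difference is less than 30 days (relative to the previous row by Date), show Status as "False-B".
Count is the difference in days between rows created for the same Count_ID.
Required output: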
Count_ID Stats Date Status Count
123 A 10-01-2017 False-B 0
123 A 12-01-2017 False-B 2
123 B 15-01-2017 False 3
456 B 18-01-2017 - 0
456 C 17-01-2017 False 1
789 A 20-01-2017 - 0
486 A 25-01-2017 False-B 0
486 A 28-01-2017 False-B 3
Answer 0 (score: 0)
If I have understood the question correctly, you can try this:
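library(dplyr)

df %>%
  group_by(Count_ID) %>%
  # Count: absolute gap in days to the previous row of the same Count_ID (0 for the first row)
  mutate(Count = c(0, abs(as.numeric(diff(Date)))),
         # "FALSE" for rows dated on or after the earliest Stats "A" row, only in groups with more than one row
         Status = ifelse(Date >= min(Date[Stats == 'A']) & n() > 1, "FALSE", "-")) %>%
  # Switch to "FALSE-B" when the row itself has Stats "A" and the gap is under 30 days
  mutate(Status = ifelse(Stats == 'A' & Count < 30 & Status == 'FALSE', 'FALSE-B', Status)) %>%
  data.frame()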
Note that the condition for row 5 is unclear, so I left its Status as "-". Since Count_ID = 456 has no row with Stats = A, I am not sure how you want this row handled.
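If the missing Stats = A case should be handled explicitly, one option is to test for it per group before calling min(). The sketch below is only a guess at the intent: it assumes that, in a group with no "A" row, the first row stays "-" and every later row becomes "FALSE", which is what the required output in the question shows for row 5. The helper columns multi, has_a and first_a are hypothetical names introduced for illustration.

```r
library(dplyr)

df <- data.frame(
  Count_ID = c(123L, 123L, 123L, 456L, 456L, 789L, 486L, 486L),
  Stats    = c("A", "A", "B", "B", "C", "A", "A", "A"),
  Date     = as.Date(c("2017-01-10", "2017-01-12", "2017-01-15", "2017-01-18",
                       "2017-01-17", "2017-01-20", "2017-01-25", "2017-01-28")),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Count_ID) %>%
  mutate(
    Count   = c(0, abs(as.numeric(diff(Date)))),
    multi   = n() > 1,                       # group has more than one row
    has_a   = any(Stats == "A"),             # group contains at least one "A" row
    # Earliest "A" date, or NA when the group has no "A" row (avoids min() on an empty vector)
    first_a = if (any(Stats == "A")) min(Date[Stats == "A"]) else as.Date(NA),
    Status  = case_when(
      !multi                                              ~ "-",
      has_a & Date >= first_a & Stats == "A" & Count < 30 ~ "FALSE-B",
      has_a & Date >= first_a                             ~ "FALSE",
      !has_a & row_number() > 1                           ~ "FALSE",  # ASSUMPTION: inferred from row 5 of the required output
      TRUE                                                ~ "-"
    )
  ) %>%
  ungroup() %>%
  select(-multi, -has_a, -first_a) %>%
  as.data.frame()

print(res)
```

Under that assumption, row 5 (Count_ID 456, Stats C) gets "FALSE" and the other rows keep the values from the answer above. If the intended rule for groups without "A" is different, only the marked case_when branch needs to change.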
The output is:
Count_ID Stats Date Count Status
1 123 A 2017-01-10 0 FALSE-B
2 123 A 2017-01-12 2 FALSE-B
3 123 B 2017-01-15 3 FALSE
4 456 B 2017-01-18 0 -
5 456 C 2017-01-17 1 -
6 789 A 2017-01-20 0 -
7 486 A 2017-01-25 0 FALSE-B
8 486 A 2017-01-28 3 FALSE-B
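The Count column can also be looked at in isolation. The following is a minimal base R sketch of the same gap calculation, using ave() in place of group_by() and mutate(); it is an alternative formulation, not the code that produced the output above. It also shows why abs() matters: the two rows of Count_ID 456 are not in date order, so the raw difference is negative.

```r
# Dates of Count_ID 456 in their original row order (18 Jan, then 17 Jan)
d456 <- as.Date(c("2017-01-18", "2017-01-17"))
raw_gap <- as.numeric(diff(d456))          # negative, because the second date is earlier
abs_gap <- c(0, abs(raw_gap))              # first row gets 0, later rows the absolute gap

# Same calculation for every group at once, keeping the original row order
Count_ID <- c(123L, 123L, 123L, 456L, 456L, 789L, 486L, 486L)
Date <- as.Date(c("2017-01-10", "2017-01-12", "2017-01-15", "2017-01-18",
                  "2017-01-17", "2017-01-20", "2017-01-25", "2017-01-28"))
Count <- ave(as.numeric(Date), Count_ID,
             FUN = function(d) c(0, abs(diff(d))))

print(data.frame(Count_ID, Date, Count))
```

If the gap should instead follow chronological order within each Count_ID, the rows would need to be sorted by Date first; the question does not make clear which is wanted.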
Sample data:
df <- structure(list(Count_ID = c(123L, 123L, 123L, 456L, 456L, 789L,
486L, 486L), Stats = c("A", "A", "B", "B", "C", "A", "A", "A"
), Date = structure(c(17176, 17178, 17181, 17184, 17183, 17186,
17191, 17194), class = "Date")), .Names = c("Count_ID", "Stats",
"Date"), row.names = c(NA, -8L), class = "data.frame")
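The sample data stores Date as class "Date", while the question shows the dates as day-month-year text. Assuming the column arrives as character strings in that dd-mm-yyyy form, a conversion along these lines would be needed before diff() and min() behave as expected. The name raw is a hypothetical stand-in for the unconverted data.

```r
# Hypothetical raw input with dates as dd-mm-yyyy text, as displayed in the question
raw <- data.frame(
  Count_ID = c(123L, 123L, 123L, 456L, 456L, 789L, 486L, 486L),
  Stats    = c("A", "A", "B", "B", "C", "A", "A", "A"),
  Date     = c("10-01-2017", "12-01-2017", "15-01-2017", "18-01-2017",
               "17-01-2017", "20-01-2017", "25-01-2017", "28-01-2017"),
  stringsAsFactors = FALSE
)

# Parse day-month-year explicitly; the default as.Date() formats would not read these strings correctly
df <- raw
df$Date <- as.Date(raw$Date, format = "%d-%m-%Y")

str(df)
```

After the conversion, the Date values correspond to the day numbers in the structure() call above (17176 for 10 January 2017, and so on).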