R - How to check for modified and unmodified values in a data frame

Time: 2017-09-13 05:32:48

Tags: r dataframe

How can I count the modified and unmodified rows in a data frame, by group?

The data frame:

U_ID  process  value1  value2
1     Fetch    A       A
2     Review   C       C
1     Review   A       H
1     Fetch    B       C
2     Review   NA      F
3     Fetch    A       D
4     Fetch    R       J
4     Review   H       J

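The data frame below is a sample showing, after grouping by the U_ID and process columns, whether each value was modified relative to the previous row in its group (1 = modified, 0 = unmodified).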

U_ID  process  value1  value2  value1modified  value2modified
1     Fetch    A       A       0               0
1     Fetch    B       C       1               1
1     Review   A       H       0               0
2     Review   C       C       0               0
2     Review   NA      F       1               1
3     Fetch    A       D       0               0
4     Fetch    R       J       0               0
4     Review   H       J       0               0

My expected data frame:

U_ID  process  value1modcount  value1unmodcount  value2modcount  value2unmodcount
1     Fetch    1               1                 1               1
1     Review   0               1                 0               1
2     Review   1               1                 1               1
3     Fetch    0               1                 0               1
4     Fetch    0               1                 0               1
4     Review   0               1                 0               1

DATA

structure(list(U_ID = c(1, 2, 1, 1, 2, 3, 4, 4), process = c("Fetch", 
"Review", "Review", "Fetch", "Review", "Fetch", "Fetch", "Review"
), value1 = c("A", "C", "A", "B", NA, "A", "R", "H"), value2 = c("A", 
"C", "H", "C", "F", "D", "J", "j")), .Names = c("U_ID", "process", 
"value1", "value2"), row.names = c(NA, -8L), class = "data.frame")

1 Answer:

Answer 0: (score: 0)

This can be done with dplyr.

library(dplyr)

data <- structure(list(U_ID = c(1, 2, 1, 1, 2, 3, 4, 4), process = c("Fetch", 
"Review", "Review", "Fetch", "Review", "Fetch", "Fetch", "Review"
), value1 = c("A", "C", "A", "B", NA, "A", "R", "H"), value2 = c("A", 
"C", "H", "C", "F", "D", "J", "j")), .Names = c("U_ID", "process", 
"value1", "value2"), row.names = c(NA, -8L), class = "data.frame")

data %>%
  group_by(U_ID, process) %>%
  mutate(
    # lag() returns the previous row's value within each (U_ID, process) group
    value1.prev = lag(value1),
    value2.prev = lag(value2),
    rn = row_number(),
    # The first row of a group has no predecessor, so it counts as unmodified.
    # A change to or from NA counts as modified; otherwise compare the values.
    value1modified = ifelse(rn == 1, 0,
                            ifelse((is.na(value1) + is.na(value1.prev)) == 1, 1,
                                   ifelse(value1 != value1.prev, 1, 0))),
    value2modified = ifelse(rn == 1, 0,
                            ifelse((is.na(value2) + is.na(value2.prev)) == 1, 1,
                                   ifelse(value2 != value2.prev, 1, 0)))) %>%
  # The data is still grouped by U_ID and process, so summarise() counts per group
  summarise(v1modcount = sum(value1modified == 1),
            v1unmodcount = sum(value1modified == 0),
            v2modcount = sum(value2modified == 1),
            v2unmodcount = sum(value2modified == 0))
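One edge case worth noting: if a value and its predecessor are both NA, the nested ifelse() calls return NA (because NA != NA evaluates to NA), and the sums for that group become NA as well. The sample data never hits this case. Below is a minimal sketch of one way to guard against it, assuming that two consecutive NAs should count as unmodified (the question does not say). The helper name is_modified is hypothetical, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: 1 where an element differs from the previous one, else 0.
# Assumptions: the first element counts as unmodified, and NA followed by NA
# counts as unmodified.
is_modified <- function(x) {
  prev <- lag(x)
  one_na  <- xor(is.na(x), is.na(prev))               # change to or from NA
  differs <- (!is.na(x)) & (!is.na(prev)) & (x != prev) # both present, values differ
  # seq_along(x) > 1 forces the first element to "unmodified"
  as.integer((one_na | differs) & (seq_along(x) > 1))
}

data <- data.frame(
  U_ID    = c(1, 2, 1, 1, 2, 3, 4, 4),
  process = c("Fetch", "Review", "Review", "Fetch",
              "Review", "Fetch", "Fetch", "Review"),
  value1  = c("A", "C", "A", "B", NA, "A", "R", "H"),
  value2  = c("A", "C", "H", "C", "F", "D", "J", "j"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(U_ID, process) %>%
  summarise(
    v1modcount   = sum(is_modified(value1)),
    v1unmodcount = n() - v1modcount,
    v2modcount   = sum(is_modified(value2)),
    v2unmodcount = n() - v2modcount
  ) %>%
  ungroup()

print(result)
```

Because every row is either modified or unmodified under these assumptions, the unmodified count is simply the group size minus the modified count, which keeps the two columns consistent by construction.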

Output:

U_ID  process  v1modcount  v1unmodcount  v2modcount  v2unmodcount
1     Fetch    1           1             1           1
1     Review   0           1             0           1
2     Review   1           1             1           1
3     Fetch    0           1             0           1
4     Fetch    0           1             0           1
4     Review   0           1             0           1