Sum one column over rows that share the same key in two other columns, and rename the other values

Time: 2018-05-14 09:04:41

Tags: dataframe

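Is there a way to relabel the values of a column in rows that get summed together, when those values differ between the rows?

Example: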

   # Data
   Key  Date     Value  Name    Type  Year
   C    2000-04  0.55   name1   x1    2000   <-
   C    2000-04  0.60   name2   x2    2000   <-
   C    2000-05  1.2    Name 4  x4    2000
   A    2001-06  4      Name 2  x6    2001
   A    2001-07  5      Name 3  x1    2001
   A    2002-08  2      Name 1  x2    2002
   ...

> df1
  Key  Date     Value  Name    Type  Year
1 C    2000-04  1.15   SUM     SUM   2000
2 C    2000-05  1.2    Name 4  x4    2000
3 A    2001-06  4      Name 2  x6    2001
4 A    2001-07  5      Name 3  x1    2001
5 A    2002-08  2      Name 1  x2    2002

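So I want the Year column to keep the value 2000, because it is the same in both rows. For the Type and Name columns, I want to mark that their values have changed.

I tried to adapt the code from my previous question, but I guess my R skills aren't good enough yet.

Thanks :)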

1 answer:

Answer 0 (score: 0)

How about this?

library(dplyr)

df1 <- df %>%
  group_by(Key, Date) %>%
  mutate(Value = sum(Value),
         # in groups with more than one row, replace Name and Type with 'SUM'
         Name  = ifelse(n() > 1, 'SUM', Name),
         Type  = ifelse(n() > 1, 'SUM', Type)) %>%
  filter(row_number() == 1) %>%   # keep one row per (Key, Date) group
  ungroup()                       # drop the grouping so later steps see a plain table
df1

Output:

  Key   Date    Value Name  Type   Year
1 C     2000-04  1.15 SUM   SUM    2000
2 C     2000-05  1.20 Name4 x4     2000
3 A     2001-06  4.00 Name2 x6     2001
4 A     2001-07  5.00 Name3 x1     2001
5 A     2002-08  2.00 Name1 x2     2002
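Note that the code above writes 'SUM' whenever a (Key, Date) group has more than one row, even if Name or Type happen to be identical in those rows. The question asks to mark only values that actually changed, so here is a minimal sketch of that variant. It assumes dplyr is installed; `label_if_changed` and its `marker` argument are hypothetical names introduced for illustration, not part of dplyr.

```r
library(dplyr)

# Sample data with the same values as the answer's df
df <- data.frame(
  Key   = c("C", "C", "C", "A", "A", "A"),
  Date  = c("2000-04", "2000-04", "2000-05", "2001-06", "2001-07", "2002-08"),
  Value = c(0.55, 0.6, 1.2, 4, 5, 2),
  Name  = c("name1", "name2", "Name4", "Name2", "Name3", "Name1"),
  Type  = c("x1", "x2", "x4", "x6", "x1", "x2"),
  Year  = c(2000L, 2000L, 2000L, 2001L, 2001L, 2002L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the marker only when the group holds
# more than one distinct value, otherwise keep the shared value
label_if_changed <- function(x, marker = "SUM") {
  if (n_distinct(x) > 1) marker else as.character(x[1])
}

df2 <- df %>%
  group_by(Key, Date) %>%
  mutate(
    Value = sum(Value),
    Name  = label_if_changed(Name),
    Type  = label_if_changed(Type)
  ) %>%
  filter(row_number() == 1) %>%   # one row per (Key, Date), original order kept
  ungroup()

df2
```

With the sample data the result matches the output above, because the two 2000-04 rows differ in both Name and Type. The difference only shows up when duplicate rows agree on a column, in which case the shared value is kept, just as Year is.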

Sample data:

df <- structure(list(Key = c("C", "C", "C", "A", "A", "A"), Date = c("2000-04", 
"2000-04", "2000-05", "2001-06", "2001-07", "2002-08"), Value = c(0.55, 
0.6, 1.2, 4, 5, 2), Name = c("name1", "name2", "Name4", "Name2", 
"Name3", "Name1"), Type = c("x1", "x2", "x4", "x6", "x1", "x2"
), Year = c(2000L, 2000L, 2000L, 2001L, 2001L, 2002L)), .Names = c("Key", 
"Date", "Value", "Name", "Type", "Year"), class = "data.frame", row.names = c(NA, 
-6L))
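If dplyr is not available, roughly the same result can be obtained with base R. The following is only a sketch: `collapse_group` is a hypothetical helper, and the group id assumes that Key and Date never contain the `|` character.

```r
# Sample data with the same values as above
df <- data.frame(
  Key   = c("C", "C", "C", "A", "A", "A"),
  Date  = c("2000-04", "2000-04", "2000-05", "2001-06", "2001-07", "2002-08"),
  Value = c(0.55, 0.6, 1.2, 4, 5, 2),
  Name  = c("name1", "name2", "Name4", "Name2", "Name3", "Name1"),
  Type  = c("x1", "x2", "x4", "x6", "x1", "x2"),
  Year  = c(2000L, 2000L, 2000L, 2001L, 2001L, 2002L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one (Key, Date) group into a single row
collapse_group <- function(g, marker = "SUM") {
  out <- g[1, , drop = FALSE]          # start from the first row of the group
  out$Value <- sum(g$Value)            # sum the numeric column
  for (col in c("Name", "Type")) {
    # mark the column only if its values differ within the group
    if (length(unique(g[[col]])) > 1) out[[col]] <- marker
  }
  out
}

# Build a group id and split in order of first appearance
group_id <- paste(df$Key, df$Date, sep = "|")
groups   <- split(df, factor(group_id, levels = unique(group_id)))

res <- do.call(rbind, lapply(groups, collapse_group))
rownames(res) <- NULL
res
```

Passing the levels explicitly to `factor()` keeps the groups in the order they first appear in `df`, so the rows come out in the same order as in the dplyr version.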