I have four columns:
A B C D (column D equals column B divided by column C)
apple(01) 1 6 1/6
apple(78) 2 7 2/7
apple(3) 3 8 3/8
banana(12) 4 9 4/9
banana 5 10 5/10
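Here is what I want to do. First, I want to ignore the parentheses in column A. Second, I want to sum B and C over the rows that share the same A value. Third, I want to compute a new D from those sums.
A B C D (column D equals column B divided by column C)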
apple 1+2+3 6+7+8 (1+2+3)/(6+7+8)
banana 4+5 9+10 (4+5)/(9+10)
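How can I do this in R?
Answer 0: (score: 0)
We can use sub to remove the parenthesized part of 'A' and use the result as the grouping variable. Then, with summarise_each, we get the sum of 'B' and 'C' for each group, and we create the 'D' column by dividing 'B' by 'C'.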
library(dplyr)
df %>%
  group_by(A = sub("\\(.*", "", A)) %>%
  summarise_each(funs(sum), B:C) %>%
  mutate(D = B/C)
# A B C D
# <chr> <int> <int> <dbl>
#1 apple 6 21 0.2857143
#2 banana 9 19 0.4736842
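In recent dplyr releases, summarise_each and funs are deprecated, so the same pipeline may emit warnings. A minimal sketch of the equivalent using across, assuming dplyr 1.0.0 or later and an example df rebuilt from the question's data (the original df is not shown, so its column types here are an assumption):

```r
library(dplyr)  # across() assumes dplyr >= 1.0.0

# Example data rebuilt from the question; the original D column is left out
# because it is recomputed from the summed B and C.
df <- data.frame(
  A = c("apple(01)", "apple(78)", "apple(3)", "banana(12)", "banana"),
  B = 1:5,
  C = 6:10,
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(A = sub("\\(.*", "", A)) %>%  # drop "(" and everything after it
  summarise(across(B:C, sum)) %>%        # sum B and C within each group
  mutate(D = B / C)                      # ratio of the summed columns

print(res)
```

Note that D is computed from the group sums, so it is sum(B)/sum(C) and not the sum or mean of the row-wise ratios.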
Or we can use a base R approach with aggregate:
transform(aggregate(.~A, transform(df, A = sub("\\(.*", "", A))[-4], sum), D = B/C)
# A B C D
#1 apple 6 21 0.2857143
#2 banana 9 19 0.4736842
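The base R one-liner packs three steps into a single call. As a sketch, the same steps can be written out one at a time; the df below is an assumed reconstruction of the question's data, and sums is a hypothetical name for the intermediate result:

```r
# Example data rebuilt from the question (an assumption; the original df is not shown)
df <- data.frame(
  A = c("apple(01)", "apple(78)", "apple(3)", "banana(12)", "banana"),
  B = 1:5,
  C = 6:10,
  stringsAsFactors = FALSE
)

# Step 1: strip the parenthesized suffix so that rows share a group label
df$A <- sub("\\(.*", "", df$A)

# Step 2: sum B and C within each value of A
sums <- aggregate(cbind(B, C) ~ A, data = df, FUN = sum)

# Step 3: compute D from the summed columns
sums$D <- sums$B / sums$C

print(sums)
```

Naming B and C explicitly in cbind avoids the need to drop the old D column with [-4], at the cost of listing the columns by hand.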