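Please forgive the very basic question, but I am trying to create a new column in a data frame that contains percentages based on other columns. For example, the data I am working with look something like the following, where the That column is a binary factor (i.e. presence or absence of "that"), the Verb column is the individual verb (i.e. a verb that may or may not be followed by "that"), and the Freq column gives the frequency of each individual verb.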
That Verb Freq
1 That believe 3
2 NoThat think 4
3 That say 3
4 That believe 3
5 That think 4
6 NoThat say 3
7 NoThat believe 3
8 NoThat think 4
9 That say 3
10 NoThat think 4
What I would like is to add another column that gives, for each distinct verb, the overall rate of "that" expression (coded as "That"), like so:
That Verb Freq Perc.That
1 That believe 3 33.3
2 NoThat think 4 25.0
3 That say 3 33.3
4 That believe 3 33.3
5 That think 4 25.0
6 NoThat say 3 33.3
7 NoThat believe 3 33.3
8 NoThat think 4 25.0
9 That say 3 33.3
10 NoThat think 4 25.0
I may have missed a similar question elsewhere. If so, my apologies. Either way, thank you in advance for your help.
Answer 0: (score: 1)
You want to use the ddply function from the plyr library:
#install.packages('plyr')
library(plyr)
# dat is your data frame; this example assumes lowercase column names (that, verb, freq)
ddply(dat, .(verb), transform, perc.that = freq/sum(freq))
# that verb freq perc.that
#1 That believe 3 0.3333333
#2 That believe 3 0.3333333
#3 NoThat believe 3 0.3333333
#4 That say 3 0.3333333
#...
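For comparison, here is a minimal base R sketch of the same per-verb calculation, written against the capitalized column names from the question (That, Verb, Freq) and scaled to percentages. The data frame below is retyped from the question's table, and the use of ave() and round() is my own choice rather than part of the answer above. Like the answer, it divides each row's Freq by the sum of Freq within that row's verb.

```r
# Sample data retyped from the table in the question
dat <- data.frame(
  That = c("That", "NoThat", "That", "That", "That",
           "NoThat", "NoThat", "NoThat", "That", "NoThat"),
  Verb = c("believe", "think", "say", "believe", "think",
           "say", "believe", "think", "say", "think"),
  Freq = c(3, 4, 3, 3, 4, 3, 3, 4, 3, 4)
)

# ave() returns a vector the same length as its input, where each
# element is the group-wise sum of Freq for that row's verb
verb_total <- ave(dat$Freq, dat$Verb, FUN = sum)

# Same ratio as freq/sum(freq) in the ddply call, expressed as a
# percentage rounded to one decimal place
dat$Perc.That <- round(100 * dat$Freq / verb_total, 1)

print(dat)
```

Unlike ddply, which returns the rows grouped by verb, this keeps the rows in their original order and needs no extra package.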