Using R's plyr library, I can get the mean of the different measurements of the same variable stored in an R data frame, like this:
library(plyr)
dataAvg <- ddply(data, .(VOWEL_QUALITIES), summarise, PITCH = mean(PITCH))
For example, the data frame looks like this:
VOWEL_QUALITIES <- c(rep("a",3),rep("i",3))
TOKEN <- c("Measurement 1", "Measurement 2", "Measurement 3", "Measurement 1", "Measurement 2", "Measurement 3")
PITCH <- c(10, 11, 12, 15, 16, 17)
data <- data.frame(VOWEL_QUALITIES, PITCH, TOKEN)
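Once I have these means, I can add a "TOKEN" column to the "dataAvg" data frame and rbind() it back onto the "data" data frame, for example if I want to plot each pitch measurement for each vowel alongside its mean: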
dataAvg$TOKEN <- rep("Average", nrow(dataAvg))  # one label per averaged row (2 in this example)
data <- rbind(data,dataAvg)
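Is there a more efficient way to do this, one where I don't have to manually add the extra column to the data frame of means and then manually rbind() it back onto the main data frame?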
Answer 0 (score: 4)
You can do this inline with data.table's := operator:
require(data.table)
data = data.table(data)
data[,AVG:=mean(PITCH),by="VOWEL_QUALITIES"]
The data is then:
   VOWEL_QUALITIES PITCH         TOKEN AVG
1:               a    10 Measurement 1  11
2:               a    11 Measurement 2  11
3:               a    12 Measurement 3  11
4:               i    15 Measurement 1  16
5:               i    16 Measurement 2  16
6:               i    17 Measurement 3  16
which is easier to plot and manipulate.
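For comparison, the same kind of per-group mean column can be sketched in base R with ave(), without any extra package. This is only an illustration built on the example data from the question; it is not part of the data.table answer above.

```r
# Example data from the question
data <- data.frame(
  VOWEL_QUALITIES = c(rep("a", 3), rep("i", 3)),
  PITCH = c(10, 11, 12, 15, 16, 17),
  TOKEN = rep(c("Measurement 1", "Measurement 2", "Measurement 3"), 2)
)

# ave() returns a vector as long as PITCH in which every element is
# replaced by the mean of its VOWEL_QUALITIES group (FUN defaults to mean)
data$AVG <- ave(data$PITCH, data$VOWEL_QUALITIES)
```

Unlike :=, this copies the column into a regular data frame rather than modifying the table by reference, which should not matter for data of this size.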
Answer 1 (score: 2)
Just to add, here is a dplyr + ggplot2 solution:
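library(dplyr)
# %>% replaces the old %.% chain operator, which has been removed from dplyr
data2 = data %>%
  group_by(VOWEL_QUALITIES) %>%
  mutate(AVG = mean(PITCH))  # per-vowel mean, repeated on every row of the group
library(ggplot2)
# individual measurements in black, per-vowel means overlaid in red
qplot(VOWEL_QUALITIES, PITCH, data = data2) +
  geom_point(aes(y = AVG), color = 'red')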
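If the long format from the question is preferred (the means as extra rows labelled "Average" rather than as an extra column), a rough dplyr sketch could look like the following. It assumes a current dplyr that provides %>% and bind_rows(); the names averages and combined are made up for illustration.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  VOWEL_QUALITIES = c(rep("a", 3), rep("i", 3)),
  PITCH = c(10, 11, 12, 15, 16, 17),
  TOKEN = rep(c("Measurement 1", "Measurement 2", "Measurement 3"), 2)
)

# One row per vowel, holding the mean pitch and a constant TOKEN label
averages <- data %>%
  group_by(VOWEL_QUALITIES) %>%
  summarise(PITCH = mean(PITCH), TOKEN = "Average")

# bind_rows() matches columns by name, so the means are appended as new rows
combined <- bind_rows(data, averages)
```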
Answer 2 (score: 1)
How about doing it in one step?
rbind(
data,
ddply(data, .(VOWEL_QUALITIES), summarise, PITCH = mean(PITCH), TOKEN="Average")
)
Result:
  VOWEL_QUALITIES PITCH         TOKEN
1               a    10 Measurement 1
2               a    11 Measurement 2
3               a    12 Measurement 3
4               i    15 Measurement 1
5               i    16 Measurement 2
6               i    17 Measurement 3
7               a    11       Average
8               i    16       Average
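If loading plyr is not an option, roughly the same result can be sketched with base R's aggregate(). This is an alternative technique, not what the answer above uses, and the names avg and combined are illustrative only.

```r
# Example data from the question
data <- data.frame(
  VOWEL_QUALITIES = c(rep("a", 3), rep("i", 3)),
  PITCH = c(10, 11, 12, 15, 16, 17),
  TOKEN = rep(c("Measurement 1", "Measurement 2", "Measurement 3"), 2)
)

# Per-vowel means; the formula interface keeps the original column names
avg <- aggregate(PITCH ~ VOWEL_QUALITIES, data = data, FUN = mean)
avg$TOKEN <- "Average"

# rbind() on data frames matches columns by name
combined <- rbind(data, avg)
```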