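I have a measured variable in an R data frame, along with several features for each measurement. Here is a sample dataset:
Basically, each word has its own measure and can have any pos, session, and author. I would like a way to create a new dataset containing the mean measure for each combination of variables/features. For example, the mean measure for "cat" every time it occurs with author 1, session 2, and pos noun, then the mean for the same combination but with session 3, and so on...
How can I do this?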
Answer 0 (score: 0)
I prefer the tidyverse approach.
library(tidyverse)
# Create sample data: 30 rows, so that some feature combinations repeat.
set.seed(1234)
n <- 30
# Pick a random "word" per row and keep its part of speech consistent with it.
idx <- sample(3, n, replace = TRUE)
df <- data.frame(
  measure = round(rnorm(n, mean = 200, sd = 20)),
  word    = c("Cat", "began", "Aggressive")[idx],
  pos     = c("noun", "verb", "Adjective")[idx],
  session = sample(1:3, n, replace = TRUE),
  author  = sample(1:3, n, replace = TRUE)
)
# Group by word, pos, session and author, then take the mean of measure per group
# (assuming I understood correctly which combinations you want).
MyMean <- df %>%
  group_by(word, pos, session, author) %>%
  # na.rm = TRUE in case you have missing values.
  summarise(M = mean(measure, na.rm = TRUE), .groups = "drop")
MyMean
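To make the result easier to check, here is a minimal sketch of the same grouping on a tiny hand-made data frame. The name `toy` and its values are made up for illustration and are not the asker's data; the point is only that the means are small enough to verify by hand.

```r
library(dplyr)

# Hypothetical toy data: two "cat" rows share one combination of features,
# a third "cat" row differs only by session.
toy <- data.frame(
  measure = c(100, 200, 300, 50),
  word    = c("cat", "cat", "cat", "began"),
  pos     = c("noun", "noun", "noun", "verb"),
  session = c(2, 2, 3, 1),
  author  = c(1, 1, 1, 2)
)

# One row per observed combination, with the mean measure in column M.
toy_means <- toy %>%
  group_by(word, pos, session, author) %>%
  summarise(M = mean(measure, na.rm = TRUE), .groups = "drop")

toy_means
```

The two rows for "cat" / noun / session 2 / author 1 collapse into a single row with M = 150, while the session 3 row stays separate. Only combinations that actually occur in the data appear in the result.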
Answer 1 (score: 0)
In base R, this can be done in several ways. The `tapply` method returns an array whose elements can be accessed by the names along its margins:
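# dfrm is the data frame from the question.
meas_tbl <- with(dfrm,
                 tapply(measure,
                        INDEX = list(word, pos, session, author),
                        FUN = mean, na.rm = TRUE))
# Mean for word "cat", pos "noun", session 2, author 1 (dimnames are character strings).
meas_tbl["cat", "noun", "2", "1"]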
If you omit the value at an array index position, you get all possible sub-arrays (slices) along that dimension.
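As a rough illustration of slicing, the sketch below builds the array from a small made-up data frame (`toy` is a hypothetical stand-in for `dfrm`) and leaves one or two index positions empty. It also shows `aggregate()`, another base R option that returns a data frame rather than an array.

```r
# Hypothetical toy data standing in for dfrm.
toy <- data.frame(
  measure = c(100, 200, 300, 50),
  word    = c("cat", "cat", "cat", "began"),
  pos     = c("noun", "noun", "noun", "verb"),
  session = c(2, 2, 3, 1),
  author  = c(1, 1, 1, 2)
)

meas_tbl <- with(toy,
                 tapply(measure,
                        INDEX = list(word, pos, session, author),
                        FUN = mean, na.rm = TRUE))

# A single cell: every index position is given.
one_cell <- meas_tbl["cat", "noun", "2", "1"]

# Session index left empty: a named vector with one entry per session.
by_session <- meas_tbl["cat", "noun", , "1"]

# Session and author left empty: a session-by-author matrix.
by_session_author <- meas_tbl["cat", "noun", , ]

# Base R alternative that keeps only the combinations present in the data.
agg <- aggregate(measure ~ word + pos + session + author, data = toy, FUN = mean)
```

Unlike the grouped data frame, the array has a cell for every possible combination, so combinations that never occur in the data hold `NA`.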