Question

I have the data below and want to compute the mean of columns t1 to t5 for each cid, grouped by iid.

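I only want to average the values that are > 0.
Ideally, I don't want to name every field individually, e.g. mean(t1), mean(t2), because in my real case I have more than 200 fields.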

Sample data:

library(dplyr)
test <- read.csv("~/Documents/R-SCRIPTS/DATA/test.csv", sep=";")

t <- test %>% 
  group_by(cid, iid) %>%
  select(t1:t5) %>%
  summarise(t1 = mean(t1, na.rm = TRUE), 
            t2 = mean(t2,na.rm = TRUE), 
            t3 = mean(t3,na.rm = TRUE), 
            t4 = mean(t4,na.rm = TRUE), 
            t5 = mean(t5,na.rm = TRUE) 
            )

This is my code so far. Can someone help me finish it? Thanks in advance.

Answer 1

If I understand correctly, you can simply use:

test %>% 
  group_by(cid, iid) %>% 
  summarise_each(funs(mean(.[.>0], na.rm = TRUE)), t1:t5)
#Source: local data frame [3 x 7]
#Groups: cid [?]
#
#    cid   iid    t1    t2    t3    t4    t5
#  (int) (int) (dbl) (dbl) (dbl) (dbl) (dbl)
#1   841     2   9.0     2     1     5   7.0
#2  2134     1   6.0     9     8     2   1.0
#3  4503     2   5.5     5     4     4   7.5
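summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later, is shown below. The data frame is made up for illustration because the question's test.csv is not shown, and only t1 to t3 are used for brevity; with 200+ columns, t1:t3 can be widened to the real range or replaced by a tidyselect helper such as starts_with("t").

```r
library(dplyr)

# Hypothetical sample data: the question's test.csv is not shown,
# so these values are invented purely for illustration.
test <- data.frame(
  cid = c(841, 841, 2134, 2134),
  iid = c(2, 2, 1, 1),
  t1  = c(9, -1, 6, 0),
  t2  = c(2, 2, NA, 9),
  t3  = c(1, 0, 8, 8)
)

# For each (cid, iid) group, average only the strictly positive values
# of every column from t1 to t3; NA values are dropped by na.rm = TRUE.
result <- test %>%
  group_by(cid, iid) %>%
  summarise(across(t1:t3, ~ mean(.x[.x > 0], na.rm = TRUE)), .groups = "drop")

print(result)
```

Because the filter .x[.x > 0] runs inside the summary function, zeros and negative values are ignored without modifying the original data.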

Answer 2

Is this what you want? I don't use any extra packages; instead I use colMeans(). Here is an example:

The data look like this (a short copy of your example):

> mydata
  iid t1 t2 t3
1   2  4  5  5
2   2  7  5  3
3   2  9  2  1
4   1  6  9  8

Code:

id_list <- unique(mydata$iid) # get the id
result <- matrix(nrow=0, ncol=4) # create a matrix to store result
colnames(result) <- colnames(mydata) # name the columns of the matrix
for (i in seq_along(id_list)){
   uid <- id_list[i]
   # for each id, calculate the column averages
   average <- unname(colMeans(mydata[mydata$iid==uid,2:4])) 
   # write to the result
   result <- rbind(result, c(uid, average))
}
result

The result is:

> result
     iid       t1 t2 t3
[1,]   2 6.666667  4  3
[2,]   1 6.000000  9  8

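For your problem, change colMeans(mydata[mydata$iid==uid,2:4]) to colMeans(mydata[mydata$iid==uid,2:201]), i.e. the indices of the columns you want to average, and change ncol in matrix(nrow=0, ncol=4) so that it matches the number of columns in your result.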
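To avoid hard-coding the column indices and the ncol of the result matrix, the value columns can instead be looked up by name. A minimal base-R sketch, assuming iid is the only grouping column (as in the short example data), replaces the loop with aggregate() on the same four rows:

```r
# Same short example data as shown for mydata above.
mydata <- data.frame(
  iid = c(2, 2, 2, 1),
  t1  = c(4, 7, 9, 6),
  t2  = c(5, 5, 2, 9),
  t3  = c(5, 3, 1, 8)
)

# Every column except the grouping column is treated as a value column,
# so nothing needs to change when there are 200+ of them.
value_cols <- setdiff(names(mydata), "iid")

# Column means per iid; aggregate() returns the groups sorted by iid.
result <- aggregate(mydata[value_cols], by = list(iid = mydata$iid), FUN = mean)
print(result)
```

The numbers match the loop's output; only the row order differs, because aggregate() sorts the groups.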

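For values < 0, you can first convert the negative values to NA with mydata[,2:4][mydata[,2:4]<0]<-NA, and then add na.rm=TRUE to colMeans(). (Since the question asks for values > 0, use <= 0 instead of < 0 if zeros should be excluded as well.)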

The same example, updated:

> mydata
  iid t1 t2 t3
1   2  4  5  5
2   2 -2  5  3
3   2  9  2  1
4   1  6  9 -1

Code:

mydata[,2:4][mydata[,2:4]<0]<-NA
id_list <- unique(mydata$iid)
result <- matrix(nrow=0, ncol=4)
colnames(result) <- colnames(mydata)
for (i in seq_along(id_list)){
   uid <- id_list[i]
   average <- unname(colMeans(mydata[mydata$iid==uid,2:4], na.rm=TRUE))
   result <- rbind(result, c(uid, average))
}
result

Result:

> result
     iid  t1 t2  t3
[1,]   2 6.5  4   3
[2,]   1 6.0  9 NaN
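A variant that applies the question's strictly-positive rule directly inside the summary function (so zeros are excluded along with negatives, and mydata is not modified) is sketched below with aggregate() instead of the loop, using the updated example data. As with colMeans(..., na.rm=TRUE), a group with no positive values yields NaN.

```r
# Updated example data with some negative values.
mydata <- data.frame(
  iid = c(2, 2, 2, 1),
  t1  = c(4, -2, 9, 6),
  t2  = c(5, 5, 2, 9),
  t3  = c(5, 3, 1, -1)
)

value_cols <- setdiff(names(mydata), "iid")

# Keep only strictly positive values before averaging; mean() of an
# empty vector returns NaN, as in the loop-based result.
positive_mean <- function(x) mean(x[x > 0], na.rm = TRUE)

result <- aggregate(mydata[value_cols], by = list(iid = mydata$iid), FUN = positive_mean)
print(result)
```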

R: compute the mean across multiple columns for values greater than X
