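Calculating the mean over multiple columns in R for values greater than X

Time: 2016-01-15 00:27:13

Tags: r dplyr

I have the following data and want to calculate the mean of t1-t5 for each cid, grouped by iid.

  1. I only want to average values that are > 0.
  2. Ideally, I don't want to name every field, e.g. mean(t1), mean(t2), because in my real case I have more than 200 fields.
  3. Sample data: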

    library(dplyr)
    test <- read.csv("~/Documents/R-SCRIPTS/DATA/test.csv", sep=";")
    
    t <- test %>% 
      group_by(cid, iid) %>%
      select(t1:t5) %>%
      summarise(t1 = mean(t1, na.rm = TRUE), 
                t2 = mean(t2,na.rm = TRUE), 
                t3 = mean(t3,na.rm = TRUE), 
                t4 = mean(t4,na.rm = TRUE), 
                t5 = mean(t5,na.rm = TRUE) 
                ) 
    

This is my code so far. Can someone help me complete it? Thanks in advance.


2 Answers:

Answer 0: (score: 1)

If I understand correctly, you can simply use:

test %>% 
  group_by(cid, iid) %>% 
  summarise_each(funs(mean(.[.>0], na.rm = TRUE)), t1:t5)
#Source: local data frame [3 x 7]
#Groups: cid [?]
#
#    cid   iid    t1    t2    t3    t4    t5
#  (int) (int) (dbl) (dbl) (dbl) (dbl) (dbl)
#1   841     2   9.0     2     1     5   7.0
#2  2134     1   6.0     9     8     2   1.0
#3  4503     2   5.5     5     4     4   7.5
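
In more recent dplyr releases, summarise_each() and funs() are deprecated. A minimal sketch of the same idea with across() (available from dplyr 1.0.0) might look like the following. The small test data frame is invented for illustration and is not the asker's CSV.

```r
library(dplyr)

# Hypothetical stand-in for test.csv (values invented for illustration)
test <- data.frame(
  cid = c(1, 1, 2),
  iid = c(1, 1, 2),
  t1  = c(4, -2, 0),
  t2  = c(0, 6, 3)
)

result <- test %>%
  group_by(cid, iid) %>%
  # for every column from t1 to t2, average only the strictly positive values
  summarise(across(t1:t2, ~ mean(.x[.x > 0], na.rm = TRUE)), .groups = "drop")

print(result)
```

A group with no positive values in a column yields NaN for that column, because the mean of an empty vector is NaN.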

Answer 1: (score: 0)

Is this what you want? I don't use any other packages, just colMeans(). Here is an example:

The data looks like this (a short copy of your example):

 > mydata
      iid t1 t2 t3
    1   2  4  5  5
    2   2  7  5  3
    3   2  9  2  1
    4   1  6  9  8

Code:

id_list <- unique(mydata$iid) # get the id
result <- matrix(nrow=0, ncol=4) # create a matrix to store result
colnames(result) <- colnames(mydata) # name the columns of the matrix
for (i in seq_along(id_list)){
   uid <- id_list[i]
   # for each id, calculate the column averages
   average <- unname(colMeans(mydata[mydata$iid==uid,2:4])) 
   # write to the result
   result <- rbind(result, c(uid, average))
}
result

The result looks like this:

    > result
     iid       t1 t2 t3
[1,]   2 6.666667  4  3
[2,]   1 6.000000  9  8

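For your problem, you need to change colMeans(mydata[mydata$iid==uid,2:4]) to colMeans(mydata[mydata$iid==uid,2:201]), where 2:201 are the indices of the columns you want to average. Also change ncol in matrix(nrow=0, ncol=4) to match the number of columns in your result.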
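
Rather than hard-coding 2:201 and ncol, the column indices can be derived from the column names. The sketch below assumes that the value columns are all named t followed by digits (t1, t2, ...). That is an assumption based on the example, not something stated in the question.

```r
mydata <- data.frame(
  iid = c(2, 2, 2, 1),
  t1  = c(4, 7, 9, 6),
  t2  = c(5, 5, 2, 9),
  t3  = c(5, 3, 1, 8)
)

# assumption: value columns are named "t" followed by digits
value_cols <- grep("^t[0-9]+$", names(mydata))

id_list <- unique(mydata$iid)
# one column for iid plus one per value column
result <- matrix(nrow = 0, ncol = length(value_cols) + 1)
colnames(result) <- c("iid", names(mydata)[value_cols])
for (uid in id_list) {
  # drop = FALSE keeps a data frame even if only one column matches
  average <- unname(colMeans(mydata[mydata$iid == uid, value_cols, drop = FALSE]))
  result <- rbind(result, c(uid, average))
}
result
```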

For values < 0, you can first convert the negative values to NA with mydata[,2:4][mydata[,2:4]<0]<-NA, and then add na.rm=TRUE to colMeans().

The same example, updated:

> mydata
  iid t1 t2 t3
1   2  4  5  5
2   2 -2  5  3
3   2  9  2  1
4   1  6  9 -1

Code:

mydata[,2:4][mydata[,2:4]<0]<-NA
id_list <- unique(mydata$iid)
result <- matrix(nrow=0, ncol=4)
colnames(result) <- colnames(mydata)
for (i in seq_along(id_list)){
   uid <- id_list[i]
   average <- unname(colMeans(mydata[mydata$iid==uid,2:4], na.rm=TRUE))
   result <- rbind(result, c(uid, average))
}
result

Result:

> result
     iid  t1 t2  t3
[1,]   2 6.5  4   3
[2,]   1 6.0  9 NaN
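
The same per-group means can also be sketched without an explicit loop, using base R's aggregate(). Note that this version filters with x > 0, so zeros are excluded as well, which matches the condition in the question. Replacing only values < 0 with NA would still count zeros. The helper name mean_positive is made up for this sketch, and the groups come back sorted by iid.

```r
mydata <- data.frame(
  iid = c(2, 2, 2, 1),
  t1  = c(4, -2, 9, 6),
  t2  = c(5, 5, 2, 9),
  t3  = c(5, 3, 1, -1)
)

# hypothetical helper: mean of the strictly positive values of a vector
mean_positive <- function(x) mean(x[x > 0], na.rm = TRUE)

# apply the helper to columns 2 to 4 within each iid group
result <- aggregate(mydata[, 2:4], by = list(iid = mydata$iid), FUN = mean_positive)
result
```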