Perform a calculation on grouped rows in R and add the result to an existing column

Date: 2019-11-13 14:21:04

Tags: r dataframe grouping

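I want to perform a calculation on grouped rows of a data frame in R. The way I would normally do this is to spread the column and compute across the resulting columns, but I would also like to be able to do it without reshaping the data frame. For example, I want to compute a fold change for varA and varB for each subject, dividing the 'post' time point by the 'pre' time point, so that the data frame df below ends up looking like df_foldchange. I want the result of the calculation to appear as a new element in the existing 'timepoint' column.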

df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'))

df_foldchange <- data.frame(subject = c('subject1', 'subject1', 'subject1',
                             'subject2', 'subject2', 'subject2'),
                 varA = c(1, 2, 2, 1, 3, 3),
                 varB = c(2, 3, 1.5, 2, 4, 2),
                 timepoint = c('pre', 'post', 'foldchange', 
                               'pre', 'post', 'foldchange'))

2 Answers:

Answer 0: (score: 0)

I suspect you mixed up the 'pre'/'post' order when constructing df? As written, you had no 'post' for 'subject1' and no 'pre' for 'subject2'.

You can do it like this:

library(dplyr)  # provides %>%, group_by, summarise, bind_rows, arrange

df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'),
                 stringsAsFactors = FALSE)

# one row per subject: post / pre for each variable
df1 <- df %>% 
       group_by(subject) %>% 
       summarise(varA = varA[timepoint=='post'] / varA[timepoint=='pre'],
                 varB = varB[timepoint=='post'] / varB[timepoint=='pre'], 
                 timepoint = 'foldchange') 
# append the fold-change rows to the original data and regroup by subject
df_foldchange <- df %>%
                 bind_rows(df1) %>%
                 arrange(subject)

# output
   subject varA varB  timepoint
1 subject1    1  2.0        pre
2 subject1    2  3.0       post
3 subject1    2  1.5 foldchange
4 subject2    1  2.0        pre
5 subject2    3  4.0       post
6 subject2    3  2.0 foldchange

If the row order matters, you can sort the result above to get the desired output.
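One way to make that sort explicit is sketched below in base R. It assumes the desired order within each subject is 'pre', 'post', 'foldchange'. The names `res`, `tp_levels` and `res_sorted` are illustrative and do not appear in the answer, and the input rows are deliberately shuffled to show the effect.

```r
# Rows of the result, deliberately shuffled (illustrative data only).
res <- data.frame(subject = c('subject1', 'subject2', 'subject1',
                              'subject2', 'subject1', 'subject2'),
                  varA = c(2, 3, 1, 1, 2, 3),
                  varB = c(1.5, 2, 2, 2, 3, 4),
                  timepoint = c('foldchange', 'foldchange', 'pre',
                                'pre', 'post', 'post'),
                  stringsAsFactors = FALSE)

# Assumed desired order of time points within each subject.
tp_levels <- c('pre', 'post', 'foldchange')

# Sort by subject first, then by the position of timepoint in tp_levels.
res_sorted <- res[order(res$subject, match(res$timepoint, tp_levels)), ]
rownames(res_sorted) <- NULL  # renumber rows 1..n after sorting
```

Using `match()` against an explicit vector avoids relying on alphabetical order, which would put 'foldchange' before 'post' and 'pre'.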

Answer 1: (score: 0)

With data.table, you can do the following:

df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'))

library(data.table)
setDT(df)  # convert the data frame to a data.table by reference
# compute post / pre for the required columns, per subject
df2 <- df[, lapply(.SD, function(x) x[timepoint == "post"] / x[timepoint == "pre"]),
          by = subject, .SDcols = varA:varB]
df2[, timepoint := "foldchange"]  # label the new rows as "foldchange"
df_foldchange <- rbind(df, df2)   # append the new rows to the original data
df_foldchange[order(subject)]     # sort so each subject's rows sit together

#output
    subject varA varB  timepoint
1: subject1    1  2.0        pre
2: subject1    2  3.0       post
3: subject1    2  1.5 foldchange
4: subject2    1  2.0        pre
5: subject2    3  4.0       post
6: subject2    3  2.0 foldchange
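Both answers assume every subject has exactly one 'pre' row and one 'post' row, which is the concern raised at the start of the first answer. A minimal sketch of a guard for that case follows. The helper name `fold_change` and its choice to return NA for incomplete subjects are assumptions made for illustration, not part of either answer.

```r
# Hypothetical helper: post / pre for one subject's values.
# Returns NA when either time point is missing or appears more than once.
fold_change <- function(x, timepoint) {
  post <- x[timepoint == 'post']
  pre  <- x[timepoint == 'pre']
  if (length(post) != 1 || length(pre) != 1) {
    return(NA_real_)
  }
  post / pre
}

# Complete subject: both time points present.
fc_complete <- fold_change(c(1, 2), c('pre', 'post'))

# Incomplete subject: only a 'pre' measurement.
fc_missing <- fold_change(c(1), c('pre'))
```

In the data.table answer, a helper like this could stand in for the anonymous function, for example `lapply(.SD, fold_change, timepoint = timepoint)`, so that incomplete subjects yield NA.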