Question

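Say I have variables measured at multiple time points, and I want to perform the same operations at every time point. How can I do this more efficiently than repeating the operation for each time point separately? In the example below, I want to 1) get the sum of selected columns at each time point, and 2) for each variable, get the change from baseline to every later time point.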

#fake data to show what the dataset I receive looks like:
library(reshape2)
id=rep(c(1,1,1,2,2,2,3,3,3), 3)            
time=c(rep("Time1",9), rep("Time2",9), rep("Time3",9))
test=rep(c("calcium","magnesium","zinc"), 9) 
score=rnorm(n = 27, mean = 10, sd = 3)
fake <- data.frame(id, time, test, score)
fake <- dcast(fake, id ~ time + test, value.var = "score")

#Task 1- Get total of selected columns at each time point
#Non-efficient method:
fake$totalmgcad1 <- rowSums(fake[,c("Time1_calcium", "Time1_magnesium")])
fake$totalmgcad2 <- rowSums(fake[,c("Time2_calcium", "Time2_magnesium")])
fake$totalmgcad3 <- rowSums(fake[,c("Time3_calcium", "Time3_magnesium")])


#Task 2 - Get change in calcium levels from baseline to each day
#Non-efficient method:
fake$calciumt1t2 <- fake$Time2_calcium - fake$Time1_calcium
fake$calciumt1t3 <- fake$Time3_calcium - fake$Time1_calcium

Any tips on how to do the above in fewer lines? Is there a way to use group_by(), or do I need to build a list and use lapply()?

Answer 1

For me, a good start would be to keep the original data in long, tidy format, for example:

library(tidyverse)

id <- c(rep(1,3), rep(2,3), rep(3,3))
set.seed(1) # for reproducible sample values
value <- rnorm(9)
param <- c(rep("calcium", 3), rep("magnesium", 3), rep("zinc", 3))
time  <- rep(c(1,2,3), 3)
df <- data.frame(id, value, param, time)
as_tibble(df) #convenient way to see the data
# A tibble: 9 x 4
#     id   value param      time
#  <dbl>   <dbl> <fct>     <dbl>
#1     1  -0.626 calcium       1
#2     1   0.184 calcium       2
#3     1  -0.836 calcium       3
#4     2   1.60  magnesium     1
#5     2   0.330 magnesium     2
#6     2  -0.820 magnesium     3
#7     3   0.487 zinc          1
#8     3   0.738 zinc          2
#9     3   0.576 zinc          3

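Then, if what you are after is fewer lines, you can define a function in a separate file (for example, define difference_from_baseline() in function_defs.r). Once you have worked out the right function for the math, the main working file needs only a single line such as operated_on_desired_data <- difference_from_baseline(df).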
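The answer names difference_from_baseline() but does not show a body for it. A minimal sketch of what it might look like follows, assuming the long-format columns id, param, time and value from the example above; the output column name change and the small example data frame are made up for illustration.

```r
library(dplyr)

# Hypothetical body for difference_from_baseline(); the answer only names it.
# Assumes columns id, param, time and value, as in the long-format df above.
difference_from_baseline <- function(data) {
  data %>%
    group_by(id, param) %>%
    arrange(time, .by_group = TRUE) %>%        # earliest time point first
    mutate(change = value - first(value)) %>%  # baseline = first value per group
    ungroup()
}

# Made-up example: one subject, one test, three time points
example <- data.frame(
  id    = c(1, 1, 1),
  param = "calcium",
  time  = c(1, 2, 3),
  value = c(10, 12, 9)
)
result <- difference_from_baseline(example)
result$change
# 0 2 -1
```

If the function lives in function_defs.r, the main file would load it with source("function_defs.r") before calling it.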

Answer 2

You might first consider keeping the data in long format; that is, stop at:

fake <- data.frame(id, time, test, score)

and do not dcast.
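If the data arrive already in wide format, as the comment in the question suggests, one possible way back to long format is tidyr::pivot_longer(). This is only a sketch: it assumes every measurement column is named like Time1_calcium, with a single underscore separating time and test, and the small wide data frame is made up.

```r
library(tidyr)

# Made-up wide data using the same column naming as the question
wide <- data.frame(
  id            = c(1, 2),
  Time1_calcium = c(10, 11),
  Time1_zinc    = c(5, 6),
  Time2_calcium = c(12, 9),
  Time2_zinc    = c(7, 4)
)

# Split each column name at "_" into a time part and a test part
long <- pivot_longer(
  wide,
  cols      = -id,
  names_to  = c("time", "test"),
  names_sep = "_",
  values_to = "score"
)
long
```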

Now you can use dplyr functions.

library(dplyr)

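For example, to add a column with the change from the previous time point for every test (lag() compares each score with the one before it, so only the Time2 values are changes from baseline):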

fake %>% 
  arrange(time) %>% 
  group_by(id, test) %>% 
  mutate(test_diff = score - lag(score))
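Because lag() looks one row back within each group, the column above holds the difference from the preceding time point. If the change from baseline (the first time point) is what is needed, as in the question, first() can be used in its place. A minimal sketch with made-up scores; the column names diff_prev and diff_baseline are illustrative only:

```r
library(dplyr)

# Made-up long data: one subject, one test, three time points
long <- data.frame(
  id    = 1,
  time  = c("Time1", "Time2", "Time3"),
  test  = "calcium",
  score = c(10, 12, 9)
)

changes <- long %>%
  arrange(time) %>%
  group_by(id, test) %>%
  mutate(
    diff_prev     = score - lag(score),    # change from the previous time point
    diff_baseline = score - first(score)   # change from the first time point
  ) %>%
  ungroup()

changes$diff_prev
# NA 2 -3
changes$diff_baseline
# 0 2 -1
```

Note that arrange(time) sorts the labels as text, which works for Time1 to Time3 but would misplace a label such as Time10.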

To add a column with the sum of calcium and magnesium at each time point:

fake %>% 
  group_by(id, time) %>% 
  filter(test != "zinc") %>% 
  summarise(total_mgca = sum(score)) %>% 
  right_join(fake, by = c("id", "time"))

Putting both together:

fake %>% 
  group_by(id, time) %>% 
  filter(test != "zinc") %>% 
  summarise(total_mgca = sum(score)) %>% 
  ungroup() %>% 
  right_join(fake, by = c("id", "time")) %>% 
  arrange(time) %>% 
  group_by(id, test) %>% 
  mutate(test_diff = score - lag(score)) %>%
  ungroup()

Result:

   id  time total_mgca      test     score   test_diff
1   1 Time1   21.64788   calcium 12.296461          NA
2   1 Time1   21.64788 magnesium  9.351419          NA
3   1 Time1   21.64788      zinc  6.897300          NA
4   2 Time1   25.16516   calcium 11.026712          NA
5   2 Time1   25.16516 magnesium 14.138449          NA
6   2 Time1   25.16516      zinc  4.462579          NA
7   3 Time1   15.39817   calcium  5.778935          NA
8   3 Time1   15.39817 magnesium  9.619240          NA
9   3 Time1   15.39817      zinc  4.976049          NA
10  1 Time2   29.97949   calcium 11.152820  -1.1436409
11  1 Time2   29.97949 magnesium 18.826667   9.4752480
12  1 Time2   29.97949      zinc  8.280754   1.3834534
13  2 Time2   32.65905   calcium 16.469051   5.4423387
14  2 Time2   32.65905 magnesium 16.190000   2.0515508
15  2 Time2   32.65905      zinc 10.781192   6.3186129
16  3 Time2   14.24311   calcium  3.843355  -1.9355800
17  3 Time2   14.24311 magnesium 10.399755   0.7805155
18  3 Time2   14.24311      zinc  7.868311   2.8922628
19  1 Time3   23.26662   calcium  9.325816  -1.8270041
20  1 Time3   23.26662 magnesium 13.940803  -4.8858643
21  1 Time3   23.26662      zinc 13.984667   5.7039133
22  2 Time3   16.67828   calcium  5.142377 -11.3266742
23  2 Time3   16.67828 magnesium 11.535903  -4.6540968
24  2 Time3   16.67828      zinc 13.057014   2.2758226
25  3 Time3   25.09958   calcium 14.158592  10.3152371
26  3 Time3   25.09958 magnesium 10.940988   0.5412329
27  3 Time3   25.09958      zinc 11.229914   3.3616030

How do I apply a function to all columns that correspond to one another across time points?
