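I have a dataset with some information about my subjects. Each subject is either a patient or a control, and the two groups are matched on age and gender (the age match is not exact). The data are organized so that each row represents a different subject: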
```r
data_ex <- data.frame(
  pnum         = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  matched_pnum = c(10, 6, 7, 9, 8, 2, 3, 5, 4, 1),
  group        = c("patient", "patient", "patient", "patient", "patient",
                   "control", "control", "control", "control", "control"),
  age          = c(24, 35, 43, 34, 55, 24, 36, 43, 34, 54),
  gender       = c("f", "m", "f", "f", "m", "f", "m", "f", "f", "m")
)
```
which looks like:
```
pnum matched_pnum   group age gender
   1           10 patient  24      f
   2            6 patient  35      m
   3            7 patient  43      f
   4            9 patient  34      f
   5            8 patient  55      m
   6            2 control  24      f
   7            3 control  36      m
   8            5 control  43      f
   9            4 control  34      f
  10            1 control  54      m
```
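Since the pairing lives entirely in `matched_pnum`, it may be worth confirming first that it is consistent. A minimal base-R sketch, assuming every subject has exactly one partner and that partner is in the opposite group; `partner_row` and `is_valid_matching` are illustrative names, not part of the original data:

```r
# Recreate the subject table from the question
data_ex <- data.frame(
  pnum         = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  matched_pnum = c(10, 6, 7, 9, 8, 2, 3, 5, 4, 1),
  group        = c(rep("patient", 5), rep("control", 5)),
  age          = c(24, 35, 43, 34, 55, 24, 36, 43, 34, 54),
  gender       = c("f", "m", "f", "f", "m", "f", "m", "f", "f", "m")
)

# Row index of each subject's partner (NA if the partner is missing)
partner_row <- match(data_ex$matched_pnum, data_ex$pnum)

# The partner must point back to the original subject ...
reciprocal <- data_ex$matched_pnum[partner_row] == data_ex$pnum
# ... and must belong to the other group
opposite_group <- data_ex$group[partner_row] != data_ex$group

is_valid_matching <- !anyNA(partner_row) && all(reciprocal) && all(opposite_group)
is_valid_matching
```

If this returns `FALSE`, the difference scores below would silently pair the wrong subjects, so it is cheaper to catch it here.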
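I want to look at the differences between the matched pairs on a particular outcome variable (e.g. `mean_power`). However, this outcome is further broken down by frequency. For example: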
```r
power_data_ex <- data.frame(
  pnum       = rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), each = 3),  # 3 rows per subject
  freq       = rep(c(0.6, 0.8, 1), times = 10),                  # same 3 frequencies for everyone
  mean_power = c(200, 145, 357, 200, 345, 173, 236, 276, 233, 166,
                 188, 321, 423, 257, 126, 236, 125, 132, 164, 267,
                 311, 264, 401, 287, 246, 211, 189, 256, 122, 351)
)
```
I merged the two data frames:

```r
merged_ex <- merge(data_ex, power_data_ex, by = "pnum")
```
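But now I'm stuck on how best to organize the data so that I can get a difference score in `mean_power` for each patient-control pair at each frequency, based on their `pnum` and `matched_pnum` values!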
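One way to get there, sketched in base R under the assumption that each patient has exactly one control and both were measured at the same frequencies: split `merged_ex` by group, then merge the two halves back together, matching the patient's `matched_pnum` to the control's `pnum` within each `freq`. The names `patients`, `controls`, `paired` and `result` are hypothetical.

```r
data_ex <- data.frame(
  pnum         = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  matched_pnum = c(10, 6, 7, 9, 8, 2, 3, 5, 4, 1),
  group        = c(rep("patient", 5), rep("control", 5))
)
power_data_ex <- data.frame(
  pnum       = rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), each = 3),
  freq       = rep(c(0.6, 0.8, 1), times = 10),
  mean_power = c(200, 145, 357, 200, 345, 173, 236, 276, 233, 166,
                 188, 321, 423, 257, 126, 236, 125, 132, 164, 267,
                 311, 264, 401, 287, 246, 211, 189, 256, 122, 351)
)
merged_ex <- merge(data_ex, power_data_ex, by = "pnum")

# One row per patient x frequency, and one row per control x frequency
patients <- merged_ex[merged_ex$group == "patient",
                      c("pnum", "matched_pnum", "freq", "mean_power")]
controls <- merged_ex[merged_ex$group == "control",
                      c("pnum", "freq", "mean_power")]

# Self-join: the patient's matched_pnum is the control's pnum, and the
# frequency must be the same on both sides. The clashing mean_power
# columns get the suffixes _patient and _control.
paired <- merge(patients, controls,
                by.x = c("matched_pnum", "freq"),
                by.y = c("pnum", "freq"),
                suffixes = c("_patient", "_control"))

# Label each pair and compute patient minus control
paired$pnum_diff       <- paste(paired$pnum, paired$matched_pnum, sep = "-")
paired$mean_power_diff <- paired$mean_power_patient - paired$mean_power_control

# Order by patient, then frequency, and keep only the requested columns
result <- paired[order(paired$pnum, paired$freq),
                 c("pnum_diff", "freq", "mean_power_diff")]
rownames(result) <- NULL
result
```

The sign convention is patient minus control, which is what the expected output in the edit below implies (for pair 1-10 at 0.6: 200 - 256 = -56). Doing the join on both `matched_pnum` and `freq` avoids the 3 x 3 cross-product you would get if you joined on the subject IDs alone.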
Edit: the expected result is:

```r
outcome_ex <- data.frame(
  pnum_diff = rep(c("1-10", "2-6", "3-7", "4-9", "5-8"), each = 3),  # one label per frequency
  freq = rep(c(0.6, 0.8, 1), times = 5),
  mean_power_diff = c(-56, 23, 6, -36, 220, 41, 72, 9, -78, -80, -23, 132, 159, -144, -161)  # patient - control
)
```
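As a cross-check, the same table can be built without a second merge by looking up each control's value with `match()` on a combined pnum-plus-frequency key, and then compared with the expected result. This sketch assumes the `pnum_diff` labels are meant to repeat once per frequency (`each = 3`), which is the order the `mean_power_diff` values actually follow; `outcome_calc`, `key_all` and `key_partner` are illustrative names.

```r
data_ex <- data.frame(
  pnum         = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  matched_pnum = c(10, 6, 7, 9, 8, 2, 3, 5, 4, 1),
  group        = c(rep("patient", 5), rep("control", 5))
)
power_data_ex <- data.frame(
  pnum       = rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), each = 3),
  freq       = rep(c(0.6, 0.8, 1), times = 10),
  mean_power = c(200, 145, 357, 200, 345, 173, 236, 276, 233, 166,
                 188, 321, 423, 257, 126, 236, 125, 132, 164, 267,
                 311, 264, 401, 287, 246, 211, 189, 256, 122, 351)
)

# Attach group and partner ID to every power row
idx <- match(power_data_ex$pnum, data_ex$pnum)
power_data_ex$group        <- data_ex$group[idx]
power_data_ex$matched_pnum <- data_ex$matched_pnum[idx]

# For each patient row, find the partner's row at the same frequency
pat         <- power_data_ex[power_data_ex$group == "patient", ]
key_all     <- paste(power_data_ex$pnum, power_data_ex$freq)
key_partner <- paste(pat$matched_pnum, pat$freq)
control_power <- power_data_ex$mean_power[match(key_partner, key_all)]

outcome_calc <- data.frame(
  pnum_diff       = paste(pat$pnum, pat$matched_pnum, sep = "-"),
  freq            = pat$freq,
  mean_power_diff = pat$mean_power - control_power  # patient minus control
)

# Expected result, with pair labels repeated once per frequency
outcome_ex <- data.frame(
  pnum_diff       = rep(c("1-10", "2-6", "3-7", "4-9", "5-8"), each = 3),
  freq            = rep(c(0.6, 0.8, 1), times = 5),
  mean_power_diff = c(-56, 23, 6, -36, 220, 41, 72, 9, -78, -80, -23, 132, 159, -144, -161)
)

same_as_expected <- isTRUE(all.equal(outcome_calc, outcome_ex))
same_as_expected
```

Because `pat` keeps the original row order (patients 1 to 5, frequencies ascending), no reordering is needed here; with real data that arrive unsorted, an explicit `order()` call would be safer.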