Populate a new variable in a data frame using the relationship between two variables in another data frame

Time: 2019-02-28 00:08:45

Tags: r dataframe dataset

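I have two data frames with different numbers of observations (one long, with 2220 obs., the other wide, with 37 obs.). The data frames share the variable "SID", although in the long data frame there are 60 rows for each SID value, while in the wide one there is only 1. The wide data frame has an additional variable, "Experimenter", which gives an experimenter number for each SID. I want to create an "Experimenter" column in the long data frame so that the corresponding Experimenter value is added and repeated every time a SID value appears (i.e. 60 times).

A nested if-else statement for each subject seems very tedious, so I'm hoping there is an alternative.

I've added the dput output of each data frame; unfortunately, I'm not sure how to embed them. Note that in the long data frame "SID" is currently named "Subject", but they are the same variable.

Wide: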

structure(list(SID = 7301:7302, Experimenter = c(2L, 1L)), .Names = c("SID", 
"Experimenter"), class = "data.frame", row.names = c(NA, -2L))

Long:

structure(list(Subject = c(7301L, 7301L, 7301L), Session = c(1L, 
1L, 1L), Stimtype = structure(c(1L, 1L, 1L), .Label = "Control", class = 
"factor"), 
Valence = structure(c(1L, 1L, 1L), .Label = "Neutral", class = "factor"), 
Block = c(1L, 1L, 1L), Image = c(12L, 17L, 22L), Group = structure(c(1L, 
3L, 2L), .Label = c("Neutral_1660", "Neutral_5300", "Neutral_7233"
), class = "factor"), Response = c(1L, 1L, 1L), Stimulus = c(1660L, 
7233L, 5300L)), .Names = c("Subject", "Session", "Stimtype", 
"Valence", "Block", "Image", "Group", "Response", "Stimulus"), class = 
"data.frame", row.names = c(NA, 
-3L))

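Looking at these samples, what I want is to insert an "Experimenter" variable into the long data frame with the value 2 whenever "Subject" is 7301 (as given in the wide data), and so on.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)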

Unless I've misunderstood, this looks like a simple case for merge / left_join.

In base R

merge(df2, df1, by.x = "Subject", by.y = "SID")
#  Subject Session Stimtype Valence Block Image        Group Response Stimulus
#1    7301       1  Control Neutral     1    12 Neutral_1660        1     1660
#2    7301       1  Control Neutral     1    17 Neutral_7233        1     7233
#3    7301       1  Control Neutral     1    22 Neutral_5300        1     5300
#  Experimenter
#1            2
#2            2
#3            2
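
One detail that may matter on the full data: by default `merge` keeps only rows whose key appears in both data frames, so any Subject missing from the wide data frame would be dropped. A minimal sketch of the difference, using hypothetical `long_demo` / `wide_demo` data frames (made up for illustration, not the asker's data):

```r
# Hypothetical demo data: Subject 7999 has no entry in the wide data frame.
wide_demo <- data.frame(SID = c(7301L, 7302L), Experimenter = c(2L, 1L))
long_demo <- data.frame(Subject = c(7301L, 7301L, 7999L),
                        Image   = c(12L, 17L, 5L))

# Default merge(): rows without a matching SID are dropped.
inner <- merge(long_demo, wide_demo, by.x = "Subject", by.y = "SID")

# all.x = TRUE keeps every row of the long data frame; unmatched rows get NA.
left <- merge(long_demo, wide_demo, by.x = "Subject", by.y = "SID",
              all.x = TRUE)

nrow(inner)
nrow(left)
```

If every SID in the long data frame also appears in the wide one, as the question suggests, both calls return the same rows.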

Or using dplyr

library(dplyr)
left_join(df2, df1, by = c("Subject" = "SID"))

This gives the same result for the sample data.
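
If only the one column is needed, a lookup with `match` is another possible option. It is not part of the original answer; the sketch below uses hypothetical `wide_demo` / `long_demo` data frames and leaves the row order of the long data frame untouched (whereas `merge` sorts on the key by default).

```r
# Hypothetical demo data, made up for illustration.
wide_demo <- data.frame(SID = c(7301L, 7302L), Experimenter = c(2L, 1L))
long_demo <- data.frame(Subject = c(7301L, 7301L, 7302L, 7301L),
                        Image   = c(12L, 17L, 3L, 22L))

# match() gives, for each Subject, the row position of that SID in wide_demo;
# indexing Experimenter by those positions repeats the value once per row.
idx <- match(long_demo$Subject, wide_demo$SID)
long_demo$Experimenter <- wide_demo$Experimenter[idx]

long_demo
```

Subjects with no entry in the wide data frame would end up with `NA` in the new column.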


Sample data

df1 <- structure(list(SID = 7301:7302, Experimenter = c(2L, 1L)), .Names = c("SID",
"Experimenter"), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(Subject = c(7301L, 7301L, 7301L), Session = c(1L,
1L, 1L), Stimtype = structure(c(1L, 1L, 1L), .Label = "Control", class =
"factor"),
Valence = structure(c(1L, 1L, 1L), .Label = "Neutral", class = "factor"),
Block = c(1L, 1L, 1L), Image = c(12L, 17L, 22L), Group = structure(c(1L,
3L, 2L), .Label = c("Neutral_1660", "Neutral_5300", "Neutral_7233"
), class = "factor"), Response = c(1L, 1L, 1L), Stimulus = c(1660L,
7233L, 5300L)), .Names = c("Subject", "Session", "Stimtype",
"Valence", "Block", "Image", "Group", "Response", "Stimulus"), class =
"data.frame", row.names = c(NA,
-3L))