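This question is similar to one I found here: Multiply rows (with row names) in one data frame with matching column names in another
However, instead of matching rows and multiplying, I want to match the column values in df1 against the column names in df2, and return the corresponding row values of df2 in a new df3.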
df1 <- data.frame(V1=c(1:6),V2=c("X3", "X3_8", "NA", "X5", "X4_5", "X3_8"))
df1
  V1   V2
1  1   X3
2  2 X3_8
3  3   NA
4  4   X5
5  5 X4_5
6  6 X3_8
df2 <- data.frame(name=c("John", "Mary", "Joe", "Tim", "Bob", "Pat"),
                  X3=c(0.5, 1.2, 0.75, 3.1, 2.0, 1.1),
                  X5=c(1.0, 2.3, 4.2, 5, 1.1, 3.0),
                  X3_8=c(0.6, 1.0, 2.0, 1.0, 0.7, 1.4),
                  X4_5=c(0.4, 0.3, 3.0, 1.0, 2.0, 0.9))
df2
  name   X3  X5 X3_8 X4_5
1 John 0.50 1.0  0.6  0.4
2 Mary 1.20 2.3  1.0  0.3
3  Joe 0.75 4.2  2.0  3.0
4  Tim 3.10 5.0  1.0  1.0
5  Bob 2.00 1.1  0.7  2.0
6  Pat 1.10 3.0  1.4  0.9
This is what I want:
df3 <- data.frame(name=c("John", "Mary", "Joe", "Tim", "Bob", "Pat"),
                  values=c(0.5, 1.0, NA, 5.0, 2.0, 1.4))
  name values
1 John    0.5
2 Mary    1.0
3  Joe     NA
4  Tim    5.0
5  Bob    2.0
6  Pat    1.4
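My real df1 and df2 each have 64 rows, where "V1" in df1 is the numeric index of the corresponding row in the "name" column of df2. My df2 has 22 columns: one for "name" and 21 others ("X*") that match "V2" in df1. I tried converting "V2" to row names, but that does not work because there are NAs and duplicated values.
Bonus, but not required: I have 10 df1s and 10 df2s and need to do this for each pair, where the names of the df1s and df2s contain a common year. For example, I need to match df1_2004 with df2_2004 to create df3_2004, then move on to df1_2005 and df2_2005, and so on. I'm sure there is an elegant way to do this without for loops and if statements.
Thanks for your help. I'm sure there is a simple base R or tidyverse solution, but I'm struggling to put the pieces together. Forgive my novice understanding of indexing in R.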
Answer 0: (score: 0)
By reshaping df2 into long format and left-joining it onto df1, you can get the desired result.
Using:
library(dplyr)
library(tidyr)
df3 <- df1 %>%
  mutate(name = df2$name[V1]) %>%   # or just mutate(name = df2$name) when the index is equal to the row numbers
  left_join(., df2 %>%
              gather(V2, values, -1) %>%
              group_by(V2) %>%
              mutate(V1 = row_number()),
            by = c('V2','V1')) %>%
  select(name = name.x, values)
which gives:
> df3
  name values
1 John    0.5
2 Mary    1.0
3  Joe     NA
4  Tim    5.0
5  Bob    2.0
6  Pat    1.4
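For the bonus part of the question, one option is to wrap the reshape-and-join in a function and map it over the yearly pairs. The following is only a sketch under stated assumptions: `lookup_values` is a made-up helper name, the two yearly data frames are toy stand-ins with invented values, and `pivot_longer()` (tidyr 1.0.0 or later) is used in place of `gather()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: for each row of d1, fetch the value in d2 at
# row V1 and at the column named in V2 (NA when V2 matches no column).
lookup_values <- function(d1, d2) {
  d2_long <- d2 %>%
    mutate(V1 = row_number()) %>%   # row index of d2, to join against d1$V1
    select(-name) %>%
    pivot_longer(-V1, names_to = "V2", values_to = "values")

  d1 %>%
    mutate(V2 = as.character(V2), name = d2$name[V1]) %>%
    left_join(d2_long, by = c("V1", "V2")) %>%
    select(name, values)
}

# Toy stand-ins for the real yearly data frames (values are made up)
df1_2004 <- data.frame(V1 = 1:3, V2 = c("X3", "NA", "X5"))
df2_2004 <- data.frame(name = c("John", "Mary", "Joe"),
                       X3 = c(0.5, 1.2, 0.75), X5 = c(1.0, 2.3, 4.2))
df1_2005 <- data.frame(V1 = 1:3, V2 = c("X5", "X3", "X3"))
df2_2005 <- data.frame(name = c("John", "Mary", "Joe"),
                       X3 = c(0.1, 0.2, 0.3), X5 = c(1.5, 2.5, 3.5))

years <- 2004:2005
# collect the objects by name, then apply the helper to each df1/df2 pair
df1s <- mget(paste0("df1_", years))
df2s <- mget(paste0("df2_", years))
df3s <- Map(lookup_values, df1s, df2s)
names(df3s) <- paste0("df3_", years)
```

The results stay together in a named list, so a single year is available as `df3s[["df3_2004"]]`. If separate objects are really needed, `list2env(df3s, globalenv())` would create them, though keeping the list is usually easier to work with.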
Answer 1: (score: 0)
A less functional approach, using a plain for loop:
n_row <- nrow(df1)
# coerce the variable V1 into a factor labelled with the names in df2
# (this assumes V1 contains every index 1..n_row)
df1$V1 <- factor(df1$V1, labels = df2$name)
# coerce the variable V2 into a character vector, or use 'stringsAsFactors = FALSE'
# when you read the data frame
df1$V2 <- as.character(df1$V2)
# result: the names from df1 plus a numeric column for the looked-up values
df3 <- data.frame(name = df1$V1, values = NA_real_)
for (i in seq_len(n_row)) {
  col_index <- which(df1[i, "V2"] == names(df2))
  row_index <- which(df1[i, "V1"] == df2$name)
  # values stays NA when V2 matches no column name of df2
  if (length(col_index) > 0) {
    df3[i, "values"] <- df2[row_index, col_index]
  }
}
which gives:
#> df3
  name values
1 John    0.5
2 Mary    1.0
3  Joe     NA
4  Tim    5.0
5  Bob    2.0
6  Pat    1.4
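The same per-row lookup can also be done without a loop, using base R matrix indexing. This is a minimal sketch, assuming (as in the question) that V1 holds row numbers of df2 and that every column of df2 other than name is numeric; the intermediate names `m` and `idx` are only illustrative.

```r
df1 <- data.frame(V1 = 1:6,
                  V2 = c("X3", "X3_8", "NA", "X5", "X4_5", "X3_8"))
df2 <- data.frame(name = c("John", "Mary", "Joe", "Tim", "Bob", "Pat"),
                  X3   = c(0.5, 1.2, 0.75, 3.1, 2.0, 1.1),
                  X5   = c(1.0, 2.3, 4.2, 5, 1.1, 3.0),
                  X3_8 = c(0.6, 1.0, 2.0, 1.0, 0.7, 1.4),
                  X4_5 = c(0.4, 0.3, 3.0, 1.0, 2.0, 0.9))

# numeric part of df2 as a matrix, so cells can be addressed by (row, column)
m <- as.matrix(df2[names(df2) != "name"])

# one (row, column) pair per row of df1; match() gives NA when V2 names no column
idx <- cbind(df1$V1, match(df1$V2, colnames(m)))

# a two-column index matrix picks one cell per row; rows containing NA yield NA
df3 <- data.frame(name = df2$name[df1$V1], values = m[idx])
```

Because the values never pass through a character column, `df3$values` stays numeric, and duplicated or unmatched entries in V2 need no special handling.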