I need to merge two data frames on a common column named col_id.
My problem is that a simple merge does not work for my case.
Here is a sample of the structure of df1:
  col_id    stock   ch2
1 id_100    stock 2 yes
2 id_100002 stock 2 no
3 id_100003 stock 2 no
The second df:
  col_id    num   cat1
1 id_100    num 2 0
2 id_100    num 2 1
3 id_100    num 2 0
4 id_100002 num 2 1
5 id_100002 num 2 1
6 id_100002 num 2 1
7 id_100003 num 2 1
8 id_100003 num 2 1
The output I want keeps every row of the second df and fills in the values from df1 that share the same col_id. Example output:
  col_id    num   cat1 stock   ch2
1 id_100    num 2 0    stock 2 yes
2 id_100    num 2 1    stock 2 yes
3 id_100    num 2 0    stock 2 yes
4 id_100002 num 2 1    stock 2 no
5 id_100002 num 2 1    stock 2 no
6 id_100002 num 2 1    stock 2 no
7 id_100003 num 2 1    stock 2 no
8 id_100003 num 2 1    stock 2 no
Answer 0 (score: 1)
It looks like you want the all.x / all.y arguments of the merge function. For example,
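# Lookup table: one row per col_id (ids match the question's sample data)
df1 <- data.frame(
  col_id = c("id_100", "id_100002", "id_100003"),
  stock = c("stock 2"),  # a single value is recycled to every row
  ch2 = c("yes", "no", "no")
)
# Detail table: several rows per col_id
df2 <- data.frame(
  col_id = c(rep("id_100", 3),
             rep("id_100002", 3),
             rep("id_100003", 2)),
  num = c("num 2"),
  cat1 = c(0, 1, 0, 1, 1, 1, 1, 1)
)
# all.y = TRUE keeps every row of df2, even one with no match in df1
mergedData <- merge(df1, df2, by = "col_id", all.y = TRUE)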
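For the snippet you pasted, this produces the rows you want. Note that merge places df1's columns before df2's, so swap the two arguments and use all.x = TRUE if you need df2's columns first. You can use any combination of all.(x|y) = (TRUE|FALSE) to get the appropriate join (inner, outer, left, right, and so on). W3 Schools has a good description of the different join types (they are discussed in the context of SQL, but R's merge function behaves analogously).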
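To make the all.x / all.y combinations concrete, here is a minimal sketch on made-up toy data (the frames x and y and their values are illustrative assumptions, not the data from the question). Key "c" appears only in x and key "d" only in y, so the row counts differ between join types.

```r
# Toy data (hypothetical): "c" exists only in x, "d" exists only in y
x <- data.frame(col_id = c("a", "b", "c"), stock = c("s1", "s2", "s3"))
y <- data.frame(col_id = c("a", "a", "b", "d"), cat1 = c(0, 1, 1, 0))

inner <- merge(x, y, by = "col_id")                # keys present in both: 3 rows
left  <- merge(x, y, by = "col_id", all.x = TRUE)  # every row of x: 4 rows
right <- merge(x, y, by = "col_id", all.y = TRUE)  # every row of y: 4 rows
full  <- merge(x, y, by = "col_id", all = TRUE)    # every row of both: 5 rows

# Unmatched keys are kept with NA in the columns from the other frame
left[left$col_id == "c", ]
```

For the question's data, every col_id in the second df has a match in df1, so the left, right and inner variants all return the same eight rows.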
Answer 1 (score: 1)
Try:
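# install.packages('dplyr')  # run once if dplyr is not installed
library(dplyr)
mytext1 <- "col_id,stock, ch2
id_100,stock 2, yes
id_100002,stock 2, no
id_100003,stock 2, no"
# strip.white = TRUE drops the spaces that follow the commas
mydf1 <- read.table(text = mytext1, header = TRUE, sep = ",", strip.white = TRUE)
mytext2 <- "col_id,num, cat1
id_100,num 2, 0
id_100,num 2, 1
id_100,num 2, 0
id_100002,num 2, 1
id_100002,num 2, 1
id_100002,num 2, 1
id_100003,num 2, 1
id_100003,num 2, 1"
mydf2 <- read.table(text = mytext2, header = TRUE, sep = ",", strip.white = TRUE)
# Keep every row of mydf2 and attach the matching stock/ch2 values from mydf1
output_df <- left_join(mydf2, mydf1, by = "col_id")
output_df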
col_id num cat1 stock ch2
id_100 num 2 0 stock 2 yes
id_100 num 2 1 stock 2 yes
id_100 num 2 0 stock 2 yes
id_100002 num 2 1 stock 2 no
id_100002 num 2 1 stock 2 no
id_100002 num 2 1 stock 2 no
id_100003 num 2 1 stock 2 no
id_100003 num 2 1 stock 2 no
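Answer 2 (score: 1)
You only need to add two lines of code to the second data frame (called df here), as follows
# stock is the same for every row
df$stock <- rep('stock 2', nrow(df))
# look up each row's ch2 value by its col_id
df$ch2 <- c('yes', 'no', 'no')[match(df$col_id, c('id_100', 'id_100002', 'id_100003'))]
This should solve your problem.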
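If hard-coding the values is not practical, the same per-row lookup can read them from df1 with match(). The sketch below rebuilds the question's sample data (the names df1, df and idx are assumptions for illustration) and copies stock and ch2 across by position.

```r
# Sample data reconstructed from the question
df1 <- data.frame(
  col_id = c("id_100", "id_100002", "id_100003"),
  stock = "stock 2",
  ch2 = c("yes", "no", "no"),
  stringsAsFactors = FALSE
)
df <- data.frame(
  col_id = c(rep("id_100", 3), rep("id_100002", 3), rep("id_100003", 2)),
  num = "num 2",
  cat1 = c(0, 1, 0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# For each row of df, find the position of its col_id in df1
idx <- match(df$col_id, df1$col_id)

# Copy the looked-up values; a col_id missing from df1 would yield NA
df$stock <- df1$stock[idx]
df$ch2 <- df1$ch2[idx]
df
```

Unlike merge, this keeps the original row order of df, but it assumes col_id is unique in df1.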