Merge multiple rows by column

Time: 2016-08-03 14:04:40

Tags: r

I have two data frames that share a common column named col_id.

My problem is that a simple merge does not work in my case.

Here is the example structure of df1:

     col_id   stock  ch2
1    id_100 stock 2  yes
2 id_100002 stock 2   no
3 id_100003 stock 2   no

The second data frame, df2:

     col_id   num cat1
1    id_100 num 2    0
2    id_100 num 2    1
3    id_100 num 2    0
4 id_100002 num 2    1
5 id_100002 num 2    1
6 id_100002 num 2    1
7 id_100003 num 2    1
8 id_100003 num 2    1

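The output I want fills every row of the second data frame with the values from the row of the first data frame that has the same col_id. Example output: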

     col_id   num cat1   stock  ch2
1    id_100 num 2    0 stock 2  yes
2    id_100 num 2    1 stock 2  yes
3    id_100 num 2    0 stock 2  yes
4 id_100002 num 2    1 stock 2   no
5 id_100002 num 2    1 stock 2   no
6 id_100002 num 2    1 stock 2   no
7 id_100003 num 2    1 stock 2   no
8 id_100003 num 2    1 stock 2   no

3 answers:

Answer 0: (score: 1)

It looks like you want the all.x / all.y arguments of the merge function. For example,

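# First data frame: one row per col_id (IDs as in the question)
df1 <- data.frame(
  col_id = c("id_100", "id_100002", "id_100003"),
  stock = c("stock 2"),
  ch2 = c("yes", "no", "no")
)

# Second data frame: several rows per col_id
df2 <- data.frame(
  col_id = c(rep("id_100", 3),
             rep("id_100002", 3),
             rep("id_100003", 2)),
  num = c("num 2"),
  cat1 = c(0, 1, 0, 1, 1, 1, 1, 1)
)

# merge() joins on the shared column col_id; all.y = TRUE keeps every row of df2
mergedData <- merge(df1, df2, all.y = TRUE)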

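This produces the desired output for the snippet you pasted (the columns come out in a different order: col_id, stock, ch2, num, cat1). You can use any combination of all.(x|y) = (TRUE|FALSE) to get the appropriate join (inner, outer, left, right, and so on). W3 Schools has a good description of the different join types (they discuss them in the context of SQL, but R's merge function works in a similar way).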
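To make the all.x / all.y combinations concrete, here is a minimal sketch on toy data. The data frames `left` and `right` and their columns are made up for illustration and are not from the question; the key "c" exists only in `left` and "d" only in `right`.

```r
# Hypothetical toy data: "c" is only in left, "d" is only in right
left  <- data.frame(key = c("a", "b", "c"), x = 1:3,
                    stringsAsFactors = FALSE)
right <- data.frame(key = c("a", "b", "d"), y = c(10, 20, 40),
                    stringsAsFactors = FALSE)

# Inner join (default): only keys present in both data frames
inner_j <- merge(left, right, by = "key")

# Left join: every row of left, NA in y where right has no match
left_j <- merge(left, right, by = "key", all.x = TRUE)

# Right join: every row of right, NA in x where left has no match
right_j <- merge(left, right, by = "key", all.y = TRUE)

# Full outer join: every row of both data frames
full_j <- merge(left, right, by = "key", all = TRUE)

print(inner_j)
print(left_j)
print(right_j)
print(full_j)
```

In the question's case every col_id of df2 also exists in df1, so an inner join and a join with all.y = TRUE should give the same rows; all.y = TRUE only matters once df2 contains IDs that df1 lacks.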

Answer 1: (score: 1)

Try:

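# install.packages('dplyr')  # run once if dplyr is not installed yet
library(dplyr)

mytext1 = "col_id,stock, ch2
id_100,stock 2, yes
id_100002,stock 2, no
id_100003,stock 2, no"
# strip.white = TRUE drops the spaces that follow the commas
mydf1 <- read.table(text = mytext1, header = TRUE, sep = ",", strip.white = TRUE)

mytext2 = "col_id,num, cat1
id_100,num 2, 0
id_100,num 2, 1
id_100,num 2, 0
id_100002,num 2, 1
id_100002,num 2, 1
id_100002,num 2, 1
id_100003,num 2, 1
id_100003,num 2, 1"

mydf2 <- read.table(text = mytext2, header = TRUE, sep = ",", strip.white = TRUE)

# Left join: keep every row of mydf2 and attach the matching columns of mydf1
output_df <- left_join(mydf2, mydf1, by = "col_id")
output_df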

  col_id    num    cat1  stock   ch2
 id_100    num 2    0   stock 2  yes
 id_100    num 2    1   stock 2  yes
 id_100    num 2    0   stock 2  yes
 id_100002 num 2    1   stock 2   no
 id_100002 num 2    1   stock 2   no
 id_100002 num 2    1   stock 2   no
 id_100003 num 2    1   stock 2   no
 id_100003 num 2    1   stock 2   no
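If you would rather not install dplyr, the same left join can be sketched in base R by passing the second data frame first and setting all.x = TRUE. This is only a sketch: the data frames are rebuilt here from the question's sample, and stringsAsFactors = FALSE is my own choice to keep the columns as plain character vectors.

```r
# Sample data rebuilt from the question
df1 <- data.frame(col_id = c("id_100", "id_100002", "id_100003"),
                  stock  = "stock 2",
                  ch2    = c("yes", "no", "no"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(col_id = rep(c("id_100", "id_100002", "id_100003"),
                               times = c(3, 3, 2)),
                  num    = "num 2",
                  cat1   = c(0, 1, 0, 1, 1, 1, 1, 1),
                  stringsAsFactors = FALSE)

# df2 goes first so its columns come right after the key;
# all.x = TRUE keeps every row of df2 even if df1 has no match
out <- merge(df2, df1, by = "col_id", all.x = TRUE)

names(out)  # col_id, num, cat1, stock, ch2
print(out)
```

Putting df2 first gives the column order requested in the question. Note that merge sorts the result by the key column by default, so do not rely on the original row order of df2 being preserved.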

Answer 2: (score: 1)

You only need to add two lines of code, as follows (df is the second data frame):

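    # stock is the same for every row
    df$stock <- rep('stock 2', nrow(df))
    # look up ch2 for each row by its col_id in a named vector
    df$ch2 <- unname(c(id_100 = 'yes', id_100002 = 'no', id_100003 = 'no')[as.character(df$col_id)])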

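This should solve your problem, although the values are hard-coded, so it only suits a small, fixed set of IDs.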
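A more general version of this idea, without typing the values by hand, is to look up each row's position in the first data frame with match(). A minimal sketch, assuming the two data frames from the question are named df1 and df2 (rebuilt here so the example runs on its own):

```r
# Sample data rebuilt from the question
df1 <- data.frame(col_id = c("id_100", "id_100002", "id_100003"),
                  stock  = "stock 2",
                  ch2    = c("yes", "no", "no"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(col_id = rep(c("id_100", "id_100002", "id_100003"),
                               times = c(3, 3, 2)),
                  num    = "num 2",
                  cat1   = c(0, 1, 0, 1, 1, 1, 1, 1),
                  stringsAsFactors = FALSE)

# For each row of df2, find the row of df1 with the same col_id
idx <- match(df2$col_id, df1$col_id)

# Copy the looked-up values; rows without a match would get NA
df2$stock <- df1$stock[idx]
df2$ch2   <- df1$ch2[idx]

print(df2)
```

Unlike merge, this keeps the rows of df2 in their original order. It assumes col_id is unique in df1, since match() only returns the first matching position.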