Smart merge in R

Asked: 2017-11-29 13:03:38

Tags: r merge

I have two datasets. The first:

df1=read.csv("C:/Users/synthex/Desktop/111.csv", sep=";",dec=",")
    structure(list(id = 1:10, mark = structure(c(3L, 4L, 4L, 6L, 
    2L, 5L, 7L, 9L, 8L, 1L), .Label = c("6,50-16 Я-387-1", "cvb", 
    "ert", "fgdhj", "fgj", "ghm", "jgfh", "ng", "vbn,"), class = "factor"), 
        gost = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L
        ), .Label = c("gost1", "gost10", "gost2", "gost3", "gost4", 
        "gost5", "gost6", "gost7", "gost8", "gost9"), class = "factor"), 
        number = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), man = structure(c(1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "voltar", class = "factor"), 
        price = 67:76), .Names = c("id", "mark", "gost", "number", 
    "man", "price"), class = "data.frame", row.names = c(NA, -10L))

The second dataset:

     df2=read.csv("C:/Users/synthex/Desktop/112.csv", sep=";",dec=",")
   structure(list(id = c(10L, 10L, NA, 18L, 18L, NA, 7L, 7L, NA, 
10L, 4L), id.1 = structure(c(6L, 2L, 1L, 2L, 3L, 1L, 7L, 4L, 
1L, 6L, 5L), .Label = c("", "et", "rey", "rty", "ryy1", "The Tire 6,50-16 I-387-1", 
"utreu"), class = "factor"), Weight = structure(c(1L, 5L, 1L, 
1L, 4L, 1L, 1L, 3L, 1L, 1L, 2L), .Label = c("", "0.5339173", 
"0.5349673", "0.5361807", "0.5372405"), class = "factor")), .Names = c("id", 
"id.1", "Weight"), class = "data.frame", row.names = c(NA, -11L
))

I need to join these datasets by id:

a1=merge(df1, df2, by = "id")

In the output I get a table in the wrong format, for example:

id                        ido   Weight            mark      gost    number  man  price
10  The Tire 6,50-16 I-387-1                6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1    0.3926514   6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1    0.3803419   6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1    0.3841079   6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1    0.4272772   6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1    0.4442845   6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1                6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1                6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1                6,50-16 Я-387-1 gost    4   voltar
10  The Tire 6,50-16 I-387-1                6,50-16 Я-387-1 gost    4   voltar

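But I need a different format (see the screenshot). That is, each row should be assigned to its corresponding id; in this case id 10 is repeated three times.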

needed format

How can I merge the tables to get the needed format? The ordering must be:

10  1
10  2
10  3
11  1
11  2
...

1 answer:

Answer 0 (score: 1)

First, make id a factor, since that makes sense for an identifier:

df1$id <- as.factor(df1$id)
df2$id <- as.factor(df2$id)

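Then we can merge the datasets and specify whether to keep all rows from one dataset even when they have no match in the other: all.x keeps the rows of df1, and all.y keeps the rows of df2. I also cleaned up a little by dropping the rows whose id is NA:
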
library(tidyr) # For the drop_na()

(df <- merge(df1, df2, by = "id", all.y = TRUE) %>% drop_na(id))
  id            mark   gost number    man price                     id.1    Weight
1  4             ghm  gost4      4 voltar    70                     ryy1 0.5339173
2  7            jgfh  gost7      4 voltar    73                      rty 0.5349673
3  7            jgfh  gost7      4 voltar    73                    utreu          
4 10 6,50-16 Я-387-1 gost10      4 voltar    76                       et 0.5372405
5 10 6,50-16 Я-387-1 gost10      4 voltar    76 The Tire 6,50-16 I-387-1          
6 10 6,50-16 Я-387-1 gost10      4 voltar    76 The Tire 6,50-16 I-387-1          
7 18            <NA>   <NA>     NA   <NA>    NA                      rey 0.5361807
8 18            <NA>   <NA>     NA   <NA>    NA                       et
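
To make the effect of all.x and all.y easier to see, here is a minimal base-R sketch on toy data. The data frames `left` and `right` and their values are made up for illustration and are not the asker's files. It also shows one possible way to get the per-id numbering the question describes (10 1, 10 2, 10 3, ...), using `ave()` with `seq_along()`. That numbering is an assumption about what the asker wants, not part of the original answer.

```r
# Toy data (hypothetical): `left` plays the role of df1, `right` of df2.
left  <- data.frame(id = c(4L, 7L, 10L),
                    mark = c("ghm", "jgfh", "tire"))
right <- data.frame(id = c(10L, 10L, NA, 18L, 7L),
                    weight = c(0.54, 0.53, NA, 0.52, 0.51))

# Default merge: only ids present in both tables.
inner  <- merge(left, right, by = "id")

# all.x = TRUE: keep every row of `left`, filling missing weights with NA.
keep_x <- merge(left, right, by = "id", all.x = TRUE)

# all.y = TRUE: keep every row of `right`, filling missing marks with NA.
keep_y <- merge(left, right, by = "id", all.y = TRUE)

# Drop rows whose id is NA (a base-R alternative to tidyr::drop_na(id)).
keep_y <- keep_y[!is.na(keep_y$id), ]

# Running counter within each id: 1, 2, ... restarting for every id.
keep_y$n <- ave(seq_along(keep_y$id), keep_y$id, FUN = seq_along)

print(keep_y)
```

Here id 18 survives only in `keep_y` and id 4 only in `keep_x`, which is the difference between the two arguments. The counter column `n` is computed after the NA ids are removed so that every remaining row belongs to a well-defined group.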