I have two data sets:
df1=read.csv("C:/Users/synthex/Desktop/111.csv", sep=";",dec=",")
structure(list(id = 1:10, mark = structure(c(3L, 4L, 4L, 6L,
2L, 5L, 7L, 9L, 8L, 1L), .Label = c("6,50-16 Я-387-1", "cvb",
"ert", "fgdhj", "fgj", "ghm", "jgfh", "ng", "vbn,"), class = "factor"),
gost = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L
), .Label = c("gost1", "gost10", "gost2", "gost3", "gost4",
"gost5", "gost6", "gost7", "gost8", "gost9"), class = "factor"),
number = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), man = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "voltar", class = "factor"),
price = 67:76), .Names = c("id", "mark", "gost", "number",
"man", "price"), class = "data.frame", row.names = c(NA, -10L))
The second data set:
df2=read.csv("C:/Users/synthex/Desktop/112.csv", sep=";",dec=",")
structure(list(id = c(10L, 10L, NA, 18L, 18L, NA, 7L, 7L, NA,
10L, 4L), id.1 = structure(c(6L, 2L, 1L, 2L, 3L, 1L, 7L, 4L,
1L, 6L, 5L), .Label = c("", "et", "rey", "rty", "ryy1", "The Tire 6,50-16 I-387-1",
"utreu"), class = "factor"), Weight = structure(c(1L, 5L, 1L,
1L, 4L, 1L, 1L, 3L, 1L, 1L, 2L), .Label = c("", "0.5339173",
"0.5349673", "0.5361807", "0.5372405"), class = "factor")), .Names = c("id",
"id.1", "Weight"), class = "data.frame", row.names = c(NA, -11L
))
I need to join these data sets by id:
a1=merge(df1, df2, by = "id")
In the output I get the wrong table format, for example:
id ido Weight mark gost number man price
10 The Tire 6,50-16 I-387-1 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 0.3926514 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 0.3803419 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 0.3841079 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 0.4272772 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 0.4442845 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 6,50-16 Я-387-1 gost 4 voltar
10 The Tire 6,50-16 I-387-1 6,50-16 Я-387-1 gost 4 voltar
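But I need the format shown in the screenshot, i.e. each row placed opposite its corresponding id. In this case id 10 is repeated three times.
How can I merge the tables to get the desired format? The order must be: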
10 1
10 2
10 3
11 1
11 2
...
Answer 0 (score: 1)
First, let's make id a factor, since that makes sense here:
df1$id <- as.factor(df1$id)
df2$id <- as.factor(df2$id)
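Then we can merge the data sets and specify whether to keep all rows of one data set even when they have no match in the other, using all.x (keep the rows of df1) and all.y (keep the rows of df2). I also drop the rows whose id is NA: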
library(tidyr) # For the drop_na()
(df <- merge(df1, df2, by = "id", all.y = TRUE) %>% drop_na(id))
id mark gost number man price id.1 Weight
1 4 ghm gost4 4 voltar 70 ryy1 0.5339173
2 7 jgfh gost7 4 voltar 73 rty 0.5349673
3 7 jgfh gost7 4 voltar 73 utreu
4 10 6,50-16 Я-387-1 gost10 4 voltar 76 et 0.5372405
5 10 6,50-16 Я-387-1 gost10 4 voltar 76 The Tire 6,50-16 I-387-1
6 10 6,50-16 Я-387-1 gost10 4 voltar 76 The Tire 6,50-16 I-387-1
7 18 <NA> <NA> NA <NA> NA rey 0.5361807
8 18 <NA> <NA> NA <NA> NA et
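
The question also asks for a running number within each id (10 1, 10 2, 10 3, 11 1, ...). The merge above does not add that column, so here is a minimal sketch of one way to get it with base R's ave(). The data frames a and b and the column n are hypothetical stand-ins for df1, df2 and the counter, not the asker's data, and the NA ids are filtered with plain indexing instead of drop_na() so that the sketch needs no extra packages.

```r
# Toy data (hypothetical): a stands in for df1, b for df2
a <- data.frame(id = c(10L, 11L), mark = c("x", "y"))
b <- data.frame(id = c(10L, 10L, NA, 11L, 10L, 11L),
                Weight = c(0.1, 0.2, NA, 0.3, 0.4, 0.5))

# Drop the rows of b without an id, then keep every remaining row of b
# and attach the matching columns of a
m <- merge(a, b[!is.na(b$id), ], by = "id", all.y = TRUE)

# Sort by id, then number the rows within each id: 1, 2, 3, ...
m <- m[order(m$id), ]
m$n <- ave(seq_along(m$id), m$id, FUN = seq_along)

print(m[, c("id", "n", "mark", "Weight")])
```

If id has already been converted to a factor as above, order() sorts by factor level rather than by numeric value, so it may be worth checking the row order before adding the counter.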