Merging two datasets

Date: 2018-03-05 22:04:51

Tags: r dplyr

I create a node list as follows:

name <- c("Joe","Frank","Peter")
city <- c("New York","Detroit","Maimi")
age <- c(24,55,65)
node_list <- data.frame(name,age,city)

node_list    
   name age     city
1   Joe  24 New York
2 Frank  55  Detroit
3 Peter  65    Maimi

Then I create an edge list as follows:

from <- c("Joe","Frank","Peter","Albert")
to <- c("Frank","Albert","James","Tony")
to_city <- c("Detroit","St. Louis","New York","Carson City")
edge_list <- data.frame(from,to,to_city)

edge_list
    from     to     to_city
1    Joe  Frank     Detroit
2  Frank Albert   St. Louis
3  Peter  James    New York
4 Albert   Tony Carson City

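Note that the names in the node list and the edge list do not overlap 100%. I want to create a master node list that contains every name and also captures the city information. Here is my dplyr attempt at doing this: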

library(dplyr)
library(tidyr)  # gather() comes from tidyr

new_node <- edge_list %>%
  gather("from_to", "name", from, to) %>%
  distinct(name) %>%
  full_join(node_list)

new_node
  name age     city
1    Joe  24 New York
2  Frank  55  Detroit
3  Peter  65    Maimi
4 Albert  NA     <NA>
5  James  NA     <NA>
6   Tony  NA     <NA>

I need to figure out how to add the to_city information. What do I need to add to my dplyr code to achieve this? Thanks.

1 answer:

Answer 0 (score: 5)

Join twice, once on to and once on from, with the irrelevant columns dropped by subsetting:

library(dplyr)

# tibble() replaces the deprecated data_frame()
node_list <- tibble(name = c("Joe", "Frank", "Peter"),
                    city = c("New York", "Detroit", "Maimi"),
                    age = c(24, 55, 65))

edge_list <- tibble(from = c("Joe", "Frank", "Peter", "Albert"),
                    to = c("Frank", "Albert", "James", "Tony"),
                    to_city = c("Detroit", "St. Louis", "New York", "Carson City"))

node_list %>% 
    full_join(select(edge_list, name = to, city = to_city)) %>% 
    full_join(select(edge_list, name = from))
#> Joining, by = c("name", "city")
#> Joining, by = "name"
#> # A tibble: 6 x 3
#>   name   city          age
#>   <chr>  <chr>       <dbl>
#> 1 Joe    New York      24.
#> 2 Frank  Detroit       55.
#> 3 Peter  Maimi         65.
#> 4 Albert St. Louis     NA 
#> 5 James  New York      NA 
#> 6 Tony   Carson City   NA

In this case the second join does nothing, because everyone is already included, but it would insert anyone who appears only in the from column.
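
To see the second join actually contribute a row, here is a minimal sketch using a made-up edge list. The name "Zoe" and the object edge_list2 are hypothetical and not part of the original question; "Zoe" appears only in from. The by arguments are spelled out to avoid the "Joining, by" messages, and distinct() is an added precaution (an assumption, not part of the answer above) so that a name repeated in from does not duplicate rows.

```r
library(dplyr)

# Same node list as in the answer
node_list <- tibble(name = c("Joe", "Frank", "Peter"),
                    city = c("New York", "Detroit", "Maimi"),
                    age = c(24, 55, 65))

# Hypothetical edge list: "Zoe" is invented and appears only in `from`
edge_list2 <- tibble(from = c("Joe", "Zoe"),
                     to = c("Frank", "Joe"),
                     to_city = c("Detroit", "New York"))

new_node <- node_list %>%
    # First join: people named in `to`, together with their city
    full_join(select(edge_list2, name = to, city = to_city),
              by = c("name", "city")) %>%
    # Second join: people named in `from`; no city is known for them
    full_join(distinct(select(edge_list2, name = from)), by = "name")

new_node
```

Here the first join adds nothing new, because Frank and Joe are already in node_list with matching cities, while the second join appends a row for Zoe with NA for both city and age, since the edge list only records a city for the to side.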