Merging many pairs of datasets into separate datasets

Posted: 2017-12-26 19:12:06

Tags: r merge

I have about 100 pairs of datasets, and each pair needs to be merged into its own dataset. I have looked at posts showing how to merge several datasets together (e.g. here and here), but my problem is different. My real-world data is stored on my hard drive, and the files are named similarly (e.g. household2010, household2011, household2012, person2010, person2011, person2012). They do not need to be loaded into the global environment. Example below.

Working data:

library(tidyverse)

household2010 <- tribble(
  ~id, ~var2, ~var3, ~var4, ~var5,
  "1", "1", "1", "a", "d",
  "2", "2", "2", "b", "e",
  "3", "3", "3", "c", "f"
)

person2010 <- tribble(
  ~id, ~var6, ~var7,
  "1", "1", "1",
  "2", "2", "2",
  "3", "3", "3",
  "4", "4", "4"
)

household2011 <- tribble(
  ~id, ~var8, ~var9, ~var10,
  "1", "1", "1", "1",
  "2", "2", "2", "2",
  "3", "3", "3", "3",
  "4", "4", "4", "4"
)

person2011 <- tribble(
  ~id, ~var11, ~var12, ~var13,
  "1", "1", "1", "1",
  "2", "2", "2", "2",
  "3", "3", "3", "3",
  "4", "4", "4", "4",
  "5", "5", "5", "5"
)

I need to merge household2010 with person2010 and then create a new dataset called hhperson2010. I need to do the same for household2011 and person2011. I can do it one pair at a time:

hhperson2010 <- left_join(household2010, person2010, by = "id")
hhperson2011 <- left_join(household2011, person2011, by = "id")

With more than 100 pairs, this becomes unwieldy. Can I use lapply to loop through the list of datasets and merge each pair? Something like:

3 answers:

Answer 0 (score: 1)

Maybe something like this:

library(dplyr)

years <- 2010:2011
# For each year, look up the household/person pair by name and left-join on id
result <- lapply(years, 
              function(x) left_join(get(paste0("household", x)), 
                                    get(paste0("person", x)), 
                                    by = "id"))

names(result) <- paste0("hhperson", years)
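A base-R variant of the same idea, offered as a sketch: mget() collects every household table and every person table by name in one call, and Map() walks the two lists in parallel. The small data frames are stand-ins for the question's data (only some columns kept), and merge(..., all.x = TRUE) is used in place of left_join() so the sketch needs no packages.

```r
# Stand-in data for two years (a subset of the question's columns)
household2010 <- data.frame(id = c("1", "2", "3"), var2 = c("1", "2", "3"))
person2010    <- data.frame(id = c("1", "2", "3", "4"), var6 = c("1", "2", "3", "4"))
household2011 <- data.frame(id = c("1", "2", "3", "4"), var8 = c("1", "2", "3", "4"))
person2011    <- data.frame(id = c("1", "2", "3", "4", "5"),
                            var11 = c("1", "2", "3", "4", "5"))

years <- 2010:2011

# Fetch each family of tables by name into a named list
households <- mget(paste0("household", years))
persons    <- mget(paste0("person", years))

# Pair the two lists element by element; all.x = TRUE keeps every
# household row, which mirrors left_join()
joined <- Map(function(h, p) merge(h, p, by = "id", all.x = TRUE),
              households, persons)
names(joined) <- paste0("hhperson", years)
```

If separate objects are really needed, list2env(joined, envir = globalenv()) would create hhperson2010, hhperson2011 and so on from the list, although keeping them in one list is usually easier to work with.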

Answer 1 (score: 0)

Just an alternative solution:
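years <- c("2010", "2011")
results <- list()

for (x in years) {
  # all.x = TRUE makes merge() a left join, like left_join()
  result <- merge(get(paste0("household", x)), get(paste0("person", x)),
                  by = "id", all.x = TRUE)
  name <- paste0("hhperson", x)
  print(name)
  print(result)
  results[[name]] <- result
}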

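You can choose between a loop and lapply depending on what further processing you need (if any). If you don't have many more datasets to handle, I think lapply alone will do the job.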
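To make the loop-versus-lapply choice concrete, here is a sketch showing that a for loop writing into a preallocated named list gives exactly the same result as lapply(); the difference is mostly style, and where side effects such as printing or writing files fit more naturally. The household and person lists below are toy stand-ins, and join_year() is a hypothetical helper.

```r
# Toy stand-ins: one household/person pair per year, stored in named lists
years <- c("2010", "2011")
household <- list(
  "2010" = data.frame(id = c("1", "2"), h = c("a", "b")),
  "2011" = data.frame(id = c("1", "2", "3"), h = c("c", "d", "e"))
)
person <- list(
  "2010" = data.frame(id = c("1", "2", "3"), p = c("x", "y", "z")),
  "2011" = data.frame(id = c("2", "3"), p = c("u", "v"))
)

# Hypothetical helper: left-join the pair for one year
join_year <- function(y) merge(household[[y]], person[[y]], by = "id", all.x = TRUE)

# Version 1: for loop filling a preallocated, named list
via_loop <- vector("list", length(years))
names(via_loop) <- years
for (y in years) {
  via_loop[[y]] <- join_year(y)
}

# Version 2: lapply over the same keys
via_lapply <- lapply(years, join_year)
names(via_lapply) <- years
```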

Answer 2 (score: 0)

Here is my approach using the tidyverse and list-columns:
library(dplyr)
library(tidyr)
library(purrr)

env2listcol <- function(rdata_file)
{
  e <- new.env()
  load(rdata_file, envir = e)
  # since you know that there's only 1 df in each environment
  as.list.environment(e)[[1]] 
}

# assuming files are stored in `input` folder
tibble(value = dir("input", full.names = TRUE)) %>% 
  # split the path
  separate(value, into = c("dir", "file", "ext"), remove = FALSE) %>% 
  # get the category (key) and the year in separate columns
  extract(file, into = c("key", "year"), regex = "([a-z]+)(\\d+)") %>% 
  # file path by category by year, remove unnecessary columns
  spread(key, value) %>% select(-dir, -ext) %>% 
  # extract dataframes from environments, and join them
  mutate(household = map(household, env2listcol),
         person = map(person, env2listcol),
         joined = map2(household, person, left_join, by = "id")) %>% 
  # keep only the joined tables, then rbind them
  # (you could instead pull(joined) or imap over it)
  select(year, joined) %>% 
  unnest(joined)

#> # A tibble: 7 x 14
#>    year    id  var2  var3  var4  var5  var6  var7  var8  var9 var10 var11 var12 var13
#>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1  2010     1     1     1     a     d     1     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#> 2  2010     2     2     2     b     e     2     2  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#> 3  2010     3     3     3     c     f     3     3  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#> 4  2011     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     1     1     1     1     1     1
#> 5  2011     2  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     2     2     2     2     2     2
#> 6  2011     3  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     3     3     3     3     3     3
#> 7  2011     4  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     4     4     4     4     4     4
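The path-handling steps of that pipeline (separate(), extract(), spread()) only turn a flat list of file names into one household/person pair per year. A base-R sketch of just that step, assuming files named like input/household2010.Rdata (the paths below are hypothetical and need not exist):

```r
# Hypothetical listing, as dir("input", full.names = TRUE) might return it
paths <- c("input/household2010.Rdata", "input/person2010.Rdata",
           "input/household2011.Rdata", "input/person2011.Rdata")

# Strip directory and extension: "household2010", "person2010", ...
stem <- tools::file_path_sans_ext(basename(paths))

# Split each stem into its category ("household"/"person") and its year
key  <- sub("^([a-z]+)([0-9]+)$", "\\1", stem)
year <- sub("^([a-z]+)([0-9]+)$", "\\2", stem)

# One named character vector per year: c(household = ..., person = ...)
by_year <- split(setNames(paths, key), year)
```

Each element of by_year could then be handed to a loader such as env2listcol() and the two resulting tables joined.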

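What you do with the result is up to you. You can write it back out as an R object (please, please... use Rds instead). You can write it out as a table (which I think is easier to work with). You could even export it as JSON.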
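On the Rds suggestion: saveRDS() stores exactly one object and readRDS() returns it directly, so the new.env()/load() helper is no longer needed. The sketch below is self-contained: it writes hypothetical householdYYYY.rds and personYYYY.rds files into a temporary folder, then reads each pair back, left-joins it, and saves the result as hhpersonYYYY.rds. The folder layout and file names are assumptions, and base merge() stands in for left_join().

```r
# Throwaway folder; the real data would live in its own directory
dir_in <- file.path(tempdir(), "pairs_demo")
dir.create(dir_in, showWarnings = FALSE)

# Write two years of toy data as .rds files (stand-ins for the real files)
for (y in c("2010", "2011")) {
  saveRDS(data.frame(id = c("1", "2", "3"), hh = c("a", "b", "c")),
          file.path(dir_in, paste0("household", y, ".rds")))
  saveRDS(data.frame(id = c("1", "2"), pp = c("x", "y")),
          file.path(dir_in, paste0("person", y, ".rds")))
}

# Derive the years from the household files present in the folder
hh_files <- list.files(dir_in, pattern = "^household[0-9]+\\.rds$")
years <- sub("^household([0-9]+)\\.rds$", "\\1", hh_files)

for (y in years) {
  hh <- readRDS(file.path(dir_in, paste0("household", y, ".rds")))
  pp <- readRDS(file.path(dir_in, paste0("person", y, ".rds")))
  # Left join: keep every household row, NA where no person matches
  joined <- merge(hh, pp, by = "id", all.x = TRUE)
  saveRDS(joined, file.path(dir_in, paste0("hhperson", y, ".rds")))
}
```

Because only one pair is in memory at a time, this pattern also fits the question's wish not to load all 100+ datasets into the global environment.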