Merging in R without writing multiple merge statements

Time: 2016-08-08 12:05:43

Tags: r merge

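I have one dataset, which is the base, and 7 other datasets for 7 different years and 3 different regions. These datasets contain the amount, the region and the year, which are the columns they share with the base data.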

However, I need to merge the 7 datasets, one by one, into the base dataset. How can this be achieved?

Base dataset:

company_region  raised_amount_usd  Year
SF Bay Area     1000050            2011
SF Bay Area     2520000            2011
SF Bay Area     15000              2010
Singapore       615000             2011

Year 2007:

raised_amount_usd  z  e  Year  company_region
1.00E+06           5  0  2007  Singapore
8.00E+06           6  1  2007  Singapore
50000              3  0  2007  Singapore
35000              3  0  2007  Singapore

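Similarly, I have data for the other years, 2008 to 2012. I need the columns z and e in my base dataset. Instead of writing 7 merge statements, how can this be done with a function?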

It would be great if someone could help. Thanks in advance!

1 Answer:

Answer 0: (score: 0)

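If you want to keep the columns z and e, bind_rows() from the dplyr package seems to be the answer (see also "Combine two data frames by rows (rbind) when they have different sets of columns"). Rows that come from a data frame without z and e get NA in those columns.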

# Create example base data (same rows as the base dataset in the question)
a <- c(rep("SF Bay Area",3),"Singapore")
b <- c(1000050,2520000,15000,615000)
c <- c(2011,2011,2010,2011)
base <- cbind.data.frame(a,b,c,stringsAsFactors = FALSE)
colnames(base) <- c("company_region","raised_amount_usd","Year")


a <- c(rep("Germany",4))
b <- c(100055,2524400,150020,68880)
c <- c(2007,2007,2007,2007)
e <- c(1,1,1,1)
z <- c(1,1,1,1)
data_germany <- cbind.data.frame(a,b,c,e,z,stringsAsFactors = FALSE)
colnames(data_germany) <- c("company_region","raised_amount_usd","Year","e","z")

a <- c(rep("Italy",4))
b <- c(100055,2524400,150020,68880)
c <- c(2007,2007,2007,2007)
e <- c(1,1,1,1)
z <- c(1,1,1,1)
data_italy <- cbind.data.frame(a,b,c,e,z,stringsAsFactors = FALSE)
colnames(data_italy) <- c("company_region","raised_amount_usd","Year","e","z")

# Bind the German and Italian data at once with dplyr
library(dplyr)
base %>% 
  bind_rows(data_germany) %>% 
  bind_rows(data_italy) -> base
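
With seven yearly data frames, chaining one bind_rows() call per data frame gets repetitive. A minimal sketch of an alternative, assuming the yearly data frames can be collected in a list: bind_rows() also accepts a list of data frames, so everything can be stacked in one call. The helper make_year() and the values it produces are hypothetical and only stand in for the real yearly data.

```r
library(dplyr)

# Base data with the three shared columns (rows taken from the question)
base <- data.frame(
  company_region    = c("SF Bay Area", "SF Bay Area", "SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 2520000, 15000, 615000),
  Year              = c(2011, 2011, 2010, 2011),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: builds a small stand-in for one yearly data frame
# that carries the extra columns z and e (illustrative values only)
make_year <- function(year) {
  data.frame(
    company_region    = rep("Singapore", 2),
    raised_amount_usd = c(1e6, 8e6),
    Year              = rep(as.numeric(year), 2),
    z                 = c(5, 6),
    e                 = c(0, 1),
    stringsAsFactors  = FALSE
  )
}

# Collect the yearly data frames in a list instead of separate variables
yearly <- lapply(2007:2012, make_year)

# Stack the base data and all yearly data in a single call;
# the base rows get NA in z and e
combined <- bind_rows(c(list(base), yearly))
```

In practice the list would hold the real data frames, for example list(data_2007, data_2008, ...), where those names are placeholders.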

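If you don't want to keep z and e, you can do it like this (starting again from the original three-column base, since rbind needs matching columns):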

# Function to extend the base data frame
# base_df = base data frame to extend
# add_df  = data frame whose rows should be appended to base_df
fun_extent_data <- function(base_df,add_df) {

  # Keep only the columns that the base data frame also has
  add_df_light <- dplyr::select(add_df,company_region,raised_amount_usd,Year)

  # Append the reduced rows to the base data frame and return the result
  rbind.data.frame(base_df,add_df_light,stringsAsFactors = FALSE)
}

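# Use the function for a single data frame
fun_extent_data(base,data_germany) -> base

# Or use it for the German and Italian data at once with dplyr's pipe
# (run this instead of the single call above, otherwise the German rows are added twice)
library(dplyr)
base %>%
  fun_extent_data(.,data_germany) %>%
  fun_extent_data(.,data_italy) -> base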
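
To avoid writing one call per yearly data frame, the same idea can be folded over a list with Reduce() from base R. This is only a sketch under assumptions: extend_base() is a hypothetical base-R stand-in for the function above, and the yearly data is made up for illustration.

```r
# Base data with the three shared columns (rows taken from the question)
base <- data.frame(
  company_region    = c("SF Bay Area", "SF Bay Area", "SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 2520000, 15000, 615000),
  Year              = c(2011, 2011, 2010, 2011),
  stringsAsFactors  = FALSE
)

# Illustrative yearly data frames with the extra columns z and e (made-up values)
yearly <- lapply(2007:2012, function(year) {
  data.frame(
    company_region    = rep("Singapore", 2),
    raised_amount_usd = c(1e6, 8e6),
    Year              = rep(as.numeric(year), 2),
    z                 = c(5, 6),
    e                 = c(0, 1),
    stringsAsFactors  = FALSE
  )
})

# Hypothetical helper: keep only the columns the base has, then append the rows
extend_base <- function(base_df, add_df) {
  rbind(base_df, add_df[, names(base_df), drop = FALSE])
}

# Fold the helper over the list, starting from the base data frame
result <- Reduce(extend_base, yearly, base)
```

Reduce() calls extend_base(base, yearly[[1]]), then feeds that result in together with yearly[[2]], and so on, so the number of yearly data frames no longer affects how much code has to be written.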