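I have one dataset, which is the base, and 7 other datasets covering 7 different years and 3 different regions. All of them contain the amount, region and year columns that the base data also has.
However, I need to merge the 7 datasets into the base dataset. How can I do this?
Base dataset: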
company_region raised_amount_usd Year
SF Bay Area 1000050 2011
SF Bay Area 2520000 2011
SF Bay Area 15000 2010
Singapore 615000 2011
2007 dataset:
raised_amount_usd z e Year company_region
1.00E+06 5 0 2007 Singapore
8.00E+06 6 1 2007 Singapore
50000 3 0 2007 Singapore
35000 3 0 2007 Singapore
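Similarly, I have data for the other years, 2008 - 2012. I need the columns z and e in my base dataset. Instead of writing 7 merge statements, how can this be done with a function?
It would be great if someone could help. Thanks in advance!
Answer 0 (score: 0)
If you want to keep the columns z and e, bind_rows() from the dplyr package seems to be the answer (see also Combine two data frames by rows (rbind) when they have different sets of columns). It appends the rows and fills columns that are missing from one of the data frames with NA.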
# Create example
a <- c(rep("SF Bay Area",3),"Singapore")
b <- c(1000050,2520000,15000,615000)
c <- c(2011,2011,2010,2011)
base <- cbind.data.frame(a,b,c,stringsAsFactors =F)
colnames(base) <- c("company_region","raised_amount_usd","Year")
a <- c(rep("Germany",4))
b <- c(100055,2524400,150020,68880)
c <- c(2007,2007,2007,2007)
e <- c(1,1,1,1)
z <- c(1,1,1,1)
data_germany <- cbind.data.frame(a,b,c,e,z,stringsAsFactors =F)
colnames(data_germany) <- c("company_region","raised_amount_usd","Year","e","z")
a <- c(rep("Italy",4))
b <- c(100055,2524400,150020,68880)
c <- c(2007,2007,2007,2007)
e <- c(1,1,1,1)
z <- c(1,1,1,1)
data_italy <- cbind.data.frame(a,b,c,e,z,stringsAsFactors =F)
colnames(data_italy) <- c("company_region","raised_amount_usd","Year","e","z")
# Bind the German and Italian data at once with dplyr
library(dplyr)
base %>%
bind_rows(data_germany) %>%
bind_rows(data_italy) -> base
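For seven yearly data sets, chaining seven bind_rows() calls can be avoided, because bind_rows() also accepts a list of data frames. The following is a minimal sketch, assuming the yearly data frames have been collected in a list. The names data_2007, data_2008, yearly_list and combined are hypothetical, and the 2008 values are made up for illustration.

```r
library(dplyr)

# Base data as shown in the question
base <- data.frame(
  company_region    = c("SF Bay Area", "SF Bay Area", "SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 2520000, 15000, 615000),
  Year              = c(2011, 2011, 2010, 2011),
  stringsAsFactors  = FALSE
)

# Yearly data sets with the extra columns z and e
# (2007 rows taken from the question, 2008 rows are invented)
data_2007 <- data.frame(
  raised_amount_usd = c(1e6, 8e6),
  z = c(5, 6),
  e = c(0, 1),
  Year = c(2007, 2007),
  company_region = c("Singapore", "Singapore"),
  stringsAsFactors = FALSE
)
data_2008 <- data.frame(
  raised_amount_usd = c(2e6, 3e6),
  z = c(4, 2),
  e = c(1, 0),
  Year = c(2008, 2008),
  company_region = c("SF Bay Area", "Singapore"),
  stringsAsFactors = FALSE
)

# Put the base first, followed by every yearly data set
yearly_list <- list(data_2007, data_2008)
combined <- bind_rows(c(list(base), yearly_list))

# Rows that came from the base have NA in z and e
print(combined)
```

Columns are matched by name, so the different column order in the yearly data sets does not matter.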
If you don't want to keep z and e, you can do this:
library(dplyr)
# Function to extend the base data frame
# base_df = base data frame to extend
# add_df = data frame that should be appended to the base data frame
fun_extent_data <- function(base_df,add_df) {
  # Keep only the columns that the base data frame also has
  add_df_light <- add_df %>%
    select(company_region,raised_amount_usd,Year)
  # Append the reduced data to the base data frame
  base_df <- rbind.data.frame(base_df,add_df_light,stringsAsFactors = FALSE)
  return(base_df)
}
# Use function
fun_extent_data(base,data_germany) -> base
# Use the function for the German and Italian data at once with dplyr
library(dplyr)
base %>%
fun_extent_data(.,data_germany) %>%
fun_extent_data(.,data_italy) -> base
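With seven yearly data sets, the chain above can be replaced by a fold over a list using base R's Reduce(). This is only a sketch under stated assumptions: extend_base is a hypothetical helper that mirrors fun_extent_data, and data_2007, data_2008, yearly_list and base_all are hypothetical names with partly invented values.

```r
library(dplyr)

# Hypothetical helper mirroring fun_extent_data:
# keep the three shared columns, then append the rows
extend_base <- function(base_df, add_df) {
  add_df_light <- select(add_df, company_region, raised_amount_usd, Year)
  rbind(base_df, add_df_light)
}

base <- data.frame(
  company_region    = c("SF Bay Area", "SF Bay Area", "SF Bay Area", "Singapore"),
  raised_amount_usd = c(1000050, 2520000, 15000, 615000),
  Year              = c(2011, 2011, 2010, 2011),
  stringsAsFactors  = FALSE
)

# Illustrative yearly data sets (the 2008 values are invented)
data_2007 <- data.frame(
  raised_amount_usd = c(1e6, 8e6), z = c(5, 6), e = c(0, 1),
  Year = c(2007, 2007), company_region = c("Singapore", "Singapore"),
  stringsAsFactors = FALSE
)
data_2008 <- data.frame(
  raised_amount_usd = c(2e6, 3e6), z = c(4, 2), e = c(1, 0),
  Year = c(2008, 2008), company_region = c("SF Bay Area", "Singapore"),
  stringsAsFactors = FALSE
)
yearly_list <- list(data_2007, data_2008)

# Reduce() calls extend_base(base, data_2007), then
# extend_base(<previous result>, data_2008), and so on through the list
base_all <- Reduce(extend_base, yearly_list, base)
print(base_all)
```

The third argument of Reduce() is the starting value, so the base data frame stays at the top and each yearly data set is appended below it in list order.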