Question

I have several CSV files, one for each year. Each file contains the same variables and observations.

df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))

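Assume df14 & df15 represent 2014 & 2015 respectively.

Note: the variables are not recorded in the same order.

What I want to do is see how each variable (A, B, C) changes from year to year for each name.

Is there a way to combine these into a single data frame? Should I just rbind?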

Update

One thing I could do is assign the year as a new variable and then rbind, but is that good practice?

df14$year <- 2014; df15$year <- 2015
df <- rbind(df14, df15)

Which gives:

   name A B C year
   one 1 4 0 2014
   two 2 2 1 2014
   three 3 1 1 2014
   one 3 8 0 2015
   two 1 5 0 2015
   three 1 5 1 2015

Answer 1

Try:

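library(data.table)
library(magrittr)
years_2_digit <- 14:15

# Stack df14 and df15, adding a four-digit year column to each.
# as.data.table() copies, so the original data frames are left untouched.
# use.names = TRUE binds columns by name; df15 stores them in a different order (A, C, B).
DT <- 
rbindlist(lapply(years_2_digit, function(y) {
  get(paste0("df", y)) %>% 
  as.data.table %>% 
  .[, year := 2000 + y] %>%
  setkeyv("name")
}), use.names = TRUE)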


DT.molt <- reshape2::melt(DT, id.vars=c("name", "year"))

library(ggplot2)
ggplot(data=DT.molt, aes(x=year, color=variable, y=value)) + 
    geom_line() + geom_point() + 
    facet_grid(name ~ .) + 
    ggtitle("Change by year and name")
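
To read the change off as numbers rather than from the plot, the stacked data can also be reshaped so that the two years sit side by side. The following is a minimal base R sketch, not part of the original answer. It assumes exactly the two sample years, and `stacked`, `long`, `wide` and `change` are illustrative names.

```r
df14 <- data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3, 1, 1), C = c(0, 0, 1), B = c(8, 5, 5))
df14$year <- 2014
df15$year <- 2015
stacked <- rbind(df14, df15)   # data frame columns are matched by name

# Long format: one row per name, year and variable
vars <- c("A", "B", "C")
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(name = stacked$name, year = stacked$year,
             variable = v, value = stacked[[v]])
}))

# Put the two years side by side and take the difference
y14 <- long[long$year == 2014, ]
y15 <- long[long$year == 2015, ]
wide <- merge(y14, y15, by = c("name", "variable"), suffixes = c(".2014", ".2015"))
wide$change <- wide$value.2015 - wide$value.2014
wide[, c("name", "variable", "value.2014", "value.2015", "change")]
```

With more than two years, the same long table could be split by name and variable and differenced with diff(), but that goes beyond this sketch.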


Answer 2

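You can add the year column to each data frame programmatically and then bind them. Here is an example that relies on being able to get the year corresponding to each data frame from the file name. Here I have stored your sample data frames in a list. In your actual use case, you would read the CSV files into a list with something like df.list = sapply(vector_of_file_names, read.csv, simplify = FALSE). (A plain sapply would try to simplify the equally shaped data frames into a matrix; lapply also works but does not name the list elements.)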

df.list = list(df14=df14, df15=df15)

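# Append a year column to each data frame, taken from the two digits in its name
df.list = lapply(seq_along(df.list), function(i) {
  # The pattern accepts list names such as "df14" as well as file names such as "df14.csv"
  yy = as.numeric(gsub(".*(\\d{2})(\\.csv)?$", "\\1", names(df.list)[[i]]))
  data.frame(df.list[[i]], year = 2000 + yy)
})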

df = do.call(rbind, df.list)
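
With real files, the same idea could look like the sketch below. It is only an illustration: it first writes the two sample data frames to a temporary directory so that it is self-contained, and the dfYY.csv naming pattern and the `tmp` and `files` names are assumptions carried over from the example rather than anything stated in the question.

```r
# Write two sample files to a temporary directory (stand-ins for the real CSVs)
tmp <- file.path(tempdir(), "yearly_csv")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1)),
          file.path(tmp, "df14.csv"), row.names = FALSE)
write.csv(data.frame(name = c("one", "two", "three"), A = c(3, 1, 1), C = c(0, 0, 1), B = c(8, 5, 5)),
          file.path(tmp, "df15.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "^df\\d{2}\\.csv$", full.names = TRUE)

# simplify = FALSE keeps a list of data frames, named by file path
df.list <- sapply(files, read.csv, simplify = FALSE)

# Add the year, parsed from the two digits before ".csv"
df.list <- lapply(names(df.list), function(f) {
  yy <- as.numeric(sub(".*(\\d{2})\\.csv$", "\\1", basename(f)))
  data.frame(df.list[[f]], year = 2000 + yy)
})

# rbind matches data frame columns by name, so the differing column order is harmless
df <- do.call(rbind, df.list)
df
```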

Answer 3

Here is a working example using lapply:

Make some dummy CSV files:

df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))
df16 <- data.frame(name = c("one", "two", "three"), C = c(1,2,3), B = c(4, 2, 1), A = c(0, 1, 1))
df17 <- data.frame(name = c("one", "two", "three"), C = c(3,1,1), A = c(0, 0, 1), B = c(8, 5, 5))
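# Get the names of the data frames (df14, df15, ...) and write each one to its own CSV file.
# The anchored pattern avoids picking up other objects whose names merely contain "df".
myNames <- ls(pattern = "^df\\d+$")
invisible(lapply(myNames, function(i){write.csv(get(i),paste0(i,".csv"),row.names = FALSE)}))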

Solution: read the CSV files, fix the column order using sort, then rbind them into one data frame:

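# Solution: read each CSV, put its columns in sorted order, tag it with its file name, then rbind
do.call(rbind,
        lapply(list.files(".","^df\\d+\\.csv$"),   # escaped dot and end anchor: match only dfNN.csv
               function(i){
                 d <- read.csv(i)
                 res <- d[,sort(colnames(d))]
                 cbind(res,FileName=i)
               }))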
# output
#    A B C  name FileName
# 1  1 4 0   one df14.csv
# 2  2 2 1   two df14.csv
# 3  3 1 1 three df14.csv
# 4  3 8 0   one df15.csv
# 5  1 5 0   two df15.csv
# 6  1 5 1 three df15.csv
# 7  0 4 1   one df16.csv
# 8  1 2 2   two df16.csv
# 9  1 1 3 three df16.csv
# 10 0 8 3   one df17.csv
# 11 0 5 1   two df17.csv
# 12 1 5 1 three df17.csv
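
Two small follow-ups on this output, sketched under assumptions. First, rbind on data frames already matches columns by name, so the sort step mainly fixes the display order. Second, the FileName column can be turned into a numeric year. The `year` column and the `d14`, `d16` and `combined` names below are illustrative and not part of the original answer.

```r
d14 <- data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1),
                  FileName = "df14.csv")
d16 <- data.frame(name = c("one", "two", "three"), C = c(1, 2, 3), B = c(4, 2, 1), A = c(0, 1, 1),
                  FileName = "df16.csv")

# No sorting: rbind still lines columns up by name, using the order of the first data frame
combined <- rbind(d14, d16)

# Derive a four-digit year from the two digits in the file name
combined$year <- 2000 + as.numeric(sub("^df(\\d{2})\\.csv$", "\\1", combined$FileName))
combined
```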

Merging multiple data frames with the same variables and observations
