How to merge multiple CSV files that share a common "subject ID" column but have many different variables in the other columns? In R

Time: 2018-08-08 23:43:09

Tags: r csv merge

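Imagine I have 100 CSV files that all have the same subject IDs in the first column, but completely different variables in the other columns. I want a single Excel file with the IDs in the first column and, in the remaining columns, all the columns from all the other CSV files. How can I do this? I can't just merge two of them at a time in R and then merge the result with the next file; that would be very tedious.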

Consider:

file1.csv with 3 columns "subjectID", "a", "b"
file2.csv with 3 columns "subjectID", "c", "d"
file3.csv with 2 columns "subjectID", "e"

In the end, I want a single CSV file with 6 columns:

"subjectID","a","b","c","d","e" 

1 answer:

Answer 0 (score: 2)

You can do the following:

# Read in files, e.g. all CSV files in the working directory
# files <- list.files(pattern = "\\.csv$")
# lst <- lapply(files, read.csv)

# Generate similar sample data to demonstrate
lst <- list(
    data.frame(subjectID = letters[1:10], a = runif(10), b = runif(10)),
    data.frame(subjectID = letters[1:10], c = runif(10), d = runif(10)),
    data.frame(subjectID = letters[1:10], e = runif(10), f = runif(10)),
    data.frame(subjectID = letters[1:10], g = runif(10), h = runif(10)))

# Merge data from all files on subjectID
Reduce(function(x, y) merge(x, y, by = "subjectID"), lst)
#   subjectID         a           b         c          d         e           f
#1          a 0.3303817 0.297198993 0.9521621 0.07472854 0.8422689 0.642384618
#2          b 0.4693850 0.029617471 0.1079085 0.97297463 0.8047761 0.002465216
#3          c 0.1232060 0.351755203 0.4649148 0.97412774 0.3047000 0.290868067
#4          d 0.7906051 0.402014018 0.7141169 0.69951165 0.4372228 0.142227230
#5          e 0.3958683 0.119870791 0.1061828 0.07939243 0.5506707 0.276125793
#6          f 0.8460007 0.032571856 0.4205542 0.03433463 0.4095929 0.561597813
#7          g 0.3087469 0.002836689 0.6625422 0.43830865 0.5944669 0.186904600
#8          h 0.3501046 0.599942351 0.2073871 0.11963722 0.7769929 0.367783960
#9          i 0.7952080 0.400595114 0.9792009 0.30959206 0.5644129 0.122465491
#10         j 0.3829504 0.972797955 0.9483458 0.93079712 0.2273367 0.726364011
#           g          h
#1  0.3224803 0.09905568
#2  0.9986640 0.42053490
#3  0.5484119 0.88754806
#4  0.3274199 0.87417816
#5  0.9474794 0.40207119
#6  0.3864848 0.97977549
#7  0.4875860 0.31788236
#8  0.5094075 0.86424560
#9  0.3900625 0.11860494
#10 0.7064986 0.11939311

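Note that I generated a list of 4 data.frames as sample data; all data.frames share the common column subjectID. In your case, you would instead read in the files with e.g. read.csv, based on the file names stored in the vector files.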
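As a rough end-to-end sketch (not part of the original answer), the same Reduce/merge idea can be applied to real files on disk and the result written back out as one CSV file, as the question asks. It assumes all input files sit in one directory and end in ".csv"; the directory, the data values, and the output name "merged.csv" are made up for illustration. It also swaps in all = TRUE, which turns merge's default inner join into a full outer join, so a subject missing from one file is kept with NA values instead of being silently dropped.

```r
# Sketch only: recreate the three example files from the question in a
# temporary directory (made-up values), then merge them on subjectID.
dir <- tempfile("csv_merge_")
dir.create(dir)

write.csv(data.frame(subjectID = 1:4, a = c(1.1, 1.2, 1.3, 1.4), b = 5:8),
          file.path(dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(subjectID = 1:4, c = 9:12, d = 13:16),
          file.path(dir, "file2.csv"), row.names = FALSE)
# file3 deliberately lacks subject 4 to show how missing subjects are handled
write.csv(data.frame(subjectID = 1:3, e = c(0.5, 0.6, 0.7)),
          file.path(dir, "file3.csv"), row.names = FALSE)

# Collect all CSV file names in the directory (returned in alphabetical order)
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames
lst <- lapply(files, read.csv)

# Merge pairwise from left to right; all = TRUE keeps subjects that are
# missing from some files, filling their columns with NA
merged <- Reduce(function(x, y) merge(x, y, by = "subjectID", all = TRUE), lst)

# Write the combined table to a single CSV file without row names
write.csv(merged, file.path(dir, "merged.csv"), row.names = FALSE)
print(merged)
```

With the default all = FALSE, subject 4 would disappear from the result because it is absent from file3.csv. If you really need an .xlsx file rather than a CSV, that requires an add-on package, since base R only writes text formats.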