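Imagine I have 100 CSV files. All of them have the same subject ID in the first column, but the remaining columns are completely different. I would like a single Excel file with the ID in the first column, followed by all the columns from all the CSV files. How can I do this? I can't merge them two at a time and then merge each result with the next one in R; that would be frustrating.
Consider: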
file1.csv with 3 columns "subjectID","a","b"
file2.csv with 3 columns "subjectID", "c","d"
file3.csv with 2 columns "subjectID", "e"
In the end, I would like to have a CSV file with 6 columns:
"subjectID","a","b","c","d","e"
Answer 0 (score: 2)
You can do the following:
# Read in files
#lst <- lapply(files, read.csv)
# Generate similar sample data to demonstrate
lst <- list(
  data.frame(subjectID = letters[1:10], a = runif(10), b = runif(10)),
  data.frame(subjectID = letters[1:10], c = runif(10), d = runif(10)),
  data.frame(subjectID = letters[1:10], e = runif(10), f = runif(10)),
  data.frame(subjectID = letters[1:10], g = runif(10), h = runif(10)))
# Merge data from all files on subjectID
Reduce(function(x, y) merge(x, y, by = "subjectID"), lst)
# subjectID a b c d e f
#1 a 0.3303817 0.297198993 0.9521621 0.07472854 0.8422689 0.642384618
#2 b 0.4693850 0.029617471 0.1079085 0.97297463 0.8047761 0.002465216
#3 c 0.1232060 0.351755203 0.4649148 0.97412774 0.3047000 0.290868067
#4 d 0.7906051 0.402014018 0.7141169 0.69951165 0.4372228 0.142227230
#5 e 0.3958683 0.119870791 0.1061828 0.07939243 0.5506707 0.276125793
#6 f 0.8460007 0.032571856 0.4205542 0.03433463 0.4095929 0.561597813
#7 g 0.3087469 0.002836689 0.6625422 0.43830865 0.5944669 0.186904600
#8 h 0.3501046 0.599942351 0.2073871 0.11963722 0.7769929 0.367783960
#9 i 0.7952080 0.400595114 0.9792009 0.30959206 0.5644129 0.122465491
#10 j 0.3829504 0.972797955 0.9483458 0.93079712 0.2273367 0.726364011
# g h
#1 0.3224803 0.09905568
#2 0.9986640 0.42053490
#3 0.5484119 0.88754806
#4 0.3274199 0.87417816
#5 0.9474794 0.40207119
#6 0.3864848 0.97977549
#7 0.4875860 0.31788236
#8 0.5094075 0.86424560
#9 0.3900625 0.11860494
#10 0.7064986 0.11939311
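Note that I have generated a list of 4 data.frames as sample data; all data.frames share a common column, subjectID. In your case, you would read in the data using e.g. read.csv, based on the file names given in files.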
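For reading the data from actual files, the same idea might look like the sketch below. The directory, file names, and sample values are made up for illustration and are not from the question. Note that merge drops subjects that are missing from any one file by default, so this sketch passes all = TRUE to keep every subjectID and fill the gaps with NA; whether that is what you want depends on your data.

```r
# Hypothetical setup: write three small CSV files to a temporary directory.
csv_dir <- file.path(tempdir(), "merge_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(subjectID = c("s1", "s2", "s3"), a = 1:3, b = 4:6),
          file.path(csv_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(subjectID = c("s1", "s2", "s3"), c = 7:9, d = 10:12),
          file.path(csv_dir, "file2.csv"), row.names = FALSE)
# file3.csv deliberately lacks subject "s3"
write.csv(data.frame(subjectID = c("s1", "s2"), e = c(13, 14)),
          file.path(csv_dir, "file3.csv"), row.names = FALSE)

# Collect the input paths; the pattern assumes the inputs are named file<N>.csv
files <- list.files(csv_dir, pattern = "^file[0-9]+\\.csv$", full.names = TRUE)
lst <- lapply(files, read.csv, stringsAsFactors = FALSE)

# all = TRUE keeps every subjectID (full outer join); missing cells become NA
merged <- Reduce(function(x, y) merge(x, y, by = "subjectID", all = TRUE), lst)

# Write the combined table; the resulting CSV can be opened in Excel
write.csv(merged, file.path(csv_dir, "merged.csv"), row.names = FALSE)
```

With 100 files the only part that should need changing is the directory and the pattern passed to list.files; Reduce applies merge across the whole list, so no pairwise merging by hand is needed.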