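Merging files by row

Date: 2016-02-15 12:12:14

Tags: r merge

I want to merge files based on a column. The files do not have the same number of rows. The output should contain all rows, and if a row is missing from a file, its count should be 0.

I tried something like this: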

> file_list <- list.files(pattern = "*.mature")
> dataset_tumor <- do.call("cbind", lapply(file_list,
+     FUN = function(files) { read.table(files,
+     header = TRUE, sep = "") }))
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 497, 642, 692, 694, 699, 515, 707, 740, 605, 568, 602, 512, 624, 634, 551, 662, 750, 442, 615, 557, 466, 638, 560, 576, 851, 705, 614, 547, 670, 752, 586, 671, 754, 603, 666, 587, 601, 572, 550, 573, 621, 650, 701, 622, 735, 434, 742, 737, 809, 661, 540, 645, 722, 594, 681, 659, 781, 613, 641, 756, 595, 966, 658, 539, 520, 619, 564, 732, 679, 596, 536, 518, 631, 691, 708, 625, 630, 589, 639, 538


> head(a.mature)
                 X4
hsa-let-7a-5p 12342
hsa-let-7b-3p    27
hsa-let-7b-5p 47413
hsa-let-7c-5p  2825
hsa-let-7d-3p  1162
hsa-let-7d-5p   219
> head(b.mature)
                X15
hsa-let-7a-5p 28868
hsa-let-7b-3p    41
hsa-let-7b-5p 62259
hsa-let-7c-5p  4468
hsa-let-7k-3p  2027
hsa-let-7f-5p   938

Desired output:

               X4        X15
hsa-let-7a-5p  12342      28868
hsa-let-7b-3p  27         41
hsa-let-7b-5p  47413      62259
hsa-let-7c-5p  2825       4468
hsa-let-7d-3p  1162       0
hsa-let-7d-5p  219        0
hsa-let-7k-3p  0          2027
hsa-let-7f-5p  0          938

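1 answer:

Answer 0 (score: 0)

As with a primary key and a foreign key in a database, you need a column that both datasets share in order to combine them. From the examples in the documentation of the merge function: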

authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))

books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

Here we have two data frames; the surname column in authors corresponds to the name column in books. We can therefore merge the datasets on these fields:

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

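By default, merge keeps only rows whose key appears in both data frames. If you want to keep all books in the merged data frame, use the all.y or all.x argument of merge, depending on whether books is passed as the second or the first data frame.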

  m1 <- merge(authors, books, by.x = "surname", by.y = "name", all.y =TRUE)

OR

 m1 <- merge(books, authors,  by.x = "name",  by.y = "surname", all.x =TRUE)
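To make the effect of all.y visible, here is a small sketch on trimmed-down versions of the two data frames (the shortened contents are only for illustration). It compares the default merge with one that keeps every row of books:

```r
# Trimmed-down versions of the example data, for illustration only.
authors <- data.frame(surname = c("Tukey", "Ripley"),
                      nationality = c("US", "UK"),
                      stringsAsFactors = FALSE)
books <- data.frame(name = c("Tukey", "Ripley", "R Core"),
                    title = c("Exploratory Data Analysis",
                              "Spatial Statistics",
                              "An Introduction to R"),
                    stringsAsFactors = FALSE)

# Default merge: only keys present in both data frames survive.
inner <- merge(authors, books, by.x = "surname", by.y = "name")

# all.y = TRUE: every row of books is kept; columns coming from
# authors are filled with NA where there is no matching author.
keep_books <- merge(authors, books, by.x = "surname", by.y = "name",
                    all.y = TRUE)

print(inner)
print(keep_books)
```

The "R Core" row appears only in keep_books, with NA in the nationality column.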

Similarly, you can use the join_all function from the plyr package, which can merge more than two data frames.
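Applied to the question, one possible approach in base R is to turn the row names into a key column, merge all tables with all = TRUE, and then replace the resulting NA values with 0. The sketch below uses two small data frames built from the values shown in the question. The helper with_key and the column name id are made up for this example, and it assumes the miRNA names are the row names, as the head() output suggests.

```r
# Stand-ins for two of the count tables, using the values from the question.
# With real files these would come from something like
# lapply(file_list, read.table, header = TRUE).
a <- data.frame(X4 = c(12342, 27, 47413, 2825, 1162, 219),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p",
                              "hsa-let-7b-5p", "hsa-let-7c-5p",
                              "hsa-let-7d-3p", "hsa-let-7d-5p"))
b <- data.frame(X15 = c(28868, 41, 62259, 4468, 2027, 938),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p",
                              "hsa-let-7b-5p", "hsa-let-7c-5p",
                              "hsa-let-7k-3p", "hsa-let-7f-5p"))

# Hypothetical helper: copy the row names into a key column named "id".
with_key <- function(df) {
  df$id <- rownames(df)
  df
}

tables <- lapply(list(a, b), with_key)

# Merge the tables pairwise; all = TRUE keeps ids found in any table.
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), tables)

# Restore the ids as row names, then turn missing counts into 0.
rownames(merged) <- merged$id
merged$id <- NULL
merged[is.na(merged)] <- 0

print(merged)
```

Because Reduce applies merge pairwise, the same code works for any number of tables in the list. Note that merge sorts the result by the key, so the row order differs from the desired output shown in the question.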