合并几个不同列长度的数据帧和操纵列

时间:2011-05-26 14:26:16

标签: r dataframe histogram

我正在使用9个不同数据的文件(每个组织数据的蛋白质)。每个文件代表不同的组织并具有蛋白质表达的值(作为数字)。我试图将数据合并到一个data.frame。我用了

read.delim("fileName.txt")  

表示所有文件。之后,我使用了所有数据框的列表

l <- list(data.frame1,..etc)

然后我使用了plyr库和do.call(rbind.fill,l)

我的问题:

1)我希望遍历9个data.frames的列表,找到它们中的唯一数据并将其绘制在直方图中。如果我找到多个具有相同名称但不同组织的条目,则应将其添加到每个正确组织标签上方的直方图中。那就是 - 我转到列表中的第一个data.frame,从中取出第一个条目,搜索是否在其他data.frames中找到此条目,如果是,则将其添加到直方图。

直方图在x轴上有9个组织,y轴是我文件中的值。我无法弄清楚如何使直方图(和代码)适当地更改名称以及如何在正确的位置显示条形。

此外,我不知道如何构建轴以获取每个条形下的组织名称。

我有一些基本的代码没有做我想要的:

i=1

for( val in list2[1:9] )
{
    if( val appears in one of the other data.frames)
           plot a bar over the correct tissue.

    hist(val[i,8],breaks=11,col="blue",density=13,angle=45,
           labels=c("Lung","ErythroleukemicCellLine","TCells","Blood","liver",
           "BLimpho","pancreas","prostate","Bladder"), main=fileName[i,1])
    dev.new() #each hist in a new window
    i = i + 1

}
谢谢你 yigeal

这是代码输出结尾的几行: 用read.delim(“nameOfFile.txt”)

读取文件后
 dput(BloodErythroleukemicCellLineFile)
 "Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein", 
    "Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598", 
    "Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609", 
    "Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610", 
    "Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613", 
    "Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614", 
    "Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622", 
    "Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646", 
    "Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B", 
    "Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667, isoform CRA_a", 
    "Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317, highly similar to Zinc finger protein 691", 
    "Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700", 
    "Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714", 
    "Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)", 
    "Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721", 
    "Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76", 
    "Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782", 
    "Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787", 
    "Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800", 
    "Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein", "Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828", 
    "Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837", 
    "Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878", 
    "Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891", 
    "Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6", 
    "Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein", "Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog", 
    "Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor", "Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B", 
    "Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160, highly similar to Zyxin", 
    "Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein", "Tax_Id=9606 Gene_Symbol=ZYX Zyxin"
    ), class = "factor")), .Names = c("proteinIdentifier", "protein", 
"spectra", "unique_peptides", "FDR", "local_FDR", "sequence_coverage", 
"expression_value", "expression_percentile", "organism", "tissue", 
"localization", "condition", "experiment", "annotation"), class = "data.frame", row.names = c(NA, 
-4802L))

控制台中的时间要长得多

1 个答案:

答案 0 :(得分:1)

在你的问题中找到问题的核心并不容易。 对于使用某些公共字段(或字段)合并数据帧,您可以使用merge()函数,如:

merge(dataframe1, dataframe2, by=c('column_name1','column_name2'), suffixes=c('.from_df1','.from_df2'))

如果要选择行或列,可以这样做:

dataframe1[dataframe$column1 == 'some_value", c('col1', 'col2')]

...等 这对你有帮助吗?