Question

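I'm new to pandas and am having some trouble with the following problem. I have two files that I need to use to create the output. The first file contains a list of functions and their associated genes. An example of the file (the real one obviously has the full data):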

File 1:

Function    Genes
Emotions    HAPPY,SAD,GOOFY,SILLY
Walking    LEG,MUSCLE,TENDON,BLOOD
Singing    VOCAL,NECK,BLOOD,HAPPY

I'm reading it into a dictionary like this:

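from collections import defaultdict

FunctionsWithGenes = defaultdict(list)

def read_functions_file(File):
    Header = File.readline()  # skip the "Function    Genes" header line
    Lines = File.readlines()
    for Line in Lines:
        if not Line.strip():
            continue  # ignore blank lines
        # split the row into its two columns; Line[0] and Line[1] would only be the first two characters
        Function, Genes = Line.split()
        FunctionsWithGenes[Function] = Genes.split(",")  # the genes for each function are in the same row, separated by commas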
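Alternatively, pandas can parse File 1 directly and build the same function-to-genes mapping. A minimal sketch, assuming the columns are separated by runs of spaces or tabs (hence `sep=r"\s+"`) and using an in-memory copy of the sample rows in place of the real file:

```python
import io

import pandas as pd

# In-memory stand-in for File 1 (the real file would be passed by path or handle)
file1 = io.StringIO(
    "Function    Genes\n"
    "Emotions    HAPPY,SAD,GOOFY,SILLY\n"
    "Walking    LEG,MUSCLE,TENDON,BLOOD\n"
    "Singing    VOCAL,NECK,BLOOD,HAPPY\n"
)

# sep=r"\s+" treats any run of whitespace as one separator (assumed layout)
functions = pd.read_csv(file1, sep=r"\s+")

# Split each comma-separated gene string into a list, keyed by function name
FunctionsWithGenes = functions.set_index("Function")["Genes"].str.split(",").to_dict()
print(FunctionsWithGenes)
```

The result is a plain `dict` rather than a `defaultdict`, which is enough here because every function in the file already has its gene list.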

The second table, a .txt file, contains all the information I need, including a column of genes, for example:

chr    start    end    Gene    Value   MoreData
chr1    123    123    HAPPY    41.1    3.4
chr1    342    355    SAD    34.2    9.0
chr1    462    470    LEG    20.0    2.7

I read it in with:

import pandas as pd 

df = pd.read_table(File)
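`pd.read_table` splits on tabs by default, so the call above works if the file really is tab-separated. If the columns are instead separated by spaces (as they appear in the sample), a whitespace separator is needed. A minimal sketch, using an in-memory copy of the sample rows in place of the real file:

```python
import io

import pandas as pd

# In-memory copy of the sample rows; the real data would come from the .txt file
file2 = io.StringIO(
    "chr    start    end    Gene    Value   MoreData\n"
    "chr1    123    123    HAPPY    41.1    3.4\n"
    "chr1    342    355    SAD    34.2    9.0\n"
    "chr1    462    470    LEG    20.0    2.7\n"
)

# read_table defaults to sep="\t"; split on any run of whitespace instead
df = pd.read_table(file2, sep=r"\s+")
print(df)
```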

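The DataFrame has several columns, one of which is "Gene". That column can contain a variable number of entries. I want to split the DataFrame by the "Function" keys in the FunctionsWithGenes dictionary. So far I have: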

df = df[df["Gene"].isin(FunctionsWithGenes.keys())] # to remove all rows with no matching entries

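Now I need to somehow split the DataFrame by gene function. I thought about adding a new column with each gene's function, but I'm not sure that would work, because some genes can have more than one function.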
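The new-column idea can work if each (gene, function) pair gets its own row, so a gene with several functions simply appears several times. A minimal sketch, assuming both files have already been loaded as shown in the sample data; the names `pairs`, `annotated` and `by_function` are illustrative, not from the question:

```python
import pandas as pd

# Sample data from the question, typed in directly for a self-contained example
FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({
    "chr": ["chr1", "chr1", "chr1"],
    "start": [123, 342, 462],
    "end": [123, 355, 470],
    "Gene": ["HAPPY", "SAD", "LEG"],
    "Value": [41.1, 34.2, 20.0],
    "MoreData": [3.4, 9.0, 2.7],
})

# Long format: one row per (Function, Gene) pair, so HAPPY appears under two functions
pairs = pd.DataFrame(
    [(function, gene) for function, genes in FunctionsWithGenes.items() for gene in genes],
    columns=["Function", "Gene"],
)

# Inner merge adds a Function column and drops genes that have no function
annotated = df.merge(pairs, on="Gene", how="inner")

# Split into one DataFrame per function
by_function = {function: group for function, group in annotated.groupby("Function")}
print(by_function["Emotions"])
```

Because the merge duplicates a row once per function, the gene-with-several-functions case needs no special handling.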

Answer 1

I'm a bit confused by your last line of code:

 df = df[df["Gene"].isin(FunctionsWithGenes.keys())]

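because the keys of FunctionsWithGenes are the functions themselves (Emotions, etc.), while the Gene column holds gene names. The resulting DataFrame is therefore always empty.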
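To see the problem concretely, and to filter on genes instead, one can compare `isin` against the dictionary keys with `isin` against the union of all gene lists. A minimal sketch using the question's sample genes; `all_genes` is an illustrative name:

```python
import pandas as pd

FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [41.1, 34.2, 20.0]})

# Filtering on the keys compares gene names with function names, so nothing matches
by_keys = df[df["Gene"].isin(FunctionsWithGenes.keys())]
print(len(by_keys))  # 0

# Filtering on the union of all gene lists keeps rows whose gene has at least one function
all_genes = {gene for genes in FunctionsWithGenes.values() for gene in genes}
by_genes = df[df["Gene"].isin(all_genes)]
print(len(by_genes))  # 3
```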

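If I understand correctly, you want to split the table so that all the genes belonging to one function end up in one table. If so, you can use a simple dictionary comprehension. I set up some variables similar to yours: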

>>> for function, genes in FunctionsWithGenes.items():
...     print(function, genes)
... 
Emotions ['HAPPY', 'SAD', 'GOOFY', 'SILLY']
Walking ['LEG', 'MUSCLE', 'TENDON', 'BLOOD']
Singing ['VOCAL', 'NECK', 'BLOOD', 'HAPPY']
>>> df
    Gene  Value
0  HAPPY   3.40
1    SAD   4.30
2    LEG   5.55

Then I split the DataFrame like this:

>>> FunctionsWithDf = {function: df[df['Gene'].isin(genes)]
...     for function, genes in FunctionsWithGenes.items()}

FunctionsWithDf is now a dictionary that maps each function to a DataFrame containing only the rows whose Gene value is in FunctionsWithGenes[function].

For example:

>>> FunctionsWithDf['Emotions']
    Gene  Value
0  HAPPY    3.4
1    SAD    4.3
>>> FunctionsWithDf['Singing']
    Gene  Value
0  HAPPY    3.4
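Since the goal in the question is to create output, one way to finish is to write each per-function DataFrame to its own tab-separated file. A minimal sketch; the temporary output directory and the `<function>.txt` naming scheme are assumptions for illustration only:

```python
import os
import tempfile

import pandas as pd

FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [3.4, 4.3, 5.55]})

# One DataFrame per function, as in the dictionary comprehension above
FunctionsWithDf = {function: df[df["Gene"].isin(genes)]
                   for function, genes in FunctionsWithGenes.items()}

# Write each non-empty table to its own tab-separated file (hypothetical naming scheme)
out_dir = tempfile.mkdtemp()
written = []
for function, table in FunctionsWithDf.items():
    if table.empty:
        continue  # skip functions with no matching genes
    path = os.path.join(out_dir, f"{function}.txt")
    table.to_csv(path, sep="\t", index=False)
    written.append(path)
print(written)
```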

Pandas: splitting and editing a file based on a dictionary
