Question

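I have several files with the same format, and I need to filter each of these data frames on three columns using specific thresholds. Finally, I need to save the filtered results as separate files.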

A sample data frame looks like this:

    ID  Mean    log2FoldChange  SE  stat    pvalue  padj
0   ENSG2   0.737466    -0.434579   0.484389    -0.897170   0.369628    0.607709
1   ENSG32  321.467787  -0.405760   0.170955    -2.373484   0.017621    0.097636
2   ENSG85  0.000000    NaN NaN NaN NaN NaN

I defined the following function to filter the data frame, extract the subset, and save it:

def DEfilter(df):
    Up_regulted    = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames         = [Up_regulted,Down_regulated]
    DE             = pd.concat(Up_regulted,Down_regulated)
    return df

When I try to apply it to one of the data frames,

Patient_pairs.apply(DEfilter,axis=1)

I get the following error:

 AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')
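To see where this error comes from, here is a minimal sketch using two rows from the sample data above (only three of the columns, for brevity). It shows that apply with axis=1 hands the function one row at a time as a Series, and a Series has no query method:

```python
import pandas as pd

# Two rows taken from the sample data in the question (subset of columns).
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32"],
    "log2FoldChange": [-0.434579, -0.405760],
    "pvalue": [0.369628, 0.017621],
})

# With axis=1, apply calls the function once per row, passing that row as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.tolist())  # ['Series', 'Series']

# A single row (Series) has no .query method; only the DataFrame does.
print(hasattr(df.iloc[0], "query"))  # False
print(hasattr(df, "query"))          # True
```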

So far, this is how I have tried to save the filtered results as new files:

path       = '/home/pathtofile'
files      = os.listdir(path)

results        = [os.path.join(path,i) for i in files if i.startswith('DE')]

for filename in results:
    name       = os.path.basename(os.path.normpath(filename))
    df         = pd.read_csv(filename, sep=sep, header=0)
    Up         = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down       = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    DE         = pd.concat(Up,Down)
    DE.to_csv('Filtered_set_' + name, sep='\t',index=False)
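Separately from the apply error, pd.concat(Up,Down) as written in the question would fail too: pd.concat takes a single list (or other iterable) of frames as its first argument, so passing two frames as separate positional arguments raises a TypeError. A minimal sketch with two hypothetical one-row frames (IDs "A" and "B" are made up for illustration):

```python
import pandas as pd

# Hypothetical one-row frames standing in for the up- and down-regulated subsets.
up = pd.DataFrame({"ID": ["A"], "log2FoldChange": [1.0]})
down = pd.DataFrame({"ID": ["B"], "log2FoldChange": [-1.0]})

try:
    pd.concat(up, down)  # wrong: frames passed as separate positional arguments
    concat_failed = False
except TypeError:
    concat_failed = True

# Right: pass one list containing both frames.
combined = pd.concat([up, down], ignore_index=True)
print(concat_failed, combined["ID"].tolist())  # True ['A', 'B']
```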

Any help or suggestions would be great.

Answer 1

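You are trying to run a DataFrame-level operation inside a Series-level call. With axis=1, DataFrame.apply calls your function once per row and passes that row in as a Series, and a Series has no query method, which is exactly what the AttributeError says. Don't pass the function to DataFrame.apply (it applies a function along the rows or columns of a data frame). Just call the function directly and pass the whole data frame as the argument. Also note that pd.concat needs a list of frames, and the function should return the filtered result rather than the original df: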

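import os

import pandas as pd

path = '/home/pathtofile'
sep = '\t'  # assumption: the input files are tab-separated; adjust to match your files

# Full paths of all files whose names start with 'DE'
files = os.listdir(path)
results = [os.path.join(path, i) for i in files if i.startswith('DE')]


def DEfilter(df):
    # df is the whole DataFrame here, not a single row
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    # pd.concat expects a list of frames
    DE = pd.concat([Up_regulted, Down_regulated])
    return DE


for filename in results:
    df = pd.read_csv(filename, sep=sep, header=0)
    DE = DEfilter(df)  # call the function directly on the whole DataFrame

    name = os.path.basename(os.path.normpath(filename))
    DE.to_csv('Filtered_set_' + name, sep='\t', index=False)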
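As an alternative sketch (not part of the original answer), the two query calls can be collapsed into one boolean mask, because abs(log2FoldChange) >= 0.58 covers both the up- and down-regulated cases. The thresholds come from the question; the function name de_filter_mask and the row ENSG_hypothetical are made up for illustration:

```python
import numpy as np
import pandas as pd

LFC_CUTOFF = 0.58  # |log2FoldChange| threshold from the question
ALPHA = 0.05       # pvalue / padj threshold from the question


def de_filter_mask(df: pd.DataFrame) -> pd.DataFrame:
    """Keep up- or down-regulated rows using a single boolean mask.

    abs(log2FoldChange) >= cutoff covers both >= 0.58 and <= -0.58.
    Comparisons with NaN evaluate to False, so rows with missing statistics are dropped.
    """
    mask = (
        (df["log2FoldChange"].abs() >= LFC_CUTOFF)
        & (df["pvalue"] <= ALPHA)
        & (df["padj"] <= ALPHA)
    )
    return df[mask]


# The three sample rows from the question plus one hypothetical row that passes.
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32", "ENSG85", "ENSG_hypothetical"],
    "log2FoldChange": [-0.434579, -0.405760, np.nan, 1.2],
    "pvalue": [0.369628, 0.017621, np.nan, 0.001],
    "padj": [0.607709, 0.097636, np.nan, 0.01],
})

print(de_filter_mask(df)["ID"].tolist())  # ['ENSG_hypothetical']
```

One difference from the concat version: the mask keeps rows in their original file order instead of listing all up-regulated genes before the down-regulated ones.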

AttributeError when filtering DataFrame values based on certain columns
