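I have several files in the same format, and I need to filter each of them on three columns of the data frame using specific thresholds. Finally, I need to save each filtered result as a separate file.
The sample data frame looks like this: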
ID Mean log2FoldChange SE stat pvalue padj
0 ENSG2 0.737466 -0.434579 0.484389 -0.897170 0.369628 0.607709
1 ENSG32 321.467787 -0.405760 0.170955 -2.373484 0.017621 0.097636
2 ENSG85 0.000000 NaN NaN NaN NaN NaN
I defined the following function to filter a data frame, extract the subsets, and return the result:
def DEfilter(df):
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames = [Up_regulted,Down_regulated]
    DE = pd.concat(Up_regulted,Down_regulated)
    return df
When I try to apply it to one of the data frames,
Patient_pairs.apply(DEfilter,axis=1)
it gives me the following error:
AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')
So far, this is how I have tried to save the filtered results as new files:
path = '/home/pathtofile'
files = os.listdir(path)
results = [os.path.join(path,i) for i in files if i.startswith('DE')]
for filename in results:
    name = os.path.basename(os.path.normpath(filename))
    df = pd.read_csv(filename, sep=sep, header=0)
    Up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    DE = pd.concat(Up,Down)
    DE.to_csv('Filtered_set_' + name, sep='\t',index=False)
Any help or suggestions would be great.
Answer 0 (score: 2)
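You are calling a DataFrame-level method on a Series. With axis=1, DataFrame.apply passes each row to the function as a Series, and a Series has no query method, hence the AttributeError. Don't pass the function to DataFrame.apply (it applies a function along the rows or columns of the data frame). Just call the function directly and pass the whole data frame as the argument. Note also that pd.concat expects a list of frames, and that the function should return DE rather than df: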
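import os
import pandas as pd

path = '/home/pathtofile'
sep = '\t'  # assumption: the input files are tab-separated; change to match your files
files = os.listdir(path)
# Full paths of the files whose names start with 'DE'
results = [os.path.join(path, i) for i in files if i.startswith('DE')]

def DEfilter(df):
    # Keep rows with pvalue and padj <= 0.05 and |log2FoldChange| >= 0.58
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    # pd.concat takes a list of frames, not separate positional arguments
    DE = pd.concat([Up_regulted, Down_regulated])
    return DE

for filename in results:
    df = pd.read_csv(filename, sep=sep, header=0)
    DE = DEfilter(df)
    name = os.path.basename(os.path.normpath(filename))
    # The output file is written to the current working directory
    DE.to_csv('Filtered_set_' + name, sep='\t', index=False)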
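To see the difference without touching any files, here is a small self-contained sketch. It uses the three rows from the question (only the columns the filter needs) plus two hypothetical rows, ENSG_UP and ENSG_DOWN, which are made up so that the filter has something to keep. It first checks what DataFrame.apply with axis=1 actually hands to the function, then calls DEfilter on the whole frame.

```python
import pandas as pd

# Three rows from the question plus two hypothetical rows (ENSG_UP, ENSG_DOWN)
# that are invented here only so that the filter keeps something.
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32", "ENSG85", "ENSG_UP", "ENSG_DOWN"],
    "log2FoldChange": [-0.434579, -0.405760, float("nan"), 1.2, -0.9],
    "pvalue": [0.369628, 0.017621, float("nan"), 0.001, 0.002],
    "padj": [0.607709, 0.097636, float("nan"), 0.01, 0.02],
})

def DEfilter(df):
    # Same thresholds as in the answer above
    up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    return pd.concat([up, down])

# With axis=1, apply passes each row to the function as a Series,
# which is why calling .query inside it fails.
seen_types = set(df.apply(lambda row: type(row).__name__, axis=1))
print(seen_types)

# Calling the function on the whole frame works; rows with NaN values
# fail every comparison and are dropped.
de = DEfilter(df)
kept_ids = de["ID"].tolist()
print(kept_ids)
```

None of the three rows from the question pass the thresholds, so only the two hypothetical rows remain in the result.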