蟒蛇;将数据帧输出写入不同的子目录

时间:2017-05-03 19:51:51

标签: python pandas dataframe

我正从当前的工作目录运行我的脚本。使用我的脚本,我遍历当前工作目录的子目录。每个子目录包含脚本中提到的3个文件,对于每个子目录,我将3个文件合并为一个数据帧。正如我的脚本现在,它只将一个子目录的合并数据帧写入当前工作目录。我想要的是具有保存在该子目录中的每个子目录的合并数据帧的csv文件,或者具有连接到一个大输出文件的每个子目录的数据帧的文件。 使用我的脚本,我只在输出文件中输出一个子目录。

我的脚本如下:

print('Start merging contig files')

for root, dirs, files in os.walk(os.getcwd()):
    filepath = os.path.join(root, 'genes.faa.genespercontig.csv')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f1:
            df1 = pd.read_csv(f1, header=None, delim_whitespace=True, names = ["contig", "genes"])
            df1['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'hmmer.analyze.txt.results.txt')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f2:
            df2 = pd.read_csv(f2, header=None, delim_whitespace=True, names = ["contig", "SCM"])
            df2['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'genes.fna.output_blastplasmiddb.out.count_plasmiddbhit.out')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f3:
            df3 = pd.read_csv(f3, header=None, delim_whitespace=True, names = ["contig", "plasmid_genes"])
            df3['genome'] = os.path.basename(os.path.dirname(filepath))

#merge dataframes
dfmerge1 = pd.merge(df1, df2, on=['genome', 'contig'], how='outer')
df_end = pd.merge(dfmerge1, df3, on=['genome', 'contig'], how='outer')      

df_end.to_csv('outputgenesdf.csv')

2 个答案:

答案 0 :(得分:1)

只需添加to_csv()

的路径即可
df_end.to_csv('your/path/here/outputgenesdf.csv')

答案 1 :(得分:1)

试试这个:

df_end.to_csv(os.path.join(root, 'outputgenesdf.csv'))

PS确保此命令位于for loop

print('Start merging contig files')

for root, dirs, files in os.walk(os.getcwd()):
    filepath = os.path.join(root, 'genes.faa.genespercontig.csv')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f1:
            df1 = pd.read_csv(f1, header=None, delim_whitespace=True, names = ["contig", "genes"])
            df1['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'hmmer.analyze.txt.results.txt')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f2:
            df2 = pd.read_csv(f2, header=None, delim_whitespace=True, names = ["contig", "SCM"])
            df2['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'genes.fna.output_blastplasmiddb.out.count_plasmiddbhit.out')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f3:
            df3 = pd.read_csv(f3, header=None, delim_whitespace=True, names = ["contig", "plasmid_genes"])
            df3['genome'] = os.path.basename(os.path.dirname(filepath))

    #merge dataframes
    dfmerge1 = pd.merge(df1, df2, on=['genome', 'contig'], how='outer')
    df_end = pd.merge(dfmerge1, df3, on=['genome', 'contig'], how='outer')      

    df_end.to_csv(os.path.join(root, 'outputgenesdf.csv'))