Python Pandas: group files in a directory by similar file names and concatenate the DataFrames in a specific order

Asked: 2019-02-15 19:08:20

Tags: python pandas file csv sorting

My goal is to group the .csv files in a directory by features that their file names share. My directory contains files named like this:

  • After_Source1_Receiver1.csv
  • After_Source1_Receiver2.csv
  • Before_Source1_Receiver1.csv
  • Before_Source1_Receiver2.csv
  • During1_Source1_Receiver1.csv
  • During1_Source1_Receiver2.csv
  • During2_Source1_Receiver1.csv
  • During2_Source1_Receiver2.csv

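I want to group these files by the numbers that follow the "Source" and "Receiver" parts of the file name (as shown below), so that I can concatenate them later.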

Group 1

  • Before_Source1_Receiver1.csv
  • During1_Source1_Receiver1.csv
  • During2_Source1_Receiver1.csv
  • After_Source1_Receiver1.csv

Group 2

  • Before_Source1_Receiver2.csv
  • During1_Source1_Receiver2.csv
  • During2_Source1_Receiver2.csv
  • After_Source1_Receiver2.csv

Any ideas?

1 answer:

Answer 0 (score: 1)

The question says you want to do this in pandas, so here is a pandas solution.

import pandas as pd

fnames = ['After_Source1_Receiver1.csv',
          'After_Source1_Receiver2.csv',
          'Before_Source1_Receiver1.csv',
          'Before_Source1_Receiver2.csv',
          'During1_Source1_Receiver1.csv',
          'During1_Source1_Receiver2.csv',
          'During2_Source1_Receiver1.csv',
          'During2_Source1_Receiver2.csv']

# One row per file name, in a single column called 'names'
df = pd.DataFrame(fnames, columns=['names'])
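In practice the names would probably come from the directory rather than a hard-coded list. A minimal sketch, assuming the files sit in one folder; the helper name `list_csv_names` and its `directory` parameter are hypothetical and not part of the original answer:

```python
from pathlib import Path

import pandas as pd


def list_csv_names(directory):
    """Return a one-column DataFrame holding the .csv file names in `directory`."""
    # sorted() gives a deterministic row order; glob itself makes no ordering promise.
    names = sorted(p.name for p in Path(directory).glob("*.csv"))
    return pd.DataFrame(names, columns=["names"])
```

The result has the same shape as the `df` built from the literal list, so the grouping step can use it unchanged.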

I don't know what you want to do with the final result, but this is how to group them.

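# Capture the number after "Source" and the number after "Receiver"
pattern = r'Source(\d+)_Receiver(\d+)'
# str.extract returns the two captures as columns 0 and 1; group on both
for _, g in pd.concat([df, df['names'].str.extract(pattern)], axis=1).groupby([0,1]):
    print(g.names)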

0      After_Source1_Receiver1.csv
2     Before_Source1_Receiver1.csv
4    During1_Source1_Receiver1.csv
6    During2_Source1_Receiver1.csv
Name: names, dtype: object
1      After_Source1_Receiver2.csv
3     Before_Source1_Receiver2.csv
5    During1_Source1_Receiver2.csv
7    During2_Source1_Receiver2.csv
Name: names, dtype: object
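The groups above come out in alphabetical order (After, Before, During1, During2), while the question lists them as Before, During1, During2, After. One possible way to get that order and then concatenate is sketched below. It assumes every file name has the form `<phase>_Source<n>_Receiver<m>.csv` and that the four phases named in the question are the only ones; `PHASE_ORDER`, `group_in_order` and `concat_group` are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumed phase order, taken from the groups listed in the question.
PHASE_ORDER = ["Before", "During1", "During2", "After"]
# Named groups become the column names returned by str.extract.
PATTERN = r"^(?P<phase>[^_]+)_Source(?P<source>\d+)_Receiver(?P<receiver>\d+)\.csv$"


def group_in_order(fnames):
    """Map (source, receiver) to its file names, sorted by PHASE_ORDER."""
    names = pd.Series(list(fnames), name="names")
    # Names that do not match PATTERN produce NaN rows and are dropped.
    df = pd.concat([names, names.str.extract(PATTERN)], axis=1).dropna()
    # An ordered categorical makes sort_values follow PHASE_ORDER, not the alphabet.
    phase = pd.Categorical(df["phase"], categories=PHASE_ORDER, ordered=True)
    df = df.assign(phase=phase).sort_values("phase")
    # groupby keeps the row order inside each group; keys are strings such as ('1', '2').
    return {key: g["names"].tolist() for key, g in df.groupby(["source", "receiver"])}


def concat_group(names, directory="."):
    """Read the files in the given order and stack them into one DataFrame."""
    frames = [pd.read_csv(Path(directory) / name) for name in names]
    return pd.concat(frames, ignore_index=True)
```

With the eight names from the question, `group_in_order(fnames)` returns two lists, one per Source/Receiver pair, each running Before, During1, During2, After; passing one of those lists to `concat_group` stacks the files in that order.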