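My goal is to group the .csv files in a directory by features their file names share. My directory contains files named like this:
I want to group these files by the numbers that follow the "Source" and "Receiver" parts of the file name (as shown below), so that I can concatenate them later.
Group 1
Group 2
Any ideas?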
Answer 0 (score: 1)
You said you want to do this in pandas, so here is a pandas solution.
import pandas as pd

# Example file names; in practice these would come from the directory listing
fnames = ['After_Source1_Receiver1.csv',
          'After_Source1_Receiver2.csv',
          'Before_Source1_Receiver1.csv',
          'Before_Source1_Receiver2.csv',
          'During1_Source1_Receiver1.csv',
          'During1_Source1_Receiver2.csv',
          'During2_Source1_Receiver1.csv',
          'During2_Source1_Receiver2.csv']
df = pd.DataFrame(fnames, columns=['names'])
I don't know what you want to do with the final result, but this is how you can group them.
pattern = r'Source(\d+)_Receiver(\d+)'
# str.extract returns one column per capture group, labelled 0 and 1
extracted = df['names'].str.extract(pattern)
for _, g in pd.concat([df, extracted], axis=1).groupby([0, 1]):
    print(g.names)
0 After_Source1_Receiver1.csv
2 Before_Source1_Receiver1.csv
4 During1_Source1_Receiver1.csv
6 During2_Source1_Receiver1.csv
Name: names, dtype: object
1 After_Source1_Receiver2.csv
3 Before_Source1_Receiver2.csv
5 During1_Source1_Receiver2.csv
7 During2_Source1_Receiver2.csv
Name: names, dtype: object
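Since you mention concatenating the files afterwards, here is one way that next step might look. It is only a sketch: the function name `concat_by_source_receiver` and the extra `source_file` column are made up for illustration, and it assumes the files in each group have compatible columns. It uses the same regular expression as above, applied to the files actually found in a directory.

```python
import re
from pathlib import Path

import pandas as pd

# Same pattern as above: capture the Source and Receiver numbers.
PATTERN = re.compile(r'Source(\d+)_Receiver(\d+)')


def concat_by_source_receiver(directory):
    """Return {(source, receiver): DataFrame} for the .csv files in `directory`.

    Hypothetical helper, not part of pandas.
    """
    groups = {}
    for path in sorted(Path(directory).glob('*.csv')):
        match = PATTERN.search(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        # match.groups() is a tuple of strings, e.g. ('1', '2')
        groups.setdefault(match.groups(), []).append(path)

    combined = {}
    for key, paths in groups.items():
        # Assumption: tag each row with its file of origin (hypothetical
        # column name) so the pieces stay distinguishable after concatenation.
        frames = [pd.read_csv(p).assign(source_file=p.name) for p in paths]
        combined[key] = pd.concat(frames, ignore_index=True)
    return combined
```

Calling `concat_by_source_receiver('path/to/your/directory')` gives you a dict keyed by the (source, receiver) number pair, with the numbers kept as strings; convert them with `int` if you need numeric keys.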