I have two DataFrames, df and df1, and both contain file paths like this:
import pandas as pd
df = pd.DataFrame({"X1": ['f','f','o','o','b','b'],
"X2": ['fb/FOO1/bar0.wav','fb/FOO1/bar1.wav','fb/FOO2/bar2.wav','fb/FOO2/bar3.wav','fb/FOO3/bar4.wav','fb/FOO3/bar5.wav']})
X1 X2
0 f fb/FOO1/bar0.wav
1 f fb/FOO1/bar1.wav
2 o fb/FOO2/bar2.wav
3 o fb/FOO2/bar3.wav
4 b fb/FOO3/bar4.wav
5 b fb/FOO3/bar5.wav
And the second DataFrame:
df1 = pd.DataFrame({"X1": ['b','o','b','f','o','f'],
"X2": ['fb1/FOO3/bar5.opus','fb1/FOO2/bar2.opus','fb1/FOO3/bar4.opus','fb1/FOO1/bar1.opus','fb1/FOO2/bar3.opus','fb1/FOO1/bar0.opus']})
X1 X2
0 b fb1/FOO3/bar5.opus
1 o fb1/FOO2/bar2.opus
2 b fb1/FOO3/bar4.opus
3 f fb1/FOO1/bar1.opus
4 o fb1/FOO2/bar3.opus
5 f fb1/FOO1/bar0.opus
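Now I want to sort the second DataFrame, df1, on its X2 column (the file paths) so that its rows follow the order of the file paths in the first DataFrame, df. The output should look like this: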
X1 X2
0 f fb1/FOO1/bar0.opus
1 f fb1/FOO1/bar1.opus
2 o fb1/FOO2/bar2.opus
3 o fb1/FOO2/bar3.opus
4 b fb1/FOO3/bar4.opus
5 b fb1/FOO3/bar5.opus
Answer 0 (score: 1)
You can build a sorter dictionary, which lets you sort the values by a custom key:
# build a lookup from the file name without directory or extension to its position in df
# (the name could also be extracted with a regex)
sorter_dict = dict(zip(df.X2.apply(lambda x : x.split('/')[-1].split('.')[0]),df.index))
# {'bar0': 0, 'bar1': 1, 'bar2': 2, 'bar3': 3, 'bar4': 4, 'bar5': 5}
# on df1, create a temporary column holding the same name part of the file path
df1['temp'] = df1.X2.apply(lambda x : x.split('/')[-1].split('.')[0])
# map each name to its position in df
df1['sorter'] = df1.temp.map(sorter_dict)
# sort df1 by that position
df1 = df1.sort_values('sorter')
# delete the helper columns, which are no longer needed
del df1['temp'], df1['sorter']
Output:
| X1 | X2 |
|:-----|:-------------------|
| f | fb1/FOO1/bar0.opus |
| f | fb1/FOO1/bar1.opus |
| o | fb1/FOO2/bar2.opus |
| o | fb1/FOO2/bar3.opus |
| b | fb1/FOO3/bar4.opus |
| b | fb1/FOO3/bar5.opus |
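As a possible variation on the approach above, here is a minimal sketch that extracts the file name with a regex and sorts without temporary columns. It assumes pandas 1.1 or later (for the `key` argument of `sort_values`) and that each file name is unique within df; the helper name `stem` and the regex are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"X1": ['f', 'f', 'o', 'o', 'b', 'b'],
                   "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
                          'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav']})
df1 = pd.DataFrame({"X1": ['b', 'o', 'b', 'f', 'o', 'f'],
                    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
                           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus']})


def stem(paths: pd.Series) -> pd.Series:
    """Hypothetical helper: file name without directory or extension, e.g. 'bar0'."""
    return paths.str.extract(r'([^/]+)\.[^./]+$', expand=False)


# Map each file name in df to its row position (assumes the names are unique).
order = dict(zip(stem(df['X2']), range(len(df))))

# Sort df1 by the position of each file name in df, then renumber the index.
result = (
    df1.sort_values('X2', key=lambda col: stem(col).map(order))
       .reset_index(drop=True)
)
print(result)
```

File names in df1 that do not occur in df map to NaN, and `sort_values` places those rows last by default.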
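Answer 1 (score: 1)
This may work if the file paths within each DataFrame all have a prefix and an extension of the same length, since the slices below cut at fixed character positions. Create a new column holding the part of the path to match on, reorder df1 by that column, then drop the new column: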
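# strip the 'fb/' prefix and the '.wav' extension, leaving e.g. 'FOO1/bar0'
df['X3'] = df['X2'].astype(str).str[3:-4]
# strip the 'fb1/' prefix and the '.opus' extension to get the same key
df1['X3'] = df1['X2'].astype(str).str[4:-5]
# reorder df1 so that its keys follow the order of the keys in df
df1 = df1.set_index('X3')
df1 = df1.reindex(index=df['X3'])
df1 = df1.reset_index()
# drop the helper column from both DataFrames
df1 = df1.drop('X3', axis = 1)
df = df.drop('X3', axis = 1)
df1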
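If the prefix or extension lengths vary, the fixed slices stop lining up. One way around that, sketched here under the assumption that every path has exactly one leading directory to ignore and that the resulting keys are unique within df1, is to build the key with `pathlib` instead of character offsets. The helper name `rel_key` is hypothetical.

```python
from pathlib import PurePosixPath

import pandas as pd

df = pd.DataFrame({"X1": ['f', 'f', 'o', 'o', 'b', 'b'],
                   "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
                          'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav']})
df1 = pd.DataFrame({"X1": ['b', 'o', 'b', 'f', 'o', 'f'],
                    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
                           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus']})


def rel_key(path: str) -> str:
    """Hypothetical helper: drop the first directory and the extension,
    e.g. 'fb/FOO1/bar0.wav' becomes 'FOO1/bar0'."""
    parts = PurePosixPath(path).parts[1:]
    return str(PurePosixPath(*parts).with_suffix(''))


# Index df1 by the key, reorder it to follow df's keys, then drop the key again.
result = (
    df1.assign(key=df1['X2'].map(rel_key))
       .set_index('key')
       .reindex(df['X2'].map(rel_key))
       .reset_index(drop=True)
)
print(result)
```

As with the slicing version, `reindex` needs the keys in df1 to be unique, and any key from df that is missing in df1 yields a row of NaN values.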