Sorting a CSV in pandas by a (string) column

Date: 2019-01-31 14:30:14

Tags: python pandas sorting

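I am sorting a CSV with respect to one column, but the strings in that column have become more complex and I am not sure how to sort them.

I would like to stay with pandas, since I already write the sorted values back to the CSV with it.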

CSV
Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release

I used:
dd = dd.sort_values(['Snapshot'],ascending=True)  # sort_values returns a new DataFrame; assign it
dd.to_csv(unit_file,header=True,index=False)      # unit_file: path to the CSV
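A self-contained sketch of the same read, sort, write round trip, assuming the sample CSV above. To keep it runnable, the data is read from an in-memory string via io.StringIO and written to a string buffer instead of the real unit_file; CSV_TEXT and out are names made up for this illustration.

```python
import io

import pandas as pd

# Sample data from the question (hypothetical stand-in for the file at unit_file).
CSV_TEXT = """Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# sort_values does not sort in place by default, so keep the returned frame.
df = df.sort_values(["Snapshot"], ascending=True)

# Write back with a header and without the index column, as in the question.
out = io.StringIO()
df.to_csv(out, header=True, index=False)
print(out.getvalue())
```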

dataframe:
C:\Users\320047585\Sathish\Python>python sample.py
Before Sort
                              Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested
2  21.001.1183_2019-01-04_14-37-47_1280868      Release

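That returns the values sorted by the part before the first _, but now, if two IDs are the same, I need to compare the dates, and if the dates are also the same, I need to sort by time. Any insight would be a big help.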
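One way to spell out this ordering (ID first, then date, then time) is to split Snapshot into its parts and sort on several keys. A minimal sketch, assuming every value has the form <id>_<YYYY-MM-DD>_<HH-MM-SS>_<build>; the helper column names snap_id and snap_ts are hypothetical. Date and time are combined and parsed into a timestamp, so they are compared as points in time rather than as text.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Snapshot": [
            "21.001.1154_2019-01-04_14-37-47_1280868",
            "21.001.1183_2019-01-04_16-37-47_1280868",
            "21.001.1183_2019-01-04_14-37-47_1280868",
        ],
        "Status": ["Released", "Unit Tested", "Release"],
    }
)

# Split "<id>_<date>_<time>_<build>" into columns 0..3.
parts = df["Snapshot"].str.split("_", expand=True)

# Hypothetical sort keys: the ID as text, date and time as one timestamp.
keys = pd.DataFrame(
    {
        "snap_id": parts[0],
        "snap_ts": pd.to_datetime(parts[1] + " " + parts[2], format="%Y-%m-%d %H-%M-%S"),
    }
)

# Sort by ID first; rows with the same ID are ordered by date and time.
order = keys.sort_values(["snap_id", "snap_ts"]).index
df_sorted = df.loc[order]
print(df_sorted)
```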

Expected output
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested

Thanks in advance.

2 answers:

Answer 0 (score: 1)

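Use s.str.split() to extract the value to sort on (here the time, i.e. the third _-separated field), then pass the index of the sorted values to df.reindex():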

df_new=df.reindex(df.Snapshot.str.split("_").str[2].sort_values().index)
print(df_new)

                                  Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
2  21.001.1183_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested
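The reindex trick sorts a derived Series and then reorders the frame by that Series' index. Assuming pandas 1.1 or later, DataFrame.sort_values also accepts a key callable that is applied to the sort column, which expresses the same idea in one call. This sketch, like the line above, sorts on the time field only; kind="mergesort" is chosen because it is stable, so rows with equal times keep their original order.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Snapshot": [
            "21.001.1154_2019-01-04_14-37-47_1280868",
            "21.001.1183_2019-01-04_16-37-47_1280868",
            "21.001.1183_2019-01-04_14-37-47_1280868",
        ],
        "Status": ["Released", "Unit Tested", "Release"],
    }
)

# key receives the "Snapshot" column as a Series and returns a same-length
# Series of time strings ("14-37-47", ...) that are used for the comparison.
df_new = df.sort_values(
    "Snapshot",
    key=lambda s: s.str.split("_").str[2],
    kind="mergesort",  # stable: ties keep their original relative order
)
print(df_new)
```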

If you need to take the date and the time into account as well (sorting on ID, then date, then time), use:

data_new = data.join(data.Snapshot.str.split("_",expand=True)).sort_values(by=[0,1,2])
print(data_new)

                                 Snapshot       Status           1         2        3
0  21.001.1154_2019-01-04_14-37-47_1280868     Released  2019-01-04  14-37-47  1280868
2  21.001.1183_2019-01-04_14-37-47_1280868     Released  2019-01-04  14-37-47  1280868
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested  2019-01-04  16-37-47  1280868

You can, of course, drop the columns you do not need afterwards.
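A minimal sketch of that cleanup, assuming df holds the question's data: it joins the split columns, sorts on columns 0, 1 and 2 (ID, date, time), then drops all four split columns so only Snapshot and Status would be written back to the CSV.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Snapshot": [
            "21.001.1154_2019-01-04_14-37-47_1280868",
            "21.001.1183_2019-01-04_16-37-47_1280868",
            "21.001.1183_2019-01-04_14-37-47_1280868",
        ],
        "Status": ["Released", "Unit Tested", "Release"],
    }
)

# Split into columns 0..3 (ID, date, time, build) and attach them to the frame.
parts = df["Snapshot"].str.split("_", expand=True)
tmp = df.join(parts)

# Sort by ID, then date, then time, and remove the helper columns again.
df_new = tmp.sort_values(by=[0, 1, 2]).drop(columns=parts.columns)
print(df_new)
```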

Answer 1 (score: 1)

Since the whole string has to be sorted, I made a small change to anky's answer:

Before
df_new = df.join(df.Snapshot.str.split("_",expand=True).drop(columns=0)).sort_values(by=[1,2])

After
data_new = data.join(data.Snapshot.str.split("_",expand=True)).sort_values(by=[0,1,2])

This takes the whole string (including the ID) into account.

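More interestingly, a plain

data.sort_values(['Snapshot'],ascending=True)

also sorts these values correctly. It does not ignore the underscores and dots; it compares the strings character by character, which gives the right order here because every field has a fixed width and the fields run from most to least significant (ID, date, time).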
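A quick way to check this on the sample data is to compare the plain string sort with an explicit sort on the split fields. This sketch only shows that the two orders agree for these values. With IDs of different lengths (for example a hypothetical 21.001.999), character-by-character comparison would put 999 after 1154 and no longer match numeric order.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Snapshot": [
            "21.001.1154_2019-01-04_14-37-47_1280868",
            "21.001.1183_2019-01-04_16-37-47_1280868",
            "21.001.1183_2019-01-04_14-37-47_1280868",
        ],
        "Status": ["Released", "Unit Tested", "Release"],
    }
)

# Plain lexicographic sort on the whole string.
plain = df.sort_values("Snapshot")

# Explicit sort on ID, date and time as separate fields.
parts = df["Snapshot"].str.split("_", expand=True)
split_order = parts.sort_values(by=[0, 1, 2]).index

# For this data both approaches yield the same row order.
print(list(plain.index), list(split_order))
```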