I've just discovered the power of pandas. (Thanks, Wes McKinney!) I have a CSV containing the following information:
RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11
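Normally I wouldn't use pandas for this. I use the csv library to build lists and the datetime library to convert the values. Then I iterate over each row and run something like the following to get the sorted indices for that row: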
'"' + ','.join(map(str, sorted(range(len(dates)), key=lambda k: dates[k]))) + '"'
That returns something like this for each row:
Out[40]: '"1,0,2,3"'
I then append it to the end of each row as a new field in my CSV.
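For reference, the workflow described above might look roughly like the sketch below. It is only an illustration of the csv + datetime approach: the two sample rows are taken from the data above, while the `sort_order` helper and the `SORT_ORDER` field name are hypothetical names introduced here.

```python
import csv
from datetime import datetime
from io import StringIO

# Two rows of the sample data; a real script would open the CSV file instead.
raw = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
"""


def sort_order(row):
    """Return the column positions of `row` ordered by date, as a quoted string."""
    dates = [datetime.strptime(cell, "%Y-%m-%d") for cell in row]
    order = sorted(range(len(dates)), key=lambda k: dates[k])
    return '"' + ",".join(map(str, order)) + '"'


reader = csv.reader(StringIO(raw))
header = next(reader) + ["SORT_ORDER"]  # hypothetical name for the new field
rows = [row + [sort_order(row)] for row in reader]

for row in rows:
    print(row[-1])
```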
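I can read the CSV into pandas and convert the items to a date dtype. I'm just not sure how to use pandas to get the sorted index values, flatten them into a string, and put them in a column. Any help is most appreciated!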
Answer 0 (score: 7)
You can use numpy.argsort() to get the sorted indices:
from io import StringIO  # Python 3 location of StringIO
import numpy as np
import pandas as pd
txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11"""
df = pd.read_csv(StringIO(txt))
# ISO 8601 date strings sort chronologically, so they can be argsorted as-is.
# kind="stable" keeps tied dates in column order, like Python's sorted().
idx = pd.DataFrame(np.argsort(df.to_numpy(), axis=1, kind="stable"))
buf = StringIO()
idx.to_csv(buf, index=False, header=False)
print(buf.getvalue())
Output:
1,0,2,3
3,2,1,0
2,1,0,3
2,3,1,0
0,1,3,2
3,0,1,2
2,0,3,1
3,0,2,1
1,0,2,3
3,1,0,2
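The question also asks for the indices to be flattened into a string and stored in a new column. One possible way to do that, building on the argsort idea, is sketched below. The column name `SORT_ORDER` is a hypothetical choice, only three of the sample rows are used, and parsing the dates with `pd.to_datetime` is an optional step added here rather than part of the answer above.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Three rows of the sample data; the third contains a tie (columns 0 and 3).
txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29"""

df = pd.read_csv(StringIO(txt))

# Convert every column to datetime64 so the comparison is on real dates.
dates = df.apply(pd.to_datetime)

# Row-wise argsort; kind="stable" keeps tied dates in column order.
order = np.argsort(dates.to_numpy(), axis=1, kind="stable")

# Flatten each row of indices into one comma-separated string.
# "SORT_ORDER" is a hypothetical column name.
df["SORT_ORDER"] = [",".join(map(str, row)) for row in order]

print(df.to_csv(index=False))
```

Because each `SORT_ORDER` value contains commas, `to_csv` wraps it in double quotes under its default quoting, so there is no need to add the quote characters by hand.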