I want to return a DataFrame with each row sorted (say, in descending order). So if I have a pandas.DataFrame named data:
In [38]: data
Out[38]:
c1 c2 c3 c4 c5 c6
Date
2012-10-22 0.973371 0.226342 0.968282 0.872330 0.273880 0.746156
2012-10-19 0.497048 0.351332 0.310025 0.726669 0.344202 0.878755
2012-10-18 0.315764 0.178584 0.838223 0.749962 0.850462 0.400253
2012-10-17 0.162879 0.068409 0.704094 0.712860 0.537545 0.009789
I would like to get back the following:
In [39]: sorted_frame
Out[39]:
0 1 2 3 4 5
Date
2012-10-22 0.973371 0.968282 0.872330 0.746156 0.273880 0.226342
2012-10-19 0.878755 0.726669 0.497048 0.351332 0.344202 0.310025
2012-10-18 0.850462 0.838223 0.749962 0.400253 0.315764 0.178584
2012-10-17 0.712860 0.704094 0.537545 0.162879 0.068409 0.009789
I have tried DataFrame.sort(axis = 1), but it does not give the desired result:
In [40]: data.sort(axis = 1)
Out[43]:
c1 c2 c3 c4 c5 c6
Date
2012-10-22 0.973371 0.226342 0.968282 0.872330 0.273880 0.746156
2012-10-19 0.497048 0.351332 0.310025 0.726669 0.344202 0.878755
2012-10-18 0.315764 0.178584 0.838223 0.749962 0.850462 0.400253
2012-10-17 0.162879 0.068409 0.704094 0.712860 0.537545 0.009789
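A likely explanation, offered as an assumption and not something stated in the post: when no column is named, sorting along axis=1 orders the column labels, not the values inside each row. Because c1 through c6 are already in label order, the frame comes back unchanged. The sketch below shows the same behaviour with sort_index(axis=1), the label-sorting call in current pandas; the small frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose column labels are deliberately out of order
data = pd.DataFrame(
    [[0.3, 0.1, 0.2], [0.5, 0.9, 0.7]],
    columns=["c2", "c1", "c3"],
)

# Sorting along axis=1 reorders the columns by label;
# the values within each row are not sorted
by_label = data.sort_index(axis=1)
print(by_label)
```

The columns come back as c1, c2, c3, but each row's values are still in no particular order.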
I created the following function to do what I am looking for (using pandas.TimeSeries.order()):
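import numpy
import pandas

def sorted_by_row(frame, ascending = False):
    # Pre-allocate a NaN-filled array with the same shape as the frame
    vals = numpy.tile(numpy.nan, frame.shape)
    for row in numpy.arange(frame.shape[0]):
        # Sort one row's values and store them positionally
        vals[row, :] = frame.ix[row, :].order(ascending = ascending)
    return pandas.DataFrame(vals, index = frame.index)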
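For readers on a recent pandas release, here is a minimal sketch of the same row-by-row loop. It assumes that .ix and Series.order() are not available there and swaps in .iloc and Series.sort_values(); the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def sorted_by_row(frame, ascending=False):
    # Same idea as the function above, with .iloc / sort_values()
    # substituted for .ix / order() (an assumption about newer pandas)
    vals = np.empty(frame.shape, dtype=float)
    for row in range(frame.shape[0]):
        vals[row, :] = frame.iloc[row, :].sort_values(ascending=ascending).to_numpy()
    return pd.DataFrame(vals, index=frame.index)

# Hypothetical sample data
data = pd.DataFrame(
    [[0.3, 0.1, 0.2], [0.5, 0.9, 0.7]],
    index=pd.to_datetime(["2012-10-22", "2012-10-19"]),
    columns=["c1", "c2", "c3"],
)
sorted_frame = sorted_by_row(data)
print(sorted_frame)
```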
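However, my goal is to be able to use row-wise functionality inside the DataFrame.apply() method (so that I can apply the same approach to other functions I build). I have tried: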
#TimeSeries.order() sorts a pandas.TimeSeries object
data.apply(lambda x: x.order(), axis = 1)
But again, I do not get the desired DataFrame shown above (I have already printed enough DataFrames, so I will spare the page real estate).
Thanks very much for your help,
-B
Answer 0 (score: 2)
Well, this is not easy to do with pandas out of the box. First, get familiar with argsort:
In [8]: df
Out[8]:
0 1 2 3 4
2012-10-17 1.542735 1.081290 2.602967 0.748706 0.682501
2012-10-18 0.058414 0.148083 0.094104 0.716789 2.482998
2012-10-19 2.396277 0.524733 2.169018 1.365622 0.590767
2012-10-20 0.513535 1.542485 0.186261 2.138740 1.173894
2012-10-21 0.495713 1.401872 0.919931 0.055136 1.358439
2012-10-22 1.010086 0.350249 1.116935 0.323305 0.506086
In [12]: inds = df.values.argsort(1)
In [13]: inds
Out[13]:
array([[4, 3, 1, 0, 2],
[0, 2, 1, 3, 4],
[1, 4, 3, 2, 0],
[2, 0, 4, 1, 3],
[3, 0, 2, 4, 1],
[3, 1, 4, 0, 2]])
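These are the indirect sort indices for each row: inds[i] lists the column positions of row i in ascending order of value. Now you want to do something like: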
import numpy as np
from pandas import DataFrame

new_values = np.empty_like(df.values)
for i, row in enumerate(df.values):
    # sort in descending order
    new_values[i] = row[inds[i]][::-1]
sorted_df = DataFrame(new_values, index=df.index)
Not all that satisfying, but it gets the job done:
In [15]: sorted_df
Out[15]:
0 1 2 3 4
2012-10-17 2.602967 1.542735 1.081290 0.748706 0.682501
2012-10-18 2.482998 0.716789 0.148083 0.094104 0.058414
2012-10-19 2.396277 2.169018 1.365622 0.590767 0.524733
2012-10-20 2.138740 1.542485 1.173894 0.513535 0.186261
2012-10-21 1.401872 1.358439 0.919931 0.495713 0.055136
2012-10-22 1.116935 1.010086 0.506086 0.350249 0.323305
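The per-row loop can also be written without an explicit Python loop. This is a sketch and not part of the original answer: it assumes a NumPy version that provides np.take_along_axis and a pandas version that provides DataFrame.to_numpy(), and the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    [[1.5, 1.0, 2.6], [0.05, 0.14, 0.09]],
    index=pd.to_datetime(["2012-10-17", "2012-10-18"]),
)

values = df.to_numpy()
inds = values.argsort(axis=1)  # ascending indirect sort indices per row

# Reverse each row of indices to get descending order,
# then gather the values row by row in a single call
desc = np.take_along_axis(values, inds[:, ::-1], axis=1)
sorted_df = pd.DataFrame(desc, index=df.index)
print(sorted_df)
```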
More generally, you can do this:
In [23]: df.apply(lambda x: np.sort(x.values)[::-1], axis=1)
Out[23]:
0 1 2 3 4
2012-10-17 2.602967 1.542735 1.081290 0.748706 0.682501
2012-10-18 2.482998 0.716789 0.148083 0.094104 0.058414
2012-10-19 2.396277 2.169018 1.365622 0.590767 0.524733
2012-10-20 2.138740 1.542485 1.173894 0.513535 0.186261
2012-10-21 1.401872 1.358439 0.919931 0.495713 0.055136
2012-10-22 1.116935 1.010086 0.506086 0.350249 0.323305
But you are responsible for assigning the new columns yourself.
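On newer pandas releases, apply may hand back a Series of arrays for a function like this instead of a frame. One way to get a DataFrame, sketched here under the assumption that the installed pandas supports the result_type argument of DataFrame.apply, is to request result_type="expand". The sample frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    [[1.5, 1.0, 2.6], [0.05, 0.14, 0.09]],
    index=pd.to_datetime(["2012-10-17", "2012-10-18"]),
    columns=["a", "b", "c"],
)

# Sort each row in descending order; "expand" asks pandas to turn
# each returned array into the columns of the result
sorted_df = df.apply(
    lambda row: np.sort(row.to_numpy())[::-1],
    axis=1,
    result_type="expand",
)
print(sorted_df)
```

The original column labels no longer describe the sorted values, so assign whatever labels make sense with sorted_df.columns = [...] afterwards.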
Answer 1 (score: 1)
Sorting is a big topic, and I am sure there are many ways to do this. Here is one.
First, create a sample DataFrame.
In [31]: rndrange = pd.DatetimeIndex(start='10/17/2012', end='10/22/2012', freq='D')
In [32]: df = pd.DataFrame(np.random.randn(len(rndrange),5),index=rndrange)
In [33]: df = df.applymap(abs) #Easier to see sorting if all vals are positive
In [34]: df
Out[34]:
0 1 2 3 4
2012-10-17 1.542735 1.081290 2.602967 0.748706 0.682501
2012-10-18 0.058414 0.148083 0.094104 0.716789 2.482998
2012-10-19 2.396277 0.524733 2.169018 1.365622 0.590767
2012-10-20 0.513535 1.542485 0.186261 2.138740 1.173894
2012-10-21 0.495713 1.401872 0.919931 0.055136 1.358439
2012-10-22 1.010086 0.350249 1.116935 0.323305 0.506086
Sort each row in place (ascending):
In [35]: df.as_matrix().sort(1)
In [36]: df
Out[36]:
0 1 2 3 4
2012-10-17 0.682501 0.748706 1.081290 1.542735 2.602967
2012-10-18 0.058414 0.094104 0.148083 0.716789 2.482998
2012-10-19 0.524733 0.590767 1.365622 2.169018 2.396277
2012-10-20 0.186261 0.513535 1.173894 1.542485 2.138740
2012-10-21 0.055136 0.495713 0.919931 1.358439 1.401872
2012-10-22 0.323305 0.350249 0.506086 1.010086 1.116935
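The in-place trick above depends on the returned array sharing memory with the frame, which is not something I would count on across pandas versions, and as_matrix() and the DatetimeIndex(start=..., end=...) constructor may not exist in recent releases. A sketch that avoids both assumptions sorts an explicit copy and builds a new frame; pd.date_range, np.random.default_rng and the seed are my substitutions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild a comparable sample frame (random values, fixed seed for repeatability)
rndrange = pd.date_range(start="2012-10-17", end="2012-10-22", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(np.abs(rng.standard_normal((len(rndrange), 5))), index=rndrange)

values = df.to_numpy(copy=True)  # explicit copy, so df itself is left untouched
values.sort(axis=1)              # in-place ascending sort of each row of the copy
sorted_df = pd.DataFrame(values, index=df.index, columns=df.columns)
print(sorted_df)
```

For descending order, reverse the columns of the sorted array with values[:, ::-1] before building the frame.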