df[column] = apply(lambda row: row.sort_values()[1]) behaves strangely

Asked: 2016-11-08 15:05:25

Tags: python pandas

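The question is: why do the values of top_2_should differ from those of top_2_is? In other words, why does the result come out wrong when the output of apply is assigned to a column?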

Edit: I think the question was somewhat misunderstood, so I created another example for it. Edit 2: I am using Python 2.7.12 :: Anaconda 4.0.0 (64-bit) :: Pandas 0.18.0.

import pandas as pd

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df52 = pd.DataFrame(d)

top_1_should = df52.apply(lambda row: row.sort_values()[0], 1)
top_2_should = df52.apply(lambda row: row.sort_values()[1], 1)
df52['top_1_is'] = df52.apply(lambda row: row.sort_values()[0], 1)
df52['top_1_should'] = top_1_should
df52['top_2_is'] = df52.apply(lambda row: row.sort_values()[1], 1)
df52['top_2_should'] = top_2_should
print df52

   one  two  top_1_is  top_1_should  top_2_is  top_2_should
0  1.0  4.0       1.0           1.0       1.0           4.0
1  2.0  3.0       2.0           2.0       2.0           3.0
2  3.0  2.0       2.0           2.0       2.0           3.0
3  4.0  1.0       1.0           1.0       1.0           4.0

Best, 扬

2 answers:

Answer 0 (score: 1)

I think you can use Series.sort_values together with values to break the index alignment of each row:

print (df52.apply(lambda row: row.sort_values().values, axis=1))
   one  two
0  1.0  4.0
1  2.0  3.0
2  2.0  3.0
3  1.0  4.0

Or:

import numpy as np

# np.sort sorts along the last axis, i.e. within each row
print(pd.DataFrame(np.sort(df52.values), df52.index, df52.columns))
   one  two
0  1.0  4.0
1  2.0  3.0
2  2.0  3.0
3  1.0  4.0
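
To tie this back to the question, here is a minimal sketch of how the row-wise sort could feed the new columns. It assumes the goal is the smallest and second-smallest value of the two original columns. The column names top_1 and top_2 are made up for illustration, and to_numpy() needs a newer pandas than 0.18 (df52.values works as well).

```python
import numpy as np
import pandas as pd

df52 = pd.DataFrame({'one': [1., 2., 3., 4.],
                     'two': [4., 3., 2., 1.]})

# Sort each row once, using only the original columns,
# before any new column is added to the frame.
sorted_vals = np.sort(df52[['one', 'two']].to_numpy(), axis=1)

# top_1 / top_2 are hypothetical column names.
df52['top_1'] = sorted_vals[:, 0]  # smallest value per row
df52['top_2'] = sorted_vals[:, 1]  # second-smallest value per row
print(df52)
```

Because the sorted array is computed up front, the order in which the new columns are assigned no longer matters.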

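If you print each sorted row, you can see what happens: apply works on the DataFrame as it is at that moment, so columns added earlier become part of every row. After adding new columns, the position of the value you want in the sorted Series changes: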

top_1_should = df52.apply(lambda row: row.sort_values()[0], 1)
top_2_should = df52.apply(lambda row: row.sort_values()[1], 1)
df52['top_1_is'] = df52.apply(lambda row: row.sort_values()[0], 1)
df52['top_1_should'] = top_1_should
df52['top_2_is'] = df52.apply(lambda row: row.sort_values()[1], 1)
# Print each sorted row; do not assign the result, since print returns None
df52.apply(lambda row: print(row.sort_values()), 1)
one             1.0
top_1_is        1.0
top_1_should    1.0
top_2_is        1.0
two             4.0
Name: 0, dtype: float64
one             2.0
top_1_is        2.0
top_1_should    2.0
top_2_is        2.0
two             3.0
Name: 1, dtype: float64
two             2.0
top_1_is        2.0
top_1_should    2.0
top_2_is        2.0
one             3.0
Name: 2, dtype: float64
two             1.0
top_1_is        1.0
top_1_should    1.0
top_2_is        1.0
one             4.0
Name: 3, dtype: float64
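
The printed rows above suggest the cause: by the time top_2_is is computed, each row already contains the previously added columns. The sketch below reproduces that with the question's data. It uses .iloc for the positional lookup, which is my substitution for the original [1], because newer pandas versions deprecate integer keys as positions on a label index.

```python
import pandas as pd

df52 = pd.DataFrame({'one': [1., 2., 3., 4.],
                     'two': [4., 3., 2., 1.]})

# Each row passed to apply has 2 values at this point.
len_before = df52.apply(len, axis=1).tolist()

df52['top_1_is'] = df52.apply(lambda row: row.sort_values().iloc[0], axis=1)
df52['top_1_should'] = df52['top_1_is']

# Now each row has 4 values: the two originals plus two copies of the minimum.
len_after = df52.apply(len, axis=1).tolist()

# Position 1 of the sorted row is therefore another copy of the minimum,
# not the second-smallest of 'one' and 'two'.
top_2_is = df52.apply(lambda row: row.sort_values().iloc[1], axis=1)
print(len_before, len_after, top_2_is.tolist())
```

This matches the top_2_is column in the question, which repeats the row minimum instead of showing the larger value.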

Answer 1 (score: 0)

import pandas as pd

d = {'one' : [1., 2., 3., 4.],
     'two' : [2., 3., 4., 5.]}
df52 = pd.DataFrame(d)

top_1_should = df52.apply(lambda row: row.sort_values()[0], 1)
top_2_should = df52.apply(lambda row: row.sort_values()[1], 1)
df52['top_1_is'] = df52.apply(lambda row: row.sort_values()[0], 1)
df52['top_1_should'] = top_1_should
df52['top_2_is'] = df52.apply(lambda row: row.sort_values()[3], 1)
df52['top_2_should'] = top_2_should
print(df52)

Returns:

   one  two  top_1_is  top_1_should  top_2_is  top_2_should
0    1    2         1             1         2             2
1    2    3         2             2         3             3
2    3    4         3             3         4             4
3    4    5         4             4         5             5
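
The code above reaches the expected value by changing the index to [3], which depends on how many columns have been added so far. One possible alternative, sketched here, is to restrict apply to the original columns so the position stays fixed. value_cols and nth_smallest are hypothetical names, not part of pandas.

```python
import pandas as pd

df52 = pd.DataFrame({'one': [1., 2., 3., 4.],
                     'two': [2., 3., 4., 5.]})

# Hypothetical: the columns whose values should be ranked.
value_cols = ['one', 'two']

def nth_smallest(frame, n):
    """Return the n-th smallest value (0-based) per row among value_cols."""
    return frame[value_cols].apply(lambda row: row.sort_values().iloc[n], axis=1)

# Columns added along the way are ignored by nth_smallest.
df52['top_1_is'] = nth_smallest(df52, 0)
df52['top_2_is'] = nth_smallest(df52, 1)
print(df52)
```

With this approach the position passed in does not need to change as the DataFrame grows.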