Pandas - sort a DataFrame by row-wise maximum value

Time: 2017-10-05 17:08:29

Tags: python pandas

I have this DataFrame:

df
            artist                     track     pos     neg     neu
0   Sufjan Stevens  Should Have Known Better    0.07    0.93     0.0
8        Radiohead               Daydreaming    0.05    0.95     0.0
1   Sufjan Stevens      To Be Alone With You    0.05    0.95     0.0
5        Radiohead        Desert Island Disk    0.08    0.92     0.0
11   Elliott Smith          Between the Bars    0.03    0.97     0.0
7       Aphex Twin                Avril 14th    1.00    0.00     0.0
2     Jeff Buckley                Hallelujah    0.39    0.61     0.0
4   Sufjan Stevens       Casimir Pulaski Day    0.09    0.91     0.0
9   Sufjan Stevens            The Only Thing    0.09    0.91     0.0
3   Sufjan Stevens        Death with Dignity    0.03    0.97     0.0
6        Radiohead                     Codex    1.00    0.00     0.0
10       Radiohead       You And Whose Army?    0.00    1.00     0.0

I sort it by how close the values are to input_value = 0.8, like this:

    v = df[['pos', 'neg', 'neu']].values
    df.iloc[np.lexsort(np.abs(v - input_value).T)]

which produces:

            artist                     track    pos     neg          neu
4   Sufjan Stevens       Casimir Pulaski Day   0.09    0.91          0.0
9   Sufjan Stevens            The Only Thing   0.09    0.91          0.0
5        Radiohead        Desert Island Disk   0.08    0.92          0.0
0   Sufjan Stevens  Should Have Known Better   0.07    0.93          0.0
1   Sufjan Stevens      To Be Alone With You   0.05    0.95          0.0
8        Radiohead               Daydreaming   0.05    0.95          0.0
3   Sufjan Stevens        Death with Dignity   0.03    0.97          0.0
11   Elliott Smith          Between the Bars   0.03    0.97          0.0
2     Jeff Buckley                Hallelujah   0.39    0.61          0.0
6        Radiohead                     Codex   1.00    0.00          0.0
7       Aphex Twin                Avril 14th   1.00    0.00          0.0
10       Radiohead       You And Whose Army?   0.00    1.00          0.0

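But given input_label = 'neg', I want to add a condition: if input_label == 'neg', then the neg value must be the highest value row-wise.

Rows that do not satisfy the condition should be dropped, ending up with: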

            artist                     track    pos     neg          neu
4   Sufjan Stevens       Casimir Pulaski Day   0.09    0.91          0.0
9   Sufjan Stevens            The Only Thing   0.09    0.91          0.0
5        Radiohead        Desert Island Disk   0.08    0.92          0.0
0   Sufjan Stevens  Should Have Known Better   0.07    0.93          0.0
1   Sufjan Stevens      To Be Alone With You   0.05    0.95          0.0
8        Radiohead               Daydreaming   0.05    0.95          0.0
3   Sufjan Stevens        Death with Dignity   0.03    0.97          0.0
11   Elliott Smith          Between the Bars   0.03    0.97          0.0
2     Jeff Buckley                Hallelujah   0.39    0.61          0.0
10       Radiohead       You And Whose Army?   0.00    1.00          0.0

How can I do this?

1 answer:

Answer 0 (score: 0)

v = df.iloc[:, -3:]
df = df.iloc[np.lexsort(np.abs(v - input_value).T)]
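
To make the sorting step easier to reproduce, here is a minimal self-contained sketch. It rebuilds only four rows of the frame from the question; the subset and the names `dist`, `order` and `sorted_df` are illustrative assumptions, and it converts to a NumPy array with `to_numpy()` before calling `np.lexsort`, which is a small deviation from the two lines above. Note that `np.lexsort` treats the last key as the primary one, so after the transpose the order is decided by `neu` first, then `neg`, then `pos`. Because `neu` is constant in this data, closeness of `neg` to `input_value` effectively decides the order.

```python
import numpy as np
import pandas as pd

# A four-row subset of the question's data (illustrative, not the full frame).
df = pd.DataFrame(
    {
        "artist": ["Sufjan Stevens", "Radiohead", "Aphex Twin", "Jeff Buckley"],
        "track": ["Should Have Known Better", "Desert Island Disk",
                  "Avril 14th", "Hallelujah"],
        "pos": [0.07, 0.08, 1.00, 0.39],
        "neg": [0.93, 0.92, 0.00, 0.61],
        "neu": [0.0, 0.0, 0.0, 0.0],
    },
    index=[0, 5, 7, 2],
)
input_value = 0.8

# Absolute distance of every score from the target value.
v = df[["pos", "neg", "neu"]].to_numpy()
dist = np.abs(v - input_value)

# lexsort takes one key per row of its argument; the LAST key is primary.
# dist.T has rows (pos, neg, neu), so neu is primary, then neg, then pos.
order = np.lexsort(dist.T)
sorted_df = df.iloc[order]

print(sorted_df)
```

With this subset the rows come out in index order 5, 0, 2, 7, which agrees with the relative order of those rows in the question's sorted output.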

You can use df.query here, which simplifies things a bit.

result = df.query('neg > pos and neg > neu'); result
            artist                     track   pos   neg  neu
4   Sufjan Stevens       Casimir Pulaski Day  0.09  0.91  0.0
9   Sufjan Stevens            The Only Thing  0.09  0.91  0.0
5        Radiohead        Desert Island Disk  0.08  0.92  0.0
0   Sufjan Stevens  Should Have Known Better  0.07  0.93  0.0
8        Radiohead               Daydreaming  0.05  0.95  0.0
1   Sufjan Stevens      To Be Alone With You  0.05  0.95  0.0
11   Elliott Smith          Between the Bars  0.03  0.97  0.0
3   Sufjan Stevens        Death with Dignity  0.03  0.97  0.0
2     Jeff Buckley                Hallelujah  0.39  0.61  0.0
10       Radiohead       You And Whose Army?  0.00  1.00  0.0
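
The query string above hard-codes `neg`. If `input_label` is a variable, as in the question, the same strict comparison can be built from the column names. The following is a sketch under that assumption; the helper name `keep_rows_where_label_is_max` and the `score_cols` parameter are hypothetical, and the frame is a small subset of the question's data.

```python
import pandas as pd

# Small illustrative subset of the question's data.
df = pd.DataFrame(
    {
        "track": ["Casimir Pulaski Day", "Hallelujah", "Codex",
                  "You And Whose Army?"],
        "pos": [0.09, 0.39, 1.00, 0.00],
        "neg": [0.91, 0.61, 0.00, 1.00],
        "neu": [0.0, 0.0, 0.0, 0.0],
    },
    index=[4, 2, 6, 10],
)


def keep_rows_where_label_is_max(frame, input_label,
                                 score_cols=("pos", "neg", "neu")):
    """Keep rows where `input_label` is strictly greater than every other score.

    Hypothetical helper: it mirrors 'neg > pos and neg > neu' for any label.
    """
    others = [c for c in score_cols if c != input_label]
    # Compare each remaining column with the chosen one, row by row.
    mask = frame[others].lt(frame[input_label], axis=0).all(axis=1)
    return frame[mask]


result = keep_rows_where_label_is_max(df, "neg")
print(result)
```

Row order is preserved, so applying this to a frame that is already sorted by closeness keeps that sort. The comparison is strict, so a row where two scores tie for the maximum is dropped, just as with the query string.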

An alternative solution using np.argmax:

mask = np.argmax(df.iloc[:, -3:].values, 1) == 1

mask
array([ True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True, False], dtype=bool)

result = df[mask]; result    
            artist                     track   pos   neg  neu
11   Elliott Smith          Between the Bars  0.03  0.97  0.0
2     Jeff Buckley                Hallelujah  0.39  0.61  0.0
0   Sufjan Stevens  Should Have Known Better  0.07  0.93  0.0
4   Sufjan Stevens       Casimir Pulaski Day  0.09  0.91  0.0
9   Sufjan Stevens            The Only Thing  0.09  0.91  0.0
5        Radiohead        Desert Island Disk  0.08  0.92  0.0
1   Sufjan Stevens      To Be Alone With You  0.05  0.95  0.0
3   Sufjan Stevens        Death with Dignity  0.03  0.97  0.0
10       Radiohead       You And Whose Army?  0.00  1.00  0.0
8        Radiohead               Daydreaming  0.05  0.95  0.0
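
In the mask above, the `== 1` is the position of `neg` among the last three columns. A sketch that looks this position up from `input_label` instead of hard-coding it might look like the following; `score_cols` and `label_pos` are assumed names, and the frame is again a small subset of the question's data.

```python
import numpy as np
import pandas as pd

score_cols = ["pos", "neg", "neu"]

# Small illustrative subset of the question's data.
df = pd.DataFrame(
    {
        "track": ["Between the Bars", "Avril 14th", "Hallelujah"],
        "pos": [0.03, 1.00, 0.39],
        "neg": [0.97, 0.00, 0.61],
        "neu": [0.0, 0.0, 0.0],
    },
    index=[11, 7, 2],
)
input_label = "neg"

# Position of the requested label among the score columns (1 for 'neg').
label_pos = score_cols.index(input_label)

# argmax along axis 1 gives, per row, the position of the largest score.
mask = np.argmax(df[score_cols].to_numpy(), axis=1) == label_pos
result = df[mask]

print(result)
```

One difference from the `query` version: `np.argmax` returns the first position of the maximum, so when two scores tie, the leftmost column wins rather than the row being dropped.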

You can sort the result by its index using sort_index:

result.sort_index()
            artist                     track   pos   neg  neu
0   Sufjan Stevens  Should Have Known Better  0.07  0.93  0.0
1   Sufjan Stevens      To Be Alone With You  0.05  0.95  0.0
2     Jeff Buckley                Hallelujah  0.39  0.61  0.0
3   Sufjan Stevens        Death with Dignity  0.03  0.97  0.0
4   Sufjan Stevens       Casimir Pulaski Day  0.09  0.91  0.0
5        Radiohead        Desert Island Disk  0.08  0.92  0.0
8        Radiohead               Daydreaming  0.05  0.95  0.0
9   Sufjan Stevens            The Only Thing  0.09  0.91  0.0
10       Radiohead       You And Whose Army?  0.00  1.00  0.0
11   Elliott Smith          Between the Bars  0.03  0.97  0.0