Find and store the dominant rows of a pandas df in another df

Asked: 2020-11-07 10:10:53

Tags: python-3.x pandas loops iteration

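I have a DataFrame with df.shape (15, 4). I want to compare two rows at a time, extract the dominant row into another df, and then compare the less dominant row with the next row. The loop repeats until the least dominant row ends up last in the new df. This is what I am trying: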

for i in range(len(df)):
    a = df.iloc[i]
    b = df.iloc[i+1]
    diff = a - b
    if (diff >= 0):
        l.append(a)  # List/df
        df = df.drop(df.iloc[i])  # Extract 1st row into another df if it's dominant
    # Now compare 3rd row with not dominant row (either 1st or 3rd) and repeat the loop
print(l)

My approach seems time-consuming. Is there a built-in pandas function that would let me do this easily?

df.head

      C      W       L    D
A1  82.0  78.00  1100.0  3.0
A2  19.0  99.00  9520.0  3.0
A3  25.0  42.00  1700.0  7.0
A4  93.0  37.00  1700.0  7.0
A5   9.2   0.44   510.0  7.0

(15 rows, 4 columns)

Output, in a new df:

Count of number of times each A[i] was dominant over the following A[i+1] row for each independent column (C,W,L and D) respectively.

For example:

First for Criteria C:
new_df should be = df.head()
A1  #Since A1.C is greater than A2.C
A3  #Next comparison between A2.C and A3.C, A3.C wins, hence stored here
A4  #A2 and A4, A4 wins
A2  #A2 (still remaining) and A5, here A2 wins and is stored
A5  #Now A5 is stored if it's greater than A6, and so on until A15

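Then the same for the rest of the 4 columns. Was I able to explain what I am trying to do, or did I make it more complicated? I look forward to your comments and can add more information if needed. Thanks a lot.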

1 answer:

Answer 0: (score: 1)

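Here is an example that uses a for loop on column C. In general, though, you shouldn't really use for loops when working with pandas. I just don't know how to solve this part without one.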

import pandas as pd
from io import StringIO

text = """
      C      W       L    D
A1  82.0  78.00  1100.0  3.0
A2  19.0  99.00  9520.0  3.0
A3  25.0  42.00  1700.0  7.0
A4  93.0  37.00  1700.0  7.0
A5   9.2   0.44   510.0  7.0
"""

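# Parse the sample table; the header has one field fewer than the data rows,
# so the first column (A1..A5) becomes the index
df = pd.read_csv(StringIO(text), sep=r'\s+')

# Work on column C only, with the row labels moved into an 'index' column
df_subset = df['C'].copy().reset_index()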

# loop over 2 lines each, see who is the winner, and sort on that
for i in range(len(df_subset)-1):
    df_subset.iloc[i:i+2, :] = df_subset.iloc[i:i+2,:].sort_values(
        ascending=False, 
        by='C',
    ).values
    
df_subset.set_index('index')

Resulting series:

index C 
A1    82.0
A3    25.0
A4    93.0
A2    19.0
A5     9.2
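
The single pass above can also be written so that it runs over every column, which is what the question asks for. The following is only a rough sketch under stated assumptions, not a built-in pandas feature: the helper names `dominance_order` and `dominance_order_vectorized` are made up for illustration, ties are assumed to go to the carried (earlier) row as in the question's `diff >= 0` test, and the data is assumed to contain no NaN values. The second helper relies on the observation that the row being carried forward is always the running minimum seen so far, so NumPy's `accumulate` can replace the explicit loop.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (the real frame has 15 rows)
df = pd.DataFrame(
    {
        "C": [82.0, 19.0, 25.0, 93.0, 9.2],
        "W": [78.0, 99.0, 42.0, 37.0, 0.44],
        "L": [1100.0, 9520.0, 1700.0, 1700.0, 510.0],
        "D": [3.0, 3.0, 7.0, 7.0, 7.0],
    },
    index=["A1", "A2", "A3", "A4", "A5"],
)


def dominance_order(col):
    """Hypothetical helper: one pass with an explicit loop, returns row labels."""
    labels = list(col.index)
    values = col.tolist()
    if not labels:
        return []
    carry = 0  # position of the row currently being carried forward
    order = []
    for i in range(1, len(values)):
        if values[carry] >= values[i]:  # assumption: ties go to the carried row
            order.append(labels[carry])
            carry = i  # the loser is compared with the next row
        else:
            order.append(labels[i])
    order.append(labels[carry])  # the least dominant row comes last
    return order


def dominance_order_vectorized(col):
    """Hypothetical helper: same result without a Python loop (assumes no NaN)."""
    x = col.to_numpy()
    n = len(x)
    if n == 0:
        return col.index[:0]
    pos = np.arange(n)
    run_min = np.minimum.accumulate(x)
    # Row i becomes the carried row when it does not exceed the minimum before it
    takes_over = np.ones(n, dtype=bool)
    takes_over[1:] = x[1:] <= run_min[:-1]
    carry_pos = np.maximum.accumulate(np.where(takes_over, pos, 0))
    # At step i the previously carried row wins exactly when row i takes over
    winners = np.where(takes_over[1:], carry_pos[:-1], pos[1:])
    return col.index[np.append(winners, carry_pos[-1])]


# One column of row labels per criterion
new_df = pd.DataFrame({name: dominance_order(df[name]) for name in df.columns})
print(new_df)
```

For column C of the five sample rows, both helpers return A1, A3, A4, A2, A5, the same order as the result above. If ties should instead go to the later row, the `>=` and `<=` comparisons would need to become strict.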