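I have a DataFrame:
df.shape
(15, 4)
I want to compare rows two at a time, extract the dominant row into another DataFrame, and then compare the less dominant row against the next row.
The loop repeats until the least dominant row ends up last in the new DataFrame.
This is what I am trying: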
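l = []  # rows judged dominant, in the order they win
a = df.iloc[0]  # row carried into the next comparison
for i in range(1, len(df)):
    b = df.iloc[i]
    diff = a['C'] - b['C']  # one criterion at a time; a row-wise diff is ambiguous in `if`
    if diff >= 0:
        l.append(a)  # a is dominant, so it is extracted and b is carried forward
        a = b
    else:
        l.append(b)  # b is dominant; keep comparing the less dominant row a
l.append(a)  # the least dominant row ends up last
print(l)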
My approach seems time-consuming. Is there a built-in pandas function that would make this easier?
df.head()
       C      W       L    D
A1  82.0  78.00  1100.0  3.0
A2  19.0  99.00  9520.0  3.0
A3  25.0  42.00  1700.0  7.0
A4  93.0  37.00  1700.0  7.0
A5   9.2   0.44   510.0  7.0
(15 rows, 4 columns)
Desired output, in a new DataFrame:
A count of the number of times each A[i] was dominant over the following A[i+1] row, for each column (C, W, L and D) independently.
For example:
First, for criterion C:
new_df (for the rows shown in df.head()) should be:
A1 # A1.C is greater than A2.C
A3 # next comparison is A2.C vs A3.C; A3 wins, so it is stored here
A4 # A2 vs A4: A4 wins
A2 # A2 (still remaining) vs A5: A2 wins and is stored
A5 # A5 is stored here if it is greater than A6, and so on up to A15
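Then the same for the rest of the 4 columns. Was I able to explain what I am trying to do, or did I make it more confusing? I look forward to your comments and can add more information if needed. Thank you very much.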
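Answer 0 (score: 1)
Here is an example that uses a for loop on column C. In general you should avoid for loops when working with pandas, though; I just don't know how to solve this part without one.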
import pandas as pd
from io import StringIO
text = """
C W L D
A1 82.0 78.00 1100.0 3.0
A2 19.0 99.00 9520.0 3.0
A3 25.0 42.00 1700.0 7.0
A4 93.0 37.00 1700.0 7.0
A5 9.2 0.44 510.0 7.0
"""
# the header has one field fewer than the rows, so A1..A5 become the index
df = pd.read_csv(StringIO(text), sep=r'\s+')
df_subset = df['C'].copy().reset_index()
# loop over 2 lines each, see who is the winner, and sort on that
for i in range(len(df_subset)-1):
    # .values drops the labels, so the sorted pair is written back by position
    df_subset.iloc[i:i+2, :] = df_subset.iloc[i:i+2, :].sort_values(
        ascending=False,
        by='C',
    ).values
print(df_subset.set_index('index'))
Resulting series:
index C
A1 82.0
A3 25.0
A4 93.0
A2 19.0
A5 9.2
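The same single pass can be applied to every column. The following is only a minimal sketch, not a built-in pandas feature: `dominance_order` is a hypothetical helper name, the frame is rebuilt from the five rows shown in the question, and ties are assumed to go to the row being carried forward (mirroring `diff >= 0` in the question). It works on plain Python lists, which avoids writing mixed-type values back into the DataFrame on each step.

```python
import pandas as pd

# The five rows shown in the question (the full frame has 15 rows).
df = pd.DataFrame(
    {
        "C": [82.0, 19.0, 25.0, 93.0, 9.2],
        "W": [78.00, 99.00, 42.00, 37.00, 0.44],
        "L": [1100.0, 9520.0, 1700.0, 1700.0, 510.0],
        "D": [3.0, 3.0, 7.0, 7.0, 7.0],
    },
    index=["A1", "A2", "A3", "A4", "A5"],
)


def dominance_order(col):
    """Hypothetical helper: one pass of pairwise comparisons over a Series.

    The winner of each comparison is stored; the loser is carried
    forward and compared with the next row.
    """
    labels = list(col.index)
    values = col.tolist()
    order = []
    carried = 0  # position of the row currently carried forward
    for nxt in range(1, len(values)):
        # Assumption: on a tie the carried row counts as dominant.
        if values[carried] >= values[nxt]:
            order.append(labels[carried])
            carried = nxt
        else:
            order.append(labels[nxt])
    order.append(labels[carried])  # least dominant row goes last
    return order


# One column of row labels per criterion.
result = pd.DataFrame({c: dominance_order(df[c]) for c in df.columns})
print(result)
```

For column C this gives the same order as the result above (A1, A3, A4, A2, A5). The loop is still a Python loop, but it only touches lists, so it should stay cheap for a 15-row frame.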