I have a dataframe in pandas which look something like this:
>>> df[1:3]
0 1 2 3 4 5 6 7 8
1 -0.59 -99.0 924.0 20.1 5.0 4.0 57.0 19.0 8.0
2 -1.30 -279.0 297.0 16.1 30.0 4.4 63.0 19.0 10.0
The number of points in the dataframe is ~1000.
Given a set of columns, I want to find out how many time each point is "better" than the other?
Given a set of n
columns, a point is better than other point, if it better in at least one of the columns and equal in other columns.
A point which is better in one column and worse in n-1
is not considered better because its better than the other point in at least one column.
Edit1: Example:
>>> df
0 1 2
1 -0.59 -99.0 924.0
2 -1.30 -279.0 297.0
3 2.00 -100.0 500.0
4 0.0 0.0 0.0
If we consider only column 0, then the result would be:
1 - 1
2 - 0
3 - 3
4 - 2
because point 1 (-0.59) is only better than point 2 with respect to column 1.
Another example by taking columns - 0 and 1:
1 - 1 (only for point 2 all values i.e. column 0 and column 1 are either smaller than point 1 or lesser)
2 - 0 (since no point is has any lesser than this in any dimension)
3 - 1 (point 2)
4 - 2 (point 1 and 2)
Edit 2: Perhaps, something like a function which when given a dataframe, a point (index of point) and a set of columns could give the count as - for each subset of columns how many times that point is better than other points.
def f(p, df, c):
"""returns
A list : L = [(c1,n), (c2,m)..]
where c1 is a proper subset of c and n is the number of times that this point was better than other points."""
答案 0 :(得分:1)
分别对每一列进行排名 通过对每一列进行排名,我可以准确地看到该列中的特定行中有多少其他行大于。
d1 = df.rank().sub(1)
d1
要解决您的问题,逻辑上必须是这样的情况:对于特定行,行元素中的最小等级恰好是该行中每个元素都大于的其他行的数量。
对于前两列[0, 1]
,可以通过取d1
我使用它作为参考来比较原始的前两列与等级
pd.concat([df.iloc[:, :2], d1.iloc[:, :2]], axis=1, keys=['raw', 'ranked'])
如上所述拿分钟。
d1.iloc[:, :2].min(1)
1 1.0
2 0.0
3 1.0
4 2.0
dtype: float64
将结果放在原始数据和排名旁边,以便我们可以看到它
pd.concat([df.iloc[:, :2], d1.iloc[:, :2], d1.iloc[:, :2].min(1)],
axis=1, keys=['raw', 'ranked', 'results'])
果然,这与你的预期结果有关。