Question

I want to compare the elements of two Series.

0    1
1    3
2    4
3    2
4    4
Name: s1, dtype: int32
0    3
1    3
2    0
3    5
4    1
Name: s2, dtype: int64

To make comparing the Series easier, I used itertools.combinations:

from itertools import combinations

x = combinations(s1, 2)
y = combinations(s2, 2)

The resulting x:

(1, 3)
(1, 4)
(1, 2)
(1, 4)
(3, 4)
(3, 2)
(3, 4)
(4, 2)
(4, 4)
(2, 4)

and y:

(3, 3)
(3, 0)
(3, 5)
(3, 1)
(3, 0)
(3, 5)
(3, 1)
(0, 5)
(0, 1)
(5, 1)
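One thing to keep in mind before comparing: `combinations` returns a lazy, single-pass iterator, so if x and y have already been printed by looping over them, they are empty afterwards. Below is a minimal sketch, assuming s1 and s2 are rebuilt from the printed output above. The names `x_pairs`, `y_pairs` and `aligned` are only illustrative. It materializes each side once with `list()` and then uses `zip` so that pair i of x sits next to pair i of y.

```python
from itertools import combinations

import pandas as pd

# Assumed: the same values as the printed s1 and s2 above.
s1 = pd.Series([1, 3, 4, 2, 4], name="s1")
s2 = pd.Series([3, 3, 0, 5, 1], name="s2")

# combinations() is a one-shot iterator: a second pass yields nothing.
x = combinations(s1, 2)
first_pass = list(x)   # 10 pairs
second_pass = list(x)  # already exhausted -> []

# Materialize both sides once, then zip them so that pair i of x is
# matched with pair i of y (both come from the same positions i < j).
x_pairs = list(combinations(s1, 2))
y_pairs = list(combinations(s2, 2))
aligned = list(zip(x_pairs, y_pairs))
```

Each element of `aligned` is a tuple `((x1, x2), (y1, y2))`, which is exactly the shape the comparison rule below needs.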

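The comparison is partly similar to Kendall's tau distance. Take a pair (x1, x2) from x and the corresponding pair (y1, y2) from y. If x1 > x2 and y1 > y2, or x1 < x2 and y1 < y2, then score = score + 1; otherwise score stays the same. So far, though, I still can't find a way to compare the two pairs of elements.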
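The rule above can be written as a plain loop over the zipped pairs. This is a sketch, assuming the data is the s1/s2 shown above. Ties (x1 == x2 or y1 == y2) satisfy neither condition, so they never add to the score.

```python
from itertools import combinations

import pandas as pd

# Assumed: the same values as the printed s1 and s2 above.
s1 = pd.Series([1, 3, 4, 2, 4], name="s1")
s2 = pd.Series([3, 3, 0, 5, 1], name="s2")

score = 0
# Pair i of combinations(s1, 2) and pair i of combinations(s2, 2) come
# from the same positions (i, j), so zip lines them up correctly.
for (x1, x2), (y1, y2) in zip(combinations(s1, 2), combinations(s2, 2)):
    # Concordant pair: both series move in the same direction.
    if (x1 > x2 and y1 > y2) or (x1 < x2 and y1 < y2):
        score += 1

print(score)  # 1 for this data: only (1, 2) vs (3, 5) is concordant
```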

I got m1, m2 and m1 | m2:

m1:

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

m2:

0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

m1 | m2:

0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

I got the same result as you. I don't know why it takes so long.

Both m1 and m2 are all False by default. Sure, the current result is correct in the ideal case, but I want score to increase by 1 each time (m1 | m2) is True.

The ideal score result is as shown above.
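Because True counts as 1 and False as 0, a score that increases by 1 on every True row of m1 | m2 is simply a cumulative sum of that mask, and the final score is its sum. Below is a sketch on the question's data. It assumes the pairs are laid out one row per index pair, with columns x1, x2, y1, y2. The column `running_score` and the variable `total` are illustrative names.

```python
from itertools import combinations

import pandas as pd

# Assumed: the same values as the printed s1 and s2 above.
s1 = pd.Series([1, 3, 4, 2, 4], name="s1")
s2 = pd.Series([3, 3, 0, 5, 1], name="s2")

# One row per index pair (i, j): x1, x2 from s1 and y1, y2 from s2.
df = pd.concat(
    [
        pd.DataFrame(list(combinations(s1, 2)), columns=["x1", "x2"]),
        pd.DataFrame(list(combinations(s2, 2)), columns=["y1", "y2"]),
    ],
    axis=1,
)

m1 = (df.x1 > df.x2) & (df.y1 > df.y2)  # both pairs decreasing
m2 = (df.x1 < df.x2) & (df.y1 < df.y2)  # both pairs increasing
m = m1 | m2

# Running score: goes up by 1 on each True row and holds on False rows.
df["running_score"] = m.cumsum()
# Final score: the number of True rows.
total = int(m.sum())
print(df)
print(total)  # 1 for this data
```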

Answer 1

You can build a DataFrame from the output and then fill in the score based on the conditions:

import numpy as np
import pandas as pd
from itertools import combinations

# changed data for a better sample
s1 = pd.Series([1,3,4,2,4])
s2 = pd.Series([3,4,0,5,8])

x = combinations(s1, 2)
y = combinations(s2, 2)

# rename columns 0, 1 -> 1, 2, then prefix them -> x1, x2 (and y1, y2)
dfx = pd.DataFrame(list(x)).rename(columns=lambda c: c+1).add_prefix('x')
dfy = pd.DataFrame(list(y)).rename(columns=lambda c: c+1).add_prefix('y')
df = pd.concat([dfx, dfy], axis=1)

# m1: both pairs decreasing; m2: both pairs increasing
m1 = (df.x1 > df.x2) & (df.y1 > df.y2)
m2 = (df.x1 < df.x2) & (df.y1 < df.y2)
m = m1 | m2

print(m)
0     True
1    False
2     True
3     True
4    False
5    False
6     True
7    False
8    False
9     True
dtype: bool

# running count of True rows where m is True, 0 elsewhere
df['score'] = np.where(m, m.cumsum(), 0)
print(df)
   x1  x2  y1  y2  score
0   1   3   3   4      1
1   1   4   3   0      0
2   1   2   3   5      2
3   1   4   3   8      3
4   3   4   4   0      0
5   3   2   4   5      0
6   3   4   4   8      4
7   4   2   0   5      0
8   4   4   0   8      0
9   2   4   5   8      5
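Instead of building m1 and m2 separately, the two conditions can also be folded into a single sign test. The product sign(x1 - x2) * sign(y1 - y2) is positive exactly when both pairs increase or both decrease, and it is zero on ties. The following NumPy sketch uses the answer's sample data (`concordant` is an illustrative name) and should give the same mask as m:

```python
from itertools import combinations

import numpy as np
import pandas as pd

# The answer's sample data.
s1 = pd.Series([1, 3, 4, 2, 4])
s2 = pd.Series([3, 4, 0, 5, 8])

x = np.array(list(combinations(s1, 2)))  # shape (10, 2): columns x1, x2
y = np.array(list(combinations(s2, 2)))  # shape (10, 2): columns y1, y2

# np.sign gives +1, -1 or 0 (tie); a positive product means both pairs
# move the same way, which is the same condition as m1 | m2.
concordant = np.sign(x[:, 0] - x[:, 1]) * np.sign(y[:, 0] - y[:, 1]) > 0

print(concordant.astype(int))
print(concordant.sum())  # 5 for this data, the last score in the table above
```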

How to compare elements from itertools.combinations?
