Question

I have a DataFrame df1:

import pandas as pd
data1 = {'id': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'B'}, 'col1': {0: '7', 1: ' ', 2: '8', 3: '3', 4: '5', 5: '1'}}
df1 = pd.DataFrame(data1)

and df2:

data2 = {'id': {0: 'A', 1: 'B', 2: 'C'}, 'testCol': {0: '0', 1: '4', 2: '1'}}
df2 = pd.DataFrame(data2)

Using pandas or numpy, how can I compare df1['col1'] with df2['testCol'] for each id, and return the maximum either in df2['testCol'] or in a new column of df2?

Expected result:

ID	testCol
A	8
B	4
C	5

or

ID	testCol	maxCol
A	0	8
B	4	4
C	1	5

Note: df1 and df2 are just examples.

Answer 1

Try:

x = (
    pd.concat(
        [df1.groupby("id")["col1"].max(), df2.set_index("id")["testCol"]],
        axis=1,
    )
    .max(axis=1)
    .astype(int)
    .reset_index(name="testCol")
)
print(x)

Prints:

  id  testCol
0  A        8
1  B        4
2  C        5
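
Note that both columns in the example hold strings (and col1 contains a blank ' '), so the maximum above is compared as text. That works for single digits, but '10' would sort before '9'. A minimal sketch, assuming the values are meant to be numbers and that blanks should be ignored, converts them with pd.to_numeric first (the names a and b are only for illustration):

```python
import pandas as pd

data1 = {'id': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'B'}, 'col1': {0: '7', 1: ' ', 2: '8', 3: '3', 4: '5', 5: '1'}}
data2 = {'id': {0: 'A', 1: 'B', 2: 'C'}, 'testCol': {0: '0', 1: '4', 2: '1'}}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Assumption: values are numeric; anything unparseable (such as ' ') becomes NaN
a = df1.assign(col1=pd.to_numeric(df1["col1"], errors="coerce"))
b = df2.assign(testCol=pd.to_numeric(df2["testCol"], errors="coerce"))

x = (
    pd.concat(
        [a.groupby("id")["col1"].max(), b.set_index("id")["testCol"]],
        axis=1,
    )
    .max(axis=1)      # row-wise maximum, NaN is skipped
    .astype(int)
    .reset_index(name="testCol")
)
print(x)
```

With the sample data this should give the same result as above, while also comparing multi-digit values correctly.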

Answer 2

Another way:

result = (
    df1.set_index('id')
    .merge(df2.set_index('id'), on='id')
    .max(axis=1)
)

This gives:

id
A    7.0
A    0.0
A    8.0
B    4.0
B    4.0
C    5.0
dtype: float64

Then you can group by id and take the overall maximum:

result = (
    df1.set_index('id')
    .merge(df2.set_index('id'), on='id')
    .max(axis=1)
    .groupby('id')
    .max()
)

Output:

id
A    8.0
B    4.0
C    5.0
dtype: float64
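
To get the second layout from the question (a new maxCol next to testCol), a per-id result can be mapped back onto df2. A sketch, assuming the values are converted to numbers first (a step not in the answer above) and that each id appears once in df2:

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["A", "A", "A", "B", "C", "B"],
                    "col1": ["7", " ", "8", "3", "5", "1"]})
df2 = pd.DataFrame({"id": ["A", "B", "C"], "testCol": ["0", "4", "1"]})

# Assumption: treat the values as numbers; the blank ' ' becomes NaN
col1_max = pd.to_numeric(df1["col1"], errors="coerce").groupby(df1["id"]).max()
test_num = pd.to_numeric(df2["testCol"], errors="coerce")

out = df2.copy()
# Look up each id's maximum from df1, then take the row-wise larger value
from_df1 = out["id"].map(col1_max)
out["maxCol"] = pd.concat([from_df1, test_num], axis=1).max(axis=1).astype(int)
print(out)
```

The original testCol is left untouched here, so both the old value and the maximum stay visible side by side.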

Answer 3

You can do it in just a few steps:

Rename "col1" to "testCol" so that everything lines up correctly
Stack df1 and df2 vertically
Group by id and take the maximum of "testCol"

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
out = (
    pd.concat([df1.rename(columns={"col1": "testCol"}), df2])
    .groupby("id", as_index=False)
    ["testCol"].max()
)

print(out)
  id testCol
0  A       8
1  B       4
2  C       5
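
Since the question also mentions numpy, here is a rough sketch of the same comparison done element-wise with np.fmax. It assumes the values should be treated as numbers and that df2 has one row per id; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": ["A", "A", "A", "B", "C", "B"],
                    "col1": ["7", " ", "8", "3", "5", "1"]})
df2 = pd.DataFrame({"id": ["A", "B", "C"], "testCol": ["0", "4", "1"]})

# Per-id maximum of df1 (blank ' ' becomes NaN and is skipped by max)
col1_max = pd.to_numeric(df1["col1"], errors="coerce").groupby(df1["id"]).max()

# Align the df1 maxima to the row order of df2
aligned = col1_max.reindex(df2["id"]).to_numpy()
test_vals = pd.to_numeric(df2["testCol"], errors="coerce").to_numpy()

# np.fmax takes the element-wise maximum and ignores a NaN on one side
out = df2.assign(maxCol=np.fmax(aligned, test_vals).astype(int))
print(out)
```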

Compare the values of two columns across two different pandas DataFrames and return the max
