Question

I am completely new to pandas and Python. I want to compare the columns of a dataframe. My dataframe looks like this:

Document_ID  offset  JAPE      RFC      MANUAL
    0        0      2000       2000    2000
    0        7      2000       2000    2000
    0        16     51200       0      51200
    0        27     51200       0      51200
    0        36     51200       0      51200
    1        0      2000       2000    2000
    1        3      2000       0       2000
    1        4      2200       2200    2200

What I am trying to do is compare:

JAPE column with the manual column
RFC column with the manual column

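I first tried compare['JAPE'] == compare['MANUAL'], but then I found out that there are some differences between the data.

So now I am trying to use the offset column,

like for offset 0 of Document_ID 0, compare JAPE and MANUAL. If both are equal, I want to add a new column JAPE_MANUAL containing true or false, or 0 or 1.

So I am trying to do this using only the offset.

Can anyone help me or give me some hints on this? Thanks.

Expected output: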

Document_ID  offset  JAPE      RFC      MANUAL  JAPE_MANUAL
    0        0      2000       2000    2000         1
    0        7      2000       2000    2000         1
    0        16     51200       0      51200        1
    0        27     51200       0      51200        1
    0        36     51200       0      51200        1
    1        0      2000       2000    2000         1
    1        3      2000       0       2000         1
    1        4      2200       2200    2400         0

This is based on the offset.

Answer 1

You need np.where.

import numpy as np
df["JAPE_MANUAL"] = np.where(df['JAPE'] == df['MANUAL'],1,0)
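For reference, here is a minimal self-contained sketch of the same idea, assuming the sample data from the question is rebuilt by hand as `df`. The `RFC_MANUAL` column name is an assumption that simply mirrors `JAPE_MANUAL`. Since each row already corresponds to one (Document_ID, offset) pair, a row-wise comparison is effectively a per-offset comparison.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "offset":      [0, 7, 16, 27, 36, 0, 3, 4],
    "JAPE":        [2000, 2000, 51200, 51200, 51200, 2000, 2000, 2200],
    "RFC":         [2000, 2000, 0, 0, 0, 2000, 0, 2200],
    "MANUAL":      [2000, 2000, 51200, 51200, 51200, 2000, 2000, 2200],
})

# Element-wise comparison: 1 where the two columns agree on a row, else 0.
df["JAPE_MANUAL"] = np.where(df["JAPE"] == df["MANUAL"], 1, 0)
# Hypothetical column name for the second comparison the question asks about.
df["RFC_MANUAL"] = np.where(df["RFC"] == df["MANUAL"], 1, 0)

print(df)
```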

Answer 2

There is no need for numpy or anything else.

df["JAPE_MANUAL"] = df["JAPE"] == df["MANUAL"]
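Note that a plain comparison produces True/False values. If 0/1 is preferred, the boolean column can be cast to integers. A small sketch, assuming a made-up three-row frame (the values are illustrative, and `JAPE_MANUAL_INT` is a hypothetical column name):

```python
import pandas as pd

# Illustrative data only; the last row deliberately differs.
df = pd.DataFrame({
    "JAPE":   [2000, 51200, 2200],
    "MANUAL": [2000, 51200, 2400],
})

# The comparison yields a boolean Series (True/False per row).
df["JAPE_MANUAL"] = df["JAPE"] == df["MANUAL"]

# Cast the booleans to integers to get a 1/0 flag instead.
df["JAPE_MANUAL_INT"] = df["JAPE_MANUAL"].astype(int)

print(df)
```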

Answer 3

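In general, to form a new column based on the values of other columns (made up of 1s and 0s here, but you can easily generalize), you can use the following: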

df['newColumn']=df.apply(lambda x: 1 if x['column1']==x['column2'] else 0, axis=1)

I hope the idea behind it is clear and that you can adapt it to your specific problem.
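As one possible adaptation to the question's columns, the sketch below applies the same lambda to both JAPE and RFC against MANUAL. The three-row frame is illustrative, and the `<column>_MANUAL` naming is an assumption based on the `JAPE_MANUAL` column from the question.

```python
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame({
    "JAPE":   [2000, 2000, 2200],
    "RFC":    [2000, 0, 2200],
    "MANUAL": [2000, 2000, 2200],
})

# Compare each candidate column against MANUAL, one row at a time.
# c=col binds the current column name into the lambda.
for col in ["JAPE", "RFC"]:
    df[col + "_MANUAL"] = df.apply(
        lambda row, c=col: 1 if row[c] == row["MANUAL"] else 0, axis=1
    )

print(df)
```

Row-wise apply is typically slower than a vectorized comparison on large frames, but it makes it easy to swap in more involved per-row logic.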

Regards

Compare two columns of a dataframe and create a new column
