Question

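I'm completely new to Python. I have two dataframes with the same dataset, but one is the input and the other is the output.

So, this is my input dataframe: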

Document_ID OFFSET  PredictedFeature
    0         0            2000
    0         8            2000
    0         16           2200
    0         23           2200
    0         30           2200
    1          0            2100
    1          5            2100
    1          7            2100

Here I feed this to my ML model as input, and it gives me output only in this same format.

Now my output looks like this:

  Document_ID    OFFSET   PredictedFeature
        0         0            2000
        0         8            2000
        0         16           2100
        0         23           2100
        0         30           2200
        1          0           2000
        1          5           2000
        1          7           2100

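Now, with these two dataframes, what I'm trying to do is:

For a given ID and a given OFFSET, check whether the input feature is the same as the output feature. If it is, I want to add True as the value in a new column; otherwise, False.

Now, if we look at the sample data:

For ID 0, at offset 16, the input feature is 2200 and the output feature is 2100, so it is False.

Can anyone help me with this? Anything would be helpful.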

Answer 1

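If the index values are the same in both DataFrames, and the values in the first two columns (Document_ID and OFFSET) are the same, use: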

inputdf['new'] = inputdf['PredictedFeature'] == outputdf['PredictedFeature']
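
If the row order or the index might differ between the two frames, merging on the key columns first is a more defensive variant of the same comparison. The following is a minimal sketch, assuming the sample data and column names from the question; the `merged` name and the `_in`/`_out` suffixes are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Align rows on (Document_ID, OFFSET) instead of relying on the index.
merged = inputdf.merge(
    outputdf,
    on=["Document_ID", "OFFSET"],
    how="left",
    suffixes=("_in", "_out"),  # illustrative suffixes
)

# True where the input and output features agree for that ID and OFFSET.
merged["new"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]
print(merged)
```

With `how="left"`, an input row that has no counterpart in the output gets NaN in `PredictedFeature_out`, so the comparison yields False for it.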

Answer 2

concat

>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

groupby

>>> df_gpby = df.groupby(list(df.columns))

Get the index of the unique records

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

filter

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red

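With this method you can find the rows that differ, identified by their index values. You can then add a new column that is False at those index values and True everywhere else.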
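
Applied to the question's data, the steps above might look like the sketch below. It assumes that `df1` is the input frame, that `df2` is the output frame, that both have a default 0..n-1 index, and that neither contains duplicate rows of its own (a repeated row inside one frame would be mistaken for a match). The column name `new` is taken from Answer 1.

```python
import pandas as pd

# Sample data copied from the question: df1 is the input, df2 the output.
df1 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
df2 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Stack both frames; after reset_index, rows 0..len(df1)-1 come from df1.
df = pd.concat([df1, df2]).reset_index(drop=True)

# Group on every column: a row present in both frames lands in a group of size 2.
df_gpby = df.groupby(list(df.columns))

# Index labels of rows that occur only once, i.e. rows that differ.
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

# Input rows whose label is in idx have no identical output row, so mark them False.
df1["new"] = ~df1.index.isin(idx)
print(df1)
```

The list `idx` also contains labels from the `df2` half of the concatenated frame (8 and above here); they do not affect `df1["new"]` because `df1` only has labels 0 to 7.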

Compare two dataframe columns based on the ID in the dataframes
