Pandas: Can only compare identically-labeled DataFrame objects

Date: 2017-10-22 16:23:20

Tags: python-3.x pandas dataframe

I have read many posts about this problem, but none of them seem to solve it. I have two DataFrames with identical structure (dfTemp has more rows than dfLags).

print(dfLags.columns.values)
print(dfLags.dtypes)
print(dfLags)

...produces...

['Site ID' 'Port' 'outSpeed']
Site ID      object
Port         object
outSpeed    float64
dtype: object
        Site ID     Port     outSpeed
0     10.2.20.5  Lag 112  10000000000
1     10.2.20.5  Lag 122  10000000000
2     10.2.21.3    Lag 1   2000000000
3     10.2.21.3    Lag 3  20000000000
4     10.2.21.3   Lag 10  20000000000
5   10.2.22.123    Lag 2   3000000000
6   10.2.22.123    Lag 3   2000000000
7   10.2.22.123   Lag 10   6000000000
8    10.2.22.21    Lag 1   3000000000
9    10.2.22.21    Lag 3   2000000000
10   10.2.22.21   Lag 10   6000000000
11   10.2.46.52    Lag 3  20000000000
12   10.2.46.52   Lag 10  20000000000

On the other hand:

print(dfTemp.columns.values)
print(dfTemp.dtypes)
print(dfTemp)

...produces:

['Site ID' 'Port' 'outSpeed']
Site ID      object
Port         object
outSpeed    float64
dtype: object
          Site ID    Port    outSpeed
0      10.2.22.74   1/5/7  1000000000
1      10.2.22.74   1/1/7  1000000000
2      10.2.22.74   1/3/7  1000000000
3      10.2.22.74   1/4/7  1000000000
4       10.2.20.5   3/1/3  1000000000
5      10.2.46.52   3/2/1  1000000000
6      10.2.46.52  3/2/10  1000000000
7      10.2.46.52  Lag 10         NaN
8       10.2.21.3   1/1/1  1000000000
9       10.2.21.3   3/2/5  1000000000
10      10.2.21.3  Lag 10         NaN
..            ...     ...         ...
11    10.2.21.251   1/1/2  1000000000
181   10.2.22.123  1/2/21  1000000000
182   10.2.22.123  2/1/13  1000000000
183   10.2.22.123  2/1/14  1000000000
184   10.2.22.123  2/1/17  1000000000

[185 rows x 3 columns]

Whenever I try to compare them, I get the error ValueError: Can only compare identically-labeled DataFrame objects. I am trying to do the following:

dfTemp.loc[ (dfTemp[[SITE_IP,PORT_NAME]]==dfLags[[SITE_IP,PORT_NAME]]) & (dfTemp["outSpeed"].empty), "outSpeed"] = \
dfLags.loc[ (dfTemp[[SITE_IP,PORT_NAME]]==dfLags[[SITE_IP,PORT_NAME]]) & (dfTemp["outSpeed"].empty), "outSpeed"]

Any hints as to why I am getting this error?

Thanks!

1 Answer:

Answer 0: (score: 1)

EDIT:

You need set_index together with combine_first:

df = (dfTemp.set_index(['Site ID', 'Port'])
            .combine_first(dfLags.set_index(['Site ID', 'Port']))
            .reset_index())
print (df)
        Site ID     Port     outSpeed
0     10.2.20.5    3/1/3   1000000000
1     10.2.20.5  Lag 112  10000000000
2     10.2.20.5  Lag 122  10000000000
3   10.2.21.251    1/1/2   1000000000
4     10.2.21.3    1/1/1   1000000000
5     10.2.21.3    3/2/5   1000000000
6     10.2.21.3    Lag 1   2000000000
7     10.2.21.3   Lag 10  20000000000
8     10.2.21.3    Lag 3  20000000000
9   10.2.22.123   1/2/21   1000000000
10  10.2.22.123   2/1/13   1000000000
11  10.2.22.123   2/1/14   1000000000
12  10.2.22.123   2/1/17   1000000000
13  10.2.22.123   Lag 10   6000000000
14  10.2.22.123    Lag 2   3000000000
15  10.2.22.123    Lag 3   2000000000
16   10.2.22.21    Lag 1   3000000000
17   10.2.22.21   Lag 10   6000000000
18   10.2.22.21    Lag 3   2000000000
19   10.2.22.74    1/1/7   1000000000
20   10.2.22.74    1/3/7   1000000000
21   10.2.22.74    1/4/7   1000000000
22   10.2.22.74    1/5/7   1000000000
23   10.2.46.52    3/2/1   1000000000
24   10.2.46.52   3/2/10   1000000000
25   10.2.46.52   Lag 10  20000000000
26   10.2.46.52    Lag 3  20000000000
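
As a rough illustration of why the comparison fails and what combine_first does, here is a minimal sketch. The miniature frames and their values are made up for the example (they are not the asker's data), and the names keys, filled, merged and left_only are hypothetical. The == operator between two DataFrames does not broadcast or join; it requires both frames to carry the same index and column labels, so frames of different length raise the ValueError. Moving the key columns into the index lets pandas align rows by ('Site ID', 'Port') instead.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for dfTemp and dfLags (illustrative values only).
dfTemp = pd.DataFrame({
    "Site ID": ["10.2.20.5", "10.2.46.52", "10.2.21.3"],
    "Port": ["3/1/3", "Lag 10", "Lag 10"],
    "outSpeed": [1e9, np.nan, np.nan],
})
dfLags = pd.DataFrame({
    "Site ID": ["10.2.46.52", "10.2.21.3", "10.2.20.5"],
    "Port": ["Lag 10", "Lag 10", "Lag 112"],
    "outSpeed": [2e10, 2e10, 1e10],
})
keys = ["Site ID", "Port"]

# Element-wise == needs identical row and column labels on both sides.
# Here the columns match but the row values at each label differ in meaning,
# and with frames of different length the labels themselves would not match.
try:
    dfTemp[keys] == dfLags.iloc[:2][keys]
    raised = False
except ValueError as exc:
    raised = True
    print("comparison failed:", exc)

# combine_first aligns on the index: values from dfTemp win, and gaps (NaN)
# are filled from dfLags. The result holds the union of both key sets.
filled = (dfTemp.set_index(keys)
                .combine_first(dfLags.set_index(keys))
                .reset_index())
print(filled)

# Alternative sketch: a left merge keeps only the rows already in dfTemp.
merged = dfTemp.merge(dfLags, on=keys, how="left", suffixes=("", "_lag"))
merged["outSpeed"] = merged["outSpeed"].fillna(merged["outSpeed_lag"])
left_only = merged.drop(columns="outSpeed_lag")
print(left_only)
```

Note that combine_first returns the union of the keys, which is why rows such as 'Lag 112' appear in the output above even though they were not in dfTemp. If only the existing dfTemp rows should be kept, the left merge followed by fillna is one possible way to do it, assuming the key pairs in dfLags are unique.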