Pandas: DataFrame update() issue

Time: 2016-05-03 21:36:47

Tags: pandas

I am using pandas 0.18 on Suse Enterprise Linux 11 w/ python 2.7.9. I have two tables, A and B.

A contains the following columns and types:

>>> print a.dtypes
cid     object
bid     int64
li      object
lit     int64
x1      float64
y1      float64
x2      float64
y2      float64
hit_num object

B contains the following columns and types:

>>> print b.dtypes
    cid     object
    li      object
    x1      float64
    y1      float64
    x2      float64
    y2      float64
    hit_num object

Now here is a sample dataset for A:

cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,
id1,0,m0,2,6775.5311,6103.0631,6775.5531,6103.2051,
id1,0,m0,3,6775.6231,6103.0631,6775.6451,6103.2051,
id1,0,m0,4,6775.1631,6103.6571,6775.1971,6103.7451,

Now here is a sample dataset for B:

cid,li,x1,y1,x2,y2,hit_num    
id1,m0,6775.1631,6103.6571,6775.1971,6103.7451,hello
id1,m0,6775.6231,6103.0631,6775.6451,6103.2051,world
id1,m0,6775.5311,6103.0631,6775.5531,6103.2051,gotta
id1,m0,6775.5711,6102.5771,6775.6051,6102.7731,go

I call A.update(B). I expect B[hit_num] to update A[hit_num] by aligning on the cid, li, x1, y1, x2, y2 columns.

So I would expect something like this (unless my understanding of update() is wrong?):

cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,0.018,0.02,0.0269,go
id1,0,m0,2,6775.5311,6103.0631,6775.5531,6103.2051,0.018,0.02,0.0269,gotta
id1,0,m0,3,6775.6231,6103.0631,6775.6451,6103.2051,0.018,0.02,0.0269,world
id1,0,m0,4,6775.1631,6103.6571,6775.1971,6103.7451,0.018,0.02,0.0269,hello

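However, I end up with the following. The "lit" column (highlighted in bold) seems to be messed up, and there is a duplicate "1" entry. This does not exist in A. I would like to know why this happens. I created a small example and tried to reproduce the problem, but without success: there I get the expected result.

However, in the larger table where I am running a regression, I see this behavior. I printed table A, table B, and A after A.update(B), and I see the output shown further below. I do not call any other dataframe operations in between. I.e., pseudocode: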

print v['les_tables']['foo']
print overlay_tables['foo']
v['les_tables']['foo'].update(overlay_tables['foo'])
print v['les_tables']['foo']

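I am not entirely sure how update() works, but I assume it uses some kind of equality operator to match columns? If so, could x1, y1, x2, y2 cause any problems? Does anyone have any ideas?

I have confirmed that the columns to be aligned have the same names/types in both A and B (see A.dtypes / B.dtypes above).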

cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,0.018,0.02,0.0269,go
id1,0,m0,3,6775.5311,6103.0631,6775.5531,6103.2051,0.018,0.02,0.0269,gotta
id1,0,m0,2,6775.6231,6103.0631,6775.6451,6103.2051,0.018,0.02,0.0269,world
id1,0,m0,1,6775.1631,6103.6571,6775.1971,6103.7451,0.018,0.02,0.0269,hello
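
As background on how matching works: as far as I know, `DataFrame.update()` aligns the two frames on their index and column labels, not on the values stored in the columns, and it overwrites every shared column with the non-NA values from the other frame. Here is a minimal sketch with hypothetical toy frames (the names `a`, `b`, `key` and all values are illustrative, not taken from the tables above). It shows the mechanism only and is not a diagnosis of the output above:

```python
import pandas as pd

# Hypothetical toy frames: both use the default RangeIndex 0..2,
# but b lists the keys in the opposite order.
a = pd.DataFrame({
    "key": ["p", "q", "r"],
    "lit": [1, 2, 3],
    "hit_num": [None, None, None],
})
b = pd.DataFrame({
    "key": ["r", "q", "p"],
    "hit_num": ["for_r", "for_q", "for_p"],
})

# update() pairs rows by index label (0 with 0, 1 with 1, ...) and
# overwrites every shared column ("key" and "hit_num") in place.
a.update(b)

# "lit" exists only in a, so it keeps a's original order, while
# "key" and "hit_num" now follow b's row order.
print(a)
```

With a default integer index, the values in the key columns play no part in the matching, so a column that exists in only one frame can end up next to different keys than before.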

1 Answer:

Answer 0: (score: 0)

Try this:

In [73]: df = (A.set_index(['cid','li','x1','y1','x2','y2'])
   ....:        .drop(['hit_num'], axis=1)
   ....:        .join(B.set_index(['cid','li','x1','y1','x2','y2']))
   ....:        .reset_index()
   ....:      )

In [74]: df
Out[74]:
   cid  li         x1         y1         x2         y2  bid  lit hit_num
0  id1  m0  6775.5711  6102.5771  6775.6051  6102.7731    0    1      go
1  id1  m0  6775.5311  6103.0631  6775.5531  6103.2051    0    2   gotta
2  id1  m0  6775.6231  6103.0631  6775.6451  6103.2051    0    3   world
3  id1  m0  6775.1631  6103.6571  6775.1971  6103.7451    0    4   hello
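
If you would rather keep using update(), the same idea should carry over: put the key columns into the index of both frames first, so that update() aligns on them. Below is a minimal sketch under that assumption, rebuilding the sample data from the question by hand. The names `keys`, `A_keyed` and `result` are mine, and `hit_num` in A is assumed to be empty (None) before the update:

```python
import pandas as pd

# Columns that identify a row in both tables.
keys = ["cid", "li", "x1", "y1", "x2", "y2"]

# Sample data from the question; hit_num in A is assumed to be missing.
A = pd.DataFrame({
    "cid": ["id1"] * 4,
    "bid": [0] * 4,
    "li": ["m0"] * 4,
    "lit": [1, 2, 3, 4],
    "x1": [6775.5711, 6775.5311, 6775.6231, 6775.1631],
    "y1": [6102.5771, 6103.0631, 6103.0631, 6103.6571],
    "x2": [6775.6051, 6775.5531, 6775.6451, 6775.1971],
    "y2": [6102.7731, 6103.2051, 6103.2051, 6103.7451],
    "hit_num": [None] * 4,
})
B = pd.DataFrame({
    "cid": ["id1"] * 4,
    "li": ["m0"] * 4,
    "x1": [6775.1631, 6775.6231, 6775.5311, 6775.5711],
    "y1": [6103.6571, 6103.0631, 6103.0631, 6102.5771],
    "x2": [6775.1971, 6775.6451, 6775.5531, 6775.6051],
    "y2": [6103.7451, 6103.2051, 6103.2051, 6102.7731],
    "hit_num": ["hello", "world", "gotta", "go"],
})

# With the keys in the index, update() aligns rows on the key values
# instead of on the default row numbers.
A_keyed = A.set_index(keys)
A_keyed.update(B.set_index(keys))
result = A_keyed.reset_index()

print(result[["lit", "x1", "hit_num"]])
```

Note that the float keys have to be exactly equal in both tables for rows to line up. If the coordinates come from different computations, rounding them to a fixed number of decimals before calling set_index may be needed.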