I am using pandas 0.18 on SUSE Linux Enterprise 11 with Python 2.7.9. I have two DataFrames, A and B.
A contains the following columns and types:
>>> print a.dtypes
cid object
bid int64
li object
lit int64
x1 float64
y1 float64
x2 float64
y2 float64
hit_num object
B contains the following columns and types:
>>> print b.dtypes
cid object
li object
x1 float64
y1 float64
x2 float64
y2 float64
hit_num object
Here is a sample data set for A:
cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,
id1,0,m0,2,6775.5311,6103.0631,6775.5531,6103.2051,
id1,0,m0,3,6775.6231,6103.0631,6775.6451,6103.2051,
id1,0,m0,4,6775.1631,6103.6571,6775.1971,6103.7451,
Here is a sample data set for B:
cid,li,x1,y1,x2,y2,hit_num
id1,m0,6775.1631,6103.6571,6775.1971,6103.7451,hello
id1,m0,6775.6231,6103.0631,6775.6451,6103.2051,world
id1,m0,6775.5311,6103.0631,6775.5531,6103.2051,gotta
id1,m0,6775.5711,6102.5771,6775.6051,6102.7731,go
I call A.update(B), expecting B[hit_num] to update A[hit_num] by aligning on the cid, li, x1, y1, x2 and y2 columns.
So I expect something like this (unless my understanding of update() is wrong?):
cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,0.018,0.02,0.0269,go
id1,0,m0,2,6775.5311,6103.0631,6775.5531,6103.2051,0.018,0.02,0.0269,gotta
id1,0,m0,3,6775.6231,6103.0631,6775.6451,6103.2051,0.018,0.02,0.0269,world
id1,0,m0,4,6775.1631,6103.6571,6775.1971,6103.7451,0.018,0.02,0.0269,hello
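However, I end up with the output shown further below. The `lit` column (highlighted in bold) seems to be messed up, and there are duplicate "1" entries that do not exist in A. I would like to know why this happens. I created a small example and tried to reproduce the problem, but without success: there I get the expected result.
In the larger tables I am running regressions on, however, I do see this behavior. I printed table A, table B and the result of A.update(B), and I see the output below. I do not call any other DataFrame operations in between. In pseudocode: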
print v['les_tables']['foo']
print overlay_tables['foo']
v['les_tables']['foo'].update(overlay_tables['foo'])
print v['les_tables']['foo']
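I am not entirely sure how update() works, but I assume it uses some kind of equality operator to match on columns? If so, could the float columns x1, y1, x2 and y2 cause any problems? Does anyone have any ideas?
I have confirmed that the columns to be aligned have the same names and types in both A and B (see A.dtypes / B.dtypes above).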
cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,**1**,6775.5711,6102.5771,6775.6051,6102.7731,0.018,0.02,0.0269,go
id1,0,m0,**3**,6775.5311,6103.0631,6775.5531,6103.2051,0.018,0.02,0.0269,gotta
id1,0,m0,**2**,6775.6231,6103.0631,6775.6451,6103.2051,0.018,0.02,0.0269,world
id1,0,m0,**1**,6775.1631,6103.6571,6775.1971,6103.7451,0.018,0.02,0.0269,hello
Answer 0 (score: 0)
Try this:
In [73]: df = (A.set_index(['cid','li','x1','y1','x2','y2'])
....: .drop(['hit_num'], axis=1)
....: .join(B.set_index(['cid','li','x1','y1','x2','y2']))
....: .reset_index()
....: )
In [74]: df
Out[74]:
cid li x1 y1 x2 y2 bid lit hit_num
0 id1 m0 6775.5711 6102.5771 6775.6051 6102.7731 0 1 go
1 id1 m0 6775.5311 6103.0631 6775.5531 6103.2051 0 2 gotta
2 id1 m0 6775.6231 6103.0631 6775.6451 6103.2051 0 3 world
3 id1 m0 6775.1631 6103.6571 6775.1971 6103.7451 0 4 hello
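For background on why a plain A.update(B) may not behave the way the question expects: as far as I know, DataFrame.update() aligns the two frames on their index (row labels) and column names, not on the values stored in columns such as cid or x1. The sketch below rebuilds the sample data from the question and compares three variants. It is written for Python 3 and a recent pandas rather than the 2.7 / 0.18 setup in the question, and the names `KEYS`, `naive`, `keyed` and `merged` are only illustrative. It does not attempt to reproduce the exact scrambled `lit` values from the larger tables.

```python
import io

import pandas as pd

# Sample data copied from the question.
A_CSV = """cid,bid,li,lit,x1,y1,x2,y2,hit_num
id1,0,m0,1,6775.5711,6102.5771,6775.6051,6102.7731,
id1,0,m0,2,6775.5311,6103.0631,6775.5531,6103.2051,
id1,0,m0,3,6775.6231,6103.0631,6775.6451,6103.2051,
id1,0,m0,4,6775.1631,6103.6571,6775.1971,6103.7451,
"""
B_CSV = """cid,li,x1,y1,x2,y2,hit_num
id1,m0,6775.1631,6103.6571,6775.1971,6103.7451,hello
id1,m0,6775.6231,6103.0631,6775.6451,6103.2051,world
id1,m0,6775.5311,6103.0631,6775.5531,6103.2051,gotta
id1,m0,6775.5711,6102.5771,6775.6051,6102.7731,go
"""

# Columns that identify a row in both tables (illustrative name).
KEYS = ["cid", "li", "x1", "y1", "x2", "y2"]

a = pd.read_csv(io.StringIO(A_CSV), dtype={"hit_num": object})
b = pd.read_csv(io.StringIO(B_CSV), dtype={"hit_num": object})

# 1) Plain update(): both frames carry the default 0..3 index, so row 0 of b
#    overwrites row 0 of a in every shared column, coordinates included.
naive = a.copy()
naive.update(b)
print(naive[["lit", "x1", "hit_num"]])

# 2) update() after moving the key columns into the index: rows are now
#    matched by key, and only hit_num is left as a shared data column.
keyed = a.set_index(KEYS)
keyed.update(b.set_index(KEYS))
keyed = keyed.reset_index()
print(keyed[["lit", "x1", "hit_num"]])

# 3) A left merge on the key columns, similar in spirit to the join above,
#    which keeps the column order of a.
merged = a.drop(columns="hit_num").merge(b, on=KEYS, how="left")
print(merged)
```

In the first variant `lit` itself is untouched, but the coordinates and `hit_num` next to it come from whichever row of B happens to share the same index label, so the rows look shuffled. Note that variants 2 and 3 match the float columns by exact equality. That works here because both tables are parsed from identical text, but values produced by separate computations may need rounding before they can serve as keys.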