Pandas merge gives wrong output

Date: 2016-10-22 09:43:57

Tags: python pandas dataframe

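OK, I have gone through a number of blog posts on this topic, but I still have the same problem. I have two dataframes. Both have a column X that contains SHA2 values, stored as hex strings.

Example (lookup dataframe)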

X,Y
000000000E000394574D69637264736F66742057696E646F7773204861726477,7
0000000080000000000000090099000000040005000000000000008F2A000010,7
000000020000000000000000777700010000000000020000000040C002004600,24
0000005BC614437F6BE049237FA1DDD2083B5BA43A10175E4377A59839DC2B64,7

Example (source dataframe)

X,Z
000000000E000394574D69637264736F66742057696E646F7773204861726477,'blah'
0000000080000000000000090099000000040005000000000000008F2A000010,'blah blah'
000000020000000000000000777700010000000000020000000040C002004600,'dummy'

So now I am doing

lookup['X'] = lookup['X'].astype(str)
source['X'] = source['X'].astype(str)
source['newcolumn'] = source.merge(lookup, on='X', how='inner')['Y']

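The source has 160,000 rows and the lookup has about 500,000 rows.

Now, when the operation completes, I get the new column, but the values are wrong. I have made sure they are not being picked up from duplicate X values, because there are no duplicate X values in either table.

So this is really making me feel stupid and is causing a lot of pain in my live systems. Can anyone suggest what the problem is?

I have now replaced this call with
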
def getReputation(lookupDF,value,lookupcolumn,default):
    lookupRows = lookupDF.loc[lookupDF['X']==value]
    if lookupRows.shape[0]>0:
        return lookupRows[lookupcolumn].values[0]
    else:
        return default

source['newcolumn'] = source.apply(lambda x: getReputation(lookup,x['X'],'Y',-1),axis=1)

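This code works, but it is obviously BAD code and takes a very long time. I could multiprocess it, but the question remains: why is the merge failing?

Thanks for your help. Rgds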

1 answer:

Answer 0 (score: 3)

I would use the map() method in this case:

First set 'X' as the index in the lookup DF:

In [58]: lookup.set_index('X', inplace=True)

In [59]: lookup
Out[59]:
                                                                   Y
X
000000000E000394574D69637264736F66742057696E646F7773204861726477   7
0000000080000000000000090099000000040005000000000000008F2A000010   7
000000020000000000000000777700010000000000020000000040C002004600  24
0000005BC614437F6BE049237FA1DDD2083B5BA43A10175E4377A59839DC2B64   7

In [60]: df['Y'] = df.X.map(lookup.Y)

In [61]: df
Out[61]:
                                                                  X          Z   Y
0  000000000E000394574D69637264736F66742057696E646F7773204861726477       blah   7
1  0000000080000000000000090099000000040005000000000000008F2A000010  blah blah   7
2  000000020000000000000000777700010000000000020000000040C002004600      dummy  24

Actually, your code works properly for your sample DFs:

In [68]: df.merge(lookup, on='X', how='inner')
Out[68]:
                                                                  X          Z   Y
0  000000000E000394574D69637264736F66742057696E646F7773204861726477       blah   7
1  0000000080000000000000090099000000040005000000000000008F2A000010  blah blah   7
2  000000020000000000000000777700010000000000020000000040C002004600      dummy  24

So check whether the X column holds the same data and dtypes in both DFs.
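
One way to carry out that check is sketched below. The two small frames and their key values are made up for illustration, and the whitespace and upper-case normalisation is only an assumption about how the keys in the real data might differ. The sketch also uses a left merge with indicator=True to show which source rows found a match. Separately, and not something the answer above states: an inner merge drops unmatched rows and returns a freshly numbered index, so assigning one of its columns back to source aligns on index labels rather than on X. That may be a factor in the question's code.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames; the values are made up.
lookup = pd.DataFrame({
    "X": ["00AB", "00CD", "00EF"],
    "Y": [7, 24, 7],
})
source = pd.DataFrame({
    # One key with no match, and one with stray whitespace and lower case.
    "X": ["00EF", "ffff", " 00ab "],
    "Z": ["blah", "dummy", "blah blah"],
})

# 1. Compare the dtypes of the key column in both frames.
print(lookup["X"].dtype, source["X"].dtype)

# 2. Normalise the keys (assumed clean-up; adjust to the real data).
for frame in (lookup, source):
    frame["X"] = frame["X"].astype(str).str.strip().str.upper()

# 3. With unique lookup keys, a left merge returns one row per source row,
#    in source order. indicator=True adds a _merge column showing whether
#    each row matched ("both") or not ("left_only").
merged = source.merge(lookup, on="X", how="left", indicator=True)
print(merged)

# 4. Unmatched rows have NaN in Y; fill them with a default of -1.
#    to_numpy() assigns by position, so index labels play no part.
source["newcolumn"] = merged["Y"].fillna(-1).astype(int).to_numpy()
print(source)
```

If the _merge column shows many left_only rows for keys that look identical by eye, the key columns probably differ in dtype, case, or surrounding whitespace.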