Processing two large DataFrames - problem when using np.isin

Asked: 2018-03-20 08:58:25

Tags: python pandas numpy

Example:

import numpy as np
row = ['12347', 'Van', '18/01/2017']
npvalues = np.array([['12345', 'Bus', '23/02/2017'], ['12346', 'Truck', '01/07/2017'], ['12347', 'Van', '18/01/2017']])
np.isin(row, npvalues)

Required output: [True, True, True]

  

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

1 Answer:

Answer 0: (score: 0)

Cast the 'row' variable to an np.array instead of leaving it as a list.

import numpy as np
row=['12347','Van','18/01/2017']
npvalues = np.array([ ['12345','Bus','23/02/2017'],['12346','Truck','01/07/2017'],['12347','Van','18/01/2017']  ])

row
Out[60]: ['12347', 'Van', '18/01/2017']

npvalues
Out[61]: 
array([['12345', 'Bus', '23/02/2017'],
       ['12346', 'Truck', '01/07/2017'],
       ['12347', 'Van', '18/01/2017']],
      dtype='<U10')

# Cast instead
row = np.asarray(row)
np.isin(row, npvalues)
Out[63]: array([ True,  True,  True], dtype=bool)

Note - I was able to run your code as-is and got the desired answer.

row=['12347','Van','18/01/2017']
npvalues = np.array([ ['12345','Bus','23/02/2017'],['12346','Truck','01/07/2017'],['12347','Van','18/01/2017']  ])
np.isin(row, npvalues)
Out[64]: array([ True,  True,  True], dtype=bool)
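
To make the note above concrete, here is a minimal sketch that compares the two calls directly, using the small sample data from the question. The names result_from_list and result_from_array are only illustrative. On this small input both forms should agree, which suggests the ValueError depends on the real, much larger data rather than on whether 'row' is a list.

```python
import numpy as np

row = ['12347', 'Van', '18/01/2017']
npvalues = np.array([
    ['12345', 'Bus', '23/02/2017'],
    ['12346', 'Truck', '01/07/2017'],
    ['12347', 'Van', '18/01/2017'],
])

# Pass the plain list; np.isin accepts any array_like as its first argument.
result_from_list = np.isin(row, npvalues)

# Pass an explicit ndarray, as suggested in the answer.
result_from_array = np.isin(np.asarray(row), npvalues)

# Check that both calls give the same element-wise membership mask.
same = np.array_equal(result_from_list, result_from_array)
print(result_from_list.tolist(), same)
```

Note that np.isin tests each element of 'row' against every value in npvalues, regardless of which row or column that value sits in.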

Here is my version information:

import sys
sys.version
Out[71]: '3.6.4 |Anaconda, Inc.| (default, Mar 12 2018, 20:20:50) [MSC v.1900 64 bit (AMD64)]'
np.version.full_version
Out[67]: '1.13.3'
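
If you want to compare your environment with mine, a small script along these lines should work outside IPython as well. It is only a sketch; the variable names are mine. np.isin was added in NumPy 1.13.0, so the hasattr check reports whether your install provides it.

```python
import sys
import numpy as np

# Short Python version string, e.g. the leading "3.6.4" part of sys.version.
python_version = sys.version.split()[0]

# NumPy version string; np.version.full_version reports the same information.
numpy_version = np.__version__

# np.isin only exists in NumPy 1.13.0 and later.
has_isin = hasattr(np, "isin")

print("Python:", python_version)
print("NumPy:", numpy_version)
print("np.isin available:", has_isin)
```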