Question

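I am trying to collect all matching elements from two different arrays into a single array, but I am running into a type error that I don't fully understand.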

This is what I tried initially:

IRS_zips = AGI.zipcode.unique() # np array of type int
medi_zips = df.nppes_provider_zip.unique() # np array of type object

To find the matching elements, I do:

like_zips = np.intersect1d(IRS_zips,medi_zips)

This raises the following error:

TypeError: '<' not supported between instances of 'str' and 'int'
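The error can be reproduced with two small made-up arrays (the values below are illustrative, not taken from the real data). This is a minimal sketch: np.intersect1d sorts the combined values, and Python cannot order a str against an int.

```python
import numpy as np

# Hypothetical stand-ins for the two real arrays
int_zips = np.array([0, 35004, 35005])                       # integer dtype
str_zips = np.array(['21502', '35004', '60201'], dtype=object)  # Python strings

raised = False
try:
    # The combined values are sorted internally, which compares str with int
    np.intersect1d(int_zips, str_zips)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```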

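That makes sense, so I checked the types of both arrays and tried to convert them. In this case medi_zips is the one with the wrong type, so I tried converting that array: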

medi_fixed = medi_zips.astype(int)

Which throws this error:

ValueError: invalid literal for int() with base 10: 'M4K 2'

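I was curious, so I looked in my DataFrame for a value equal to 'M4K 2' and did find it. It turned out to be the first element of the DataFrame and, more importantly, it was displayed as a number, or in this case a zip code. That makes me think it might be an encoding issue? I'm not really sure.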
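One way to see which entries block the integer conversion is to coerce the strings to numbers and look at what turns into NaN. The sketch below uses a small hypothetical array in place of df.nppes_provider_zip.unique(); only 'M4K 2' comes from the error message above, the other values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 'M4K 2' is the value reported in the ValueError
medi_zips = np.array(['21502', 'M4K 2', '60201', 'ABC 1'], dtype=object)

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising
as_numbers = pd.to_numeric(medi_zips, errors='coerce')

# Mask of the entries that failed to parse, and the original strings behind them
bad_mask = np.isnan(as_numbers)
bad_values = medi_zips[bad_mask]
print(bad_values)
```

Looking at the original strings behind the NaN entries shows whether they are genuinely non-numeric codes (for example, a non-US postal code) rather than an encoding problem.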

Edit:

As requested, the output of IRS_zips looks like this:

array([    0, 35004, 35005, ..., 83127, 83128, 83414])

And this is the output for medi_zips:

array(['21502', '60201', '43623', ..., '81656', '56137', '85246'],
      dtype=object)

The ideal output would be a new array containing the matching zip codes, but instead I get the errors listed above.

Edit 2:

This works now:

IRS_zips = AGI.zipcode.unique()
IRS_zips = (pd.to_numeric(IRS_zips, errors='coerce')).astype(int)

medi_zips = df.nppes_provider_zip.unique()
medi_int = pd.to_numeric(medi_zips, errors='coerce')
medi_int = (medi_int[~np.isnan(medi_int)]).astype(int)
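A possible alternative, not the approach used above, is to compare both sides as strings, so that non-numeric codes do not need to be dropped first. This is only a sketch: the sample values are hypothetical, and it assumes the integer zip codes should be zero-padded to five digits, which may not hold for the real data.

```python
import numpy as np

# Hypothetical samples standing in for the real arrays
IRS_zips = np.array([501, 35004, 60201])
medi_zips = np.array(['00501', '60201', 'M4K 2'], dtype=object)

# Assumption: integer zips are 5-digit codes whose leading zeros were lost
irs_str = np.array([str(int(z)).zfill(5) for z in IRS_zips])
medi_str = medi_zips.astype(str)

# Both arrays now hold strings, so sorting and comparison are well defined
like_zips = np.intersect1d(irs_str, medi_str)
print(like_zips)
```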

Answer 1

This works for me:

import numpy as np
import pandas as pd

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'])

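# Convert to numbers; entries that cannot be parsed (such as '4c') become NaN
medi_int = pd.to_numeric(medi_zips, errors='coerce')

# Drop the NaN entries
medi_int = medi_int[~np.isnan(medi_int)]

# Values present in both arrays
like_zips = np.intersect1d(IRS_zips, medi_int)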
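Because NaN only exists for floating-point values, the coerced array is float, and so is the intersection. If integer zip codes are wanted, a cast after dropping the NaN entries (as in the question's second edit) should do it. A small sketch using the same toy data, with dtype=object added here as an assumption to mirror the real medi_zips:

```python
import numpy as np
import pandas as pd

IRS_zips = np.array([0, 1, 2, 3, 4])
medi_zips = np.array(['0', '1', '2', '3', '4c'], dtype=object)

# Coerce to numbers, drop the NaN left by '4c', then cast back to int
medi_num = pd.to_numeric(medi_zips, errors='coerce')
medi_int = medi_num[~np.isnan(medi_num)].astype(int)

like_zips = np.intersect1d(IRS_zips, medi_int)
print(like_zips, like_zips.dtype)
```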
