Question

My CSV has 7 columns:

low low 5more   more    big high    vgood
vhigh   vhigh   2   2   small   low unacc
vhigh   vhigh   2   2   small   med unacc
vhigh   vhigh   2   2   small   high    unacc
vhigh   vhigh   2   2   med low unacc
vhigh   vhigh   2   2   med med unacc
vhigh   vhigh   2   2   med high    unacc

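I need to search columns 0, 1, or 5 for the values high and vhigh. I'm not sure how NumPy's various search functions can accomplish this (I have to use NumPy for this search).

Can anyone help? Much appreciated.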

Answer 1

If what you've shown is the file, you can load it with np.genfromtxt:

In [259]: arr = np.genfromtxt('tmp.csv', names=True, dtype=None)

In [260]: arr
Out[260]: 
array([('vhigh', 'vhigh', 2, 2, 'small',  'low', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'small',  'med', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'small', 'high', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med',  'low', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med',  'med', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med', 'high', 'unacc')], 
      dtype=[('low', 'S5'), ('low_1', 'S5'), ('5more', '<i8'), ('more', '<i8'), ('big', 'S5'), ('high', 'S4'), ('vgood', 'S5')])
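
For anyone who wants to reproduce the load without a file on disk, here is a minimal sketch that feeds the sample rows from the question through io.StringIO. It assumes Python 3 and a NumPy version that accepts the encoding argument, so the string fields come back as str rather than bytes ('S5') as in the transcript above. The variable name data is made up for this example.

```python
import io
import numpy as np

# Sample rows copied from the question; whitespace-separated,
# with the first row treated as the header (an assumption the answer also makes).
data = """low low 5more more big high vgood
vhigh vhigh 2 2 small low unacc
vhigh vhigh 2 2 small med unacc
vhigh vhigh 2 2 small high unacc
vhigh vhigh 2 2 med low unacc
vhigh vhigh 2 2 med med unacc
vhigh vhigh 2 2 med high unacc
"""

# names=True reads field names from the first row; the duplicate 'low' becomes 'low_1'.
# dtype=None lets genfromtxt infer a type per column.
arr = np.genfromtxt(io.StringIO(data), names=True, dtype=None, encoding="utf-8")

print(arr.dtype.names)
print(arr["high"])
```

The default delimiter is any run of whitespace, which is why no delimiter argument is needed for this layout; a truly comma-separated file would need delimiter=','.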

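There are a few ways to interpret "search". For all of them, we'll want to look at one column at a time. Let's look at column 5 (the sixth from the left, labeled high in the top row, which I'm assuming holds the column headers). It looks like this: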

In [268]: arr['high']
Out[268]: 
array(['low', 'med', 'high', 'low', 'med', 'high'], 
      dtype='|S4')

You can see which rows of the 'high' column have 'high' as their value with a direct comparison:

In [269]: arr['high'] == 'high'
Out[269]: array([False, False,  True, False, False,  True], dtype=bool)

Or you can look up the matching indices with np.where:

In [270]: np.where(arr['high'] == 'high')
Out[270]: (array([2, 5]),)

Or you can select the rows that contain 'high' in the 'high' column:

In [271]: arr[arr['high'] == 'high']
Out[271]: 
array([('vhigh', 'vhigh', 2, 2, 'small', 'high', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'med', 'high', 'unacc')], 
      dtype=[('low', 'S5'), ('low_1', 'S5'), ('5more', '<i8'), ('more', '<i8'), ('big', 'S5'), ('high', 'S4'), ('vgood', 'S5')])

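If you want to search for both 'vhigh' and 'high' at once, you can use np.char.endswith (or np.char.count, if the match doesn't have to be at the end), which matches either one: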

In [272]: np.char.endswith(arr['low'], 'high')
Out[272]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

In [273]: np.char.endswith(arr['high'], 'high')
Out[273]: array([False, False,  True, False, False,  True], dtype=bool)
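
The suffix test is convenient but not the same as an exact match. The sketch below compares it with a substring test and with np.isin, which checks membership in an explicit list of values. The array col is hypothetical, chosen only so that the three tests disagree.

```python
import numpy as np

# Hypothetical column values, picked so the three tests give different answers.
col = np.array(["high", "vhigh", "thigh", "higher", "low"])

ends = np.char.endswith(col, "high")        # string ends with 'high'
contains = np.char.count(col, "high") > 0   # 'high' appears anywhere in the string
exact = np.isin(col, ["high", "vhigh"])     # string equals 'high' or 'vhigh'

print(ends)      # [ True  True  True False False]
print(contains)  # [ True  True  True  True False]
print(exact)     # [ True  True False False False]
```

If the data can only ever contain the values shown in the question, all three give the same result; np.isin is the safer choice when other strings ending in 'high' might appear.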

To put it all together, you can check which rows match in all three columns:

In [290]: np.all([arr['low'] == 'vhigh', arr['low_1'] == 'vhigh', arr['high'] == 'high'], 0)
Out[290]: array([False, False,  True, False, False,  True], dtype=bool)
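
If the goal is rows where any of the three columns matches, rather than all of them, np.any can take the place of np.all. This is a sketch on a hypothetical structured array that reuses the field names low, low_1 and high; the values are invented so that the two reductions differ.

```python
import numpy as np

# Hypothetical data with the same field names as columns 0, 1 and 5 above.
rows = np.array(
    [("vhigh", "low", "med"),
     ("low", "low", "low"),
     ("med", "high", "low"),
     ("low", "med", "high"),
     ("vhigh", "high", "high")],
    dtype=[("low", "U5"), ("low_1", "U5"), ("high", "U5")],
)

targets = ["high", "vhigh"]

# One boolean mask per column: True where the value is 'high' or 'vhigh'.
per_column = [np.isin(rows[name], targets) for name in ("low", "low_1", "high")]

any_match = np.any(per_column, axis=0)  # at least one of the three columns matches
all_match = np.all(per_column, axis=0)  # all three columns match

print(any_match)  # [ True False  True  True  True]
print(all_match)  # [False False False False  True]
```

Indexing with either mask, for example rows[any_match], returns the matching rows.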

Since the integer columns 5more and more are no longer involved, you can build a plain string array:

In [293]: b = np.column_stack([arr['low'], arr['low_1'], arr['high']])

In [294]: b
Out[294]: 
array([['vhigh', 'vhigh', 'low'],
       ['vhigh', 'vhigh', 'med'],
       ['vhigh', 'vhigh', 'high'],
       ['vhigh', 'vhigh', 'low'],
       ['vhigh', 'vhigh', 'med'],
       ['vhigh', 'vhigh', 'high']], 
      dtype='|S5')

In [295]: np.char.endswith(b, 'high')
Out[295]: 
array([[ True,  True, False],
       [ True,  True, False],
       [ True,  True,  True],
       [ True,  True, False],
       [ True,  True, False],
       [ True,  True,  True]], dtype=bool)

In [297]: np.all(np.char.endswith(b, 'high'), 1)
Out[297]: array([False, False,  True, False, False,  True], dtype=bool)
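
With a plain 2-D string array, it is also possible to find exactly which cells matched. The sketch below assumes a hypothetical array b laid out like the one above (its columns 0, 1, 2 standing for original columns 0, 1, 5) and uses np.isin with np.argwhere.

```python
import numpy as np

# Hypothetical 2-D string array; columns 0, 1, 2 stand for original columns 0, 1, 5.
b = np.array([
    ["vhigh", "vhigh", "low"],
    ["low", "med", "med"],
    ["med", "low", "high"],
])

hits = np.isin(b, ["high", "vhigh"])  # boolean array with the same shape as b
row_has_hit = hits.any(axis=1)        # True for rows with at least one match
positions = np.argwhere(hits)         # one (row, column) pair per matching cell

print(row_has_hit)         # [ True False  True]
print(positions.tolist())  # [[0, 0], [0, 1], [2, 2]]
```

Swapping hits.any(axis=1) for hits.all(axis=1) gives the "every column matches" variant used in the transcript above.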
