Question

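I need Python code that takes the text in column x and loops over column y, searching each value in y for the substring from x. My example is below. If possible, I would like it to print the matching value and the name in a dictionary, or in something I could convert to a pandas DataFrame with one column for each. I am fairly new to this and keep getting errors. My code and the error are below.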

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

for x in matches:
    if name.str.contains(x) == 1:
    print(name)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

Since you tagged this question as pandas:

import pandas as pd
import numpy as np

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

df = pd.DataFrame({'Name':names,'Matches':matches})
print(df)

Starting dataframe:

  Matches     Name
0     cat   turtle
1     bat  bigcats
2     fat    hfat1

Use the str accessor's contains with a regular expression built by join:

df.loc[df.Name.str.contains('|'.join(df.Matches)),'Name'].tolist()

Output:

['bigcats', 'hfat1']
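
The question also asks for the matched value together with the name. A minimal sketch of one way to get both, assuming the same `names` and `matches` data; the column labels `Match` and `Name` and the use of `regex=False` (so each substring is treated literally) are illustrative choices, not part of the answer above:

```python
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = pd.Series(['turtle', 'bigcats', 'hfat1'])

# For each substring, keep the names that contain it (literal match, no regex).
pairs = [
    (m, name)
    for m in matches
    for name in names[names.str.contains(m, regex=False)]
]

# One row per (match, name) pair.
result = pd.DataFrame(pairs, columns=['Match', 'Name'])
print(result)
```

If a plain dictionary is preferred, `dict(pairs)` would work here, though it keeps only one name per match.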

Answer 2

Using NumPy's find

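import numpy as np
import pandas as pd  # needed for the Series further down

matches = np.array(['cat', 'bat', 'fat'])
names = np.array(['turtle', 'bigcats', 'hfat1'])

# find returns -1 where the substring is absent; broadcasting compares every name with every match
i, j = np.where(np.char.find(names[:, None], matches) > -1)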

print(matches[j], names[i], sep='\n')

['cat' 'fat']
['bigcats' 'hfat1']

Wrap it in a pandas Series

pd.Series(dict(zip(matches[j], names[i])))

cat    bigcats
fat      hfat1
dtype: object
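
One caveat: `dict(zip(...))` keeps only one name per match, so if a substring appeared in several names, the earlier pairs would be overwritten. A sketch that keeps every pair instead, assuming the same arrays; the extra name `'catfat'` and the column labels are added here purely for illustration:

```python
import numpy as np
import pandas as pd

matches = np.array(['cat', 'bat', 'fat'])
names = np.array(['turtle', 'bigcats', 'hfat1', 'catfat'])  # 'catfat' is a made-up extra entry

# Broadcast every name against every substring; find returns -1 when absent.
i, j = np.where(np.char.find(names[:, None], matches) > -1)

# One row per (match, name) pair, so repeated matches are all kept.
pairs = pd.DataFrame({'Match': matches[j], 'Name': names[i]})
print(pairs)
```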

Answer 3

I'm a little unsure about your question, but does this do what you want?

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

for x in matches:
    for name in names:
        if x in name:
            print(name)

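Note that if you are using a pandas.Series and call series.str.contains(s), it checks whether s occurs in each value of series, and returns another Series holding True or False for each value. That is why you get the error: comparing that Series with an int yields yet another Series, and an if statement cannot reduce a Series to a single True or False.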
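
To make that note concrete, here is a small sketch, assuming `names` is held in a pandas Series: `str.contains` yields one boolean per element, and `.any()` collapses them into the single value an `if` needs. The variable name `mask` is chosen here for illustration.

```python
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = pd.Series(['turtle', 'bigcats', 'hfat1'])

for x in matches:
    mask = names.str.contains(x, regex=False)  # boolean Series, one entry per name
    if mask.any():  # reduce to a single bool before using it in `if`
        print(x, names[mask].tolist())
```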

Answer 4

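I guess you are looking for this? "name" is undefined because your second array is called "names", and your "if" statement should look like the following in order to look for values in the array: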

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

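for x in matches:
    # `x in names` would only match a whole list element, so test each name for the substring
    hits = [name for name in names if x in name]
    if hits:
        print(hits)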
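
Since the question also asks for a dictionary, here is a pure-Python sketch of that, assuming plain lists as in the question; `found` is a name chosen here for illustration:

```python
matches = ['cat', 'bat', 'fat']
names = ['turtle', 'bigcats', 'hfat1']

# Map each substring to the names containing it, skipping substrings with no hit.
found = {
    x: [name for name in names if x in name]
    for x in matches
    if any(x in name for name in names)
}

print(found)
```

Storing a list per key means a substring that occurs in several names keeps all of them.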

Python: match strings in one column against substrings in another column

4 answers: