How to check for multiple words in a string under certain conditions?

Time: 2019-04-30 05:25:12

Tags: python python-3.x

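Once again I need Stack Overflow's wise advice. I'm not sure the title properly conveys what I'm trying to find out.

Here is the situation.

There are two groups of words, and I need to know whether a string contains one (or more) words from group A and also a word from group B. Like this.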

Group_A = ['nice','car','by','shop']
Group_B = ['no','thing','great']

t_string_A = 'there is a car over there'
t_string_B = 'no one is in a car'

t_string_A has 'car' from Group_A but nothing from Group_B, so it should return... I don't know, let's say 0. t_string_B has 'car' from Group_A and 'no' from Group_B, so it should return 1.

Actually, I've been doing this in a rather primitive way, with a pile of code like this:

if 'nice' in t_string_A and 'no' in t_string_A:
    return 1

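But, as you can see, as group A or group B grows I would have to write far too many of these conditions. That is certainly not efficient.

Thanks for your help and attention :D Thanks in advance!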

5 answers:

Answer 0: (score: 5)

You can use sets:

Group_A = set(('nice','car','by','shop'))
Group_B = set(('no','thing','great'))

t_string_A = 'there is a car over there'
t_string_B = 'no one is in a car'

set_A = set(t_string_A.split())
set_B = set(t_string_B.split())

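def test(string):
    # Split the string being tested into a set of its words.
    s = set(string.split())
    # A non-empty intersection means the string shares a word with the group.
    if Group_A & s and Group_B & s:
        return 1
    else:
        return 0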

What should the result be if the string contains no word from Group_A or Group_B?

Depending on your phrases, this version may be more efficient to test, since any() stops at the first matching word:

def test(string):
    s = string.split()
    if any(word in Group_A for word in s) and any(word in Group_B for word in s):
        return 1
    else:
        return 0
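
The two versions above can be generalized to any number of groups. The following is only a minimal sketch and not part of the original answer: the helper name has_word_from_each is hypothetical, and it assumes words are separated by whitespace and matched as whole words, case-sensitively.

```python
def has_word_from_each(string, *groups):
    """Return 1 if the string has at least one whole word from every group, else 0."""
    words = set(string.split())
    # isdisjoint() is True when two collections share no element, so negate it.
    return int(all(not words.isdisjoint(group) for group in groups))


Group_A = {'nice', 'car', 'by', 'shop'}
Group_B = {'no', 'thing', 'great'}

print(has_word_from_each('there is a car over there', Group_A, Group_B))  # 0
print(has_word_from_each('no one is in a car', Group_A, Group_B))  # 1
```

Because the string is split once and each group is checked with a set operation, adding a third or fourth group only means passing one more argument.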

Answer 1: (score: 1)

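You can use itertools.product to generate all possible pairs of words from the given groups. You then iterate over the list of strings: if both words of some pair are present in a string the result is True, otherwise it is False.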

import itertools as it

Group_A = ['저는', '저희는', '우리는']
Group_B = ['입니다','라고 합니다']

strings = [ '저는 학생입니다.', '저희는 회사원들 입니다.' , '이 것이 현실 입니다.', '우리는 배고파요.' , '우리는 밴디스트라고 합니다.']

#Get all possible combinations of words from the group
z = list(it.product(Group_A, Group_B))

results = []

#Run through the list of string
for s in strings:
    flag = False
    for item in z:
        #If the word is present in the string, flag is True
        if item[0] in s and item[1] in s:
            flag = True
            break
    #Append result to results list
    results.append(flag)

print(results)

The result will look like

[True, True, False, False, True]

In addition, for the input below

Group_A = ['thing']
Group_B = ['car']
strings = ['there is a thing in a car', 'Nothing is in a car','Something happens to my car']

the value will be [True, True, True]
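
That last result follows from `in` testing substrings: 'thing' is found inside 'Nothing' and 'Something'. If whole-word matching is wanted instead, one option is a regular expression with `\b` word boundaries. This is a sketch under that assumption rather than the answer's own method, and compile_group and matches_both are hypothetical names.

```python
import re


def compile_group(words):
    # Build one pattern that matches any of the words as a whole word.
    alternation = '|'.join(re.escape(w) for w in words)
    return re.compile(r'\b(?:' + alternation + r')\b')


def matches_both(string, pattern_a, pattern_b):
    # True only if each pattern finds a whole-word match somewhere in the string.
    return bool(pattern_a.search(string)) and bool(pattern_b.search(string))


Group_A = ['thing']
Group_B = ['car']
strings = ['there is a thing in a car', 'Nothing is in a car', 'Something happens to my car']

pattern_a = compile_group(Group_A)
pattern_b = compile_group(Group_B)
results = [matches_both(s, pattern_a, pattern_b) for s in strings]
print(results)  # [True, False, False]
```

Whole-word matching would not suit the Korean example above, where '입니다' is attached to the preceding word in '학생입니다', so the choice depends on the language and the data.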

Answer 2: (score: 1)

Group_A = ['nice','car','by','shop']
Group_B = ['no','thing','great']

from collections import defaultdict

group_a=defaultdict(int)
group_b=defaultdict(int)

for i in Group_A:
    group_a[i]=1

for i in Group_B:
    group_b[i]=1

t_string_A = 'there is a car over there'
t_string_B = 'no one is in a car'

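def fun2(string):
    # Manual split on spaces: collect the non-empty runs between spaces.
    l=[]
    past=0
    for i in range(len(string)):
        if string[i]==' ':
            if string[past:i]!='':
                l.append(string[past:i])
            past=i+1
    # Append the last word, which is not followed by a space.
    if string[past:]!='':
        l.append(string[past:])
    return l

def fun(string,dic):
    # Return 1 if any word of the string is a key of dic, else 0.
    for i in fun2(string):
   # for i in string.split():
        # get() looks the word up without inserting missing keys.
        if dic.get(i):
            return 1
    return 0

# A string matches only if it has a word from both groups.
for t in (t_string_A, t_string_B):
    if fun(t,group_a) and fun(t,group_b):
        print(1)
    else:
        print(0)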

Answer 3: (score: 0)

This can be solved efficiently as a variant of the Aho-Corasick algorithm.

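It is an efficient dictionary-matching algorithm that locates all patterns in a text simultaneously in O(p + q + r), where p = the length of the patterns, q = the length of the text, and r = the length of the returned matches.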

You may want to run two separate state machines simultaneously, and you need to modify them so that they terminate at the first match.

I took a stab at the modifications, starting from this python implementation:

class AhoNode(object):
    def __init__(self):
        self.goto = {}
        self.is_match = False
        self.fail = None

def aho_create_forest(patterns):
    root = AhoNode()
    for path in patterns:
        node = root
        for symbol in path:
            node = node.goto.setdefault(symbol, AhoNode())
        node.is_match = True
    return root

def aho_create_statemachine(patterns):
    root = aho_create_forest(patterns)
    queue = []
    for node in root.goto.values():
        queue.append(node)
        node.fail = root
    while queue:
        rnode = queue.pop(0)
        for key, unode in rnode.goto.items():
            queue.append(unode)
            fnode = rnode.fail
            while fnode is not None and key not in fnode.goto:
                fnode = fnode.fail
            unode.fail = fnode.goto[key] if fnode else root
            unode.is_match = unode.is_match or unode.fail.is_match
    return root

def aho_any_match(s, root):
    node = root
    for i, c in enumerate(s):
        while node is not None and c not in node.goto:
            node = node.fail
        if node is None:
            node = root
            continue
        node = node.goto[c]
        if node.is_match:
            return True
    return False

def all_any_matcher(*pattern_lists):
    ''' Returns an efficient matcher function that takes a string
    and returns True if at least one pattern from each pattern list
    is found in it.
    '''
    machines = [aho_create_statemachine(patterns) for patterns in pattern_lists]

    def matcher(text):
        return all(aho_any_match(text, m) for m in machines)
    return matcher

And to use it:

patterns_a = ['nice','car','by','shop']
patterns_b = ['no','thing','great']

matcher = all_any_matcher(patterns_a, patterns_b)

text_1 = 'there is a car over there'
text_2 = 'no one is in a car'
for text in (text_1, text_2):
    print('%r - %s' % (text, matcher(text)))

This displays

'there is a car over there' - False
'no one is in a car' - True

Answer 4: (score: 0)

You can iterate over the words and check whether any of them is in the string:

from typing import List

def has_word(string: str, words: List[str]) -> bool:
    for word in words:
        if word in string:
            return True
    return False

This function can easily be modified to provide a has_all_words as well.
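
One possible shape for that modification, as a sketch: has_all_words is the name suggested above, but its body here is an assumption, and has_word is rewritten with any() only so that the block runs on its own. Both use substring tests, exactly like the loop above.

```python
from typing import List


def has_word(string: str, words: List[str]) -> bool:
    # Same behavior as the explicit loop: True if any word occurs in the string.
    return any(word in string for word in words)


def has_all_words(string: str, words: List[str]) -> bool:
    # True only if every word occurs somewhere in the string.
    return all(word in string for word in words)


Group_A = ['nice', 'car', 'by', 'shop']
Group_B = ['no', 'thing', 'great']

for text in ('there is a car over there', 'no one is in a car'):
    # The question's check: a word from Group_A and a word from Group_B.
    print(int(has_word(text, Group_A) and has_word(text, Group_B)))
```

With the question's two strings this prints 0 and then 1.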