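Counting occurrences of a multi-word substring in some text

Time: 2021-01-27 22:17:41

Tags: python regex string text nlp

To count a single-word substring in some text, I can use some_text.split().count(single_word_substring). How can I do the same for a multi-word substring?

Examples: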

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'

The count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to'

The count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'go'

The count should be 0.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'school'

The count should be 3.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'abc-xyz'

The count should be 1.

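Assumption 1: Everything is lowercase.
Assumption 2: The text can contain anything.
Assumption 3: The string to be found can also contain anything. For example, car with 4 passengersxyz & abc

Note: A regex-based solution is acceptable. I am just curious whether this can be done without regex (a nice-to-have, and only for others who might be interested in the future).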

4 answers:

Answer 0: (score: 1)

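Here is a working solution that uses a regular expression:

import re

def occurrences(text, to_be_found):
    # Escape the phrase so characters such as '.' or '&' are matched literally
    pattern = rf'(?<!\w){re.escape(to_be_found)}(?!\w)'
    return len(re.findall(pattern, text))

The lookarounds (?<!\w) and (?!\w) require that no word character sits directly before or after the match, so spaces, punctuation, and the start or end of the text all count as boundaries.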
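To see why the boundary check matters, here is a minimal sketch comparing a pattern that consumes one `\W` character on each side with one that uses lookarounds. The function names `count_consuming` and `count_lookaround` are illustrative and not part of the original answer. The consuming form cannot match at the very start or end of the text, and two adjacent matches cannot share the same separator.

```python
import re

def count_consuming(text, phrase):
    # Consumes one non-word character on each side of the match.
    return len(re.findall(rf'\W{re.escape(phrase)}\W', text))

def count_lookaround(text, phrase):
    # Only asserts that no word character touches the match; consumes nothing.
    return len(re.findall(rf'(?<!\w){re.escape(phrase)}(?!\w)', text))

text = 'go go go'
print(count_consuming(text, 'go'))   # 1
print(count_lookaround(text, 'go'))  # 3
```

On the question's sample text both variants return 3 for 'going to school', because every occurrence there has a space before it and a period after it.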

Answer 1: (score: 0)

  1. The best native way to search for a substring is still count. It works with multi-word substrings as needed:

    text = 'he is going to school. abc is going to school. xyz is going to school.'
    text.count('going to school') # 3
    text.count('going to') # 3
    text.count('school') # 3
    text.count('go') # 3
    

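    If you need 0 for the 'go' case, you can search for ' go ', 'go ' or ' go' (padded with spaces) to catch the standalone word.

  2. You can also write your own method that searches character by character: https://stackoverflow.com/a/30863956/15080484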
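One regex-free way to get whole-word behavior is to compare tokens rather than raw characters. The following is a minimal sketch under stated assumptions: the function name `count_phrase` and the `strip_chars` parameter are hypothetical, words are separated by whitespace, and only the listed sentence punctuation is trimmed from token edges.

```python
def count_phrase(text, phrase, strip_chars=".,!?;:"):
    """Count whole-token occurrences of `phrase` in `text` without regex."""
    # Split on whitespace and trim sentence punctuation from token edges only,
    # so 'school.' becomes 'school' while 'abc-xyz' stays intact.
    def tokenize(s):
        tokens = (tok.strip(strip_chars) for tok in s.split())
        return [tok for tok in tokens if tok]

    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return 0

    n = len(target)
    # Slide a window of len(target) tokens across the text.
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
print(count_phrase(text, 'going to school'))  # 3
print(count_phrase(text, 'go'))               # 0
print(count_phrase(text, 'abc-xyz'))          # 1
```

Because punctuation is trimmed before comparing, this sketch also counts a phrase that spans a sentence boundary (for example 'school abc-xyz' in the text above), which may or may not be what you want.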

Answer 2: (score: 0)

Try this:

def deep_reverse(L):
    L[:] = L[::-1]
    for i in range (len(L)-1,-1,-1):
        L[i] = L[i][::-1]


L = [[0, 1, 2], [1, 2, 3], [3, 2, 1], [10, -10, 100]]
deep_reverse(L) 
print(L) # [[100, -10, 10], [1, 2, 3], [3, 2, 1], [2, 1, 0]]

Answer 3: (score: 0)

I managed to get it working with this code (though it is not Pythonic at all):

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'

def find_occurences(text, look_for):
    spec = [',', '.', '!', '?']  # punctuation allowed next to a match
    where = 0
    how_many = 0

    # Nothing to count for an empty or absent substring
    if not look_for or look_for not in text:
        return how_many

    while True:
        i = text.find(look_for, where)
        if i == -1:  # No more matches
            break

        end = i + len(look_for)
        # Treat the start and the end of the text as if they were spaces
        before = text[i - 1] if i > 0 else " "
        after = text[end] if end < len(text) else " "

        if (((before == " ") and (after == " "))  # Surrounded by spaces
                or (((before in spec) or (before == " ")) and (after in spec))):  # Followed by one of ,.!?
            how_many += 1
        where = end  # Continue searching after this match

    return how_many

print("'{}' was in '{}' this many times: {}".format(to_be_found, text, find_occurences(text, to_be_found)))

  1. First condition: (before == " ") and (after == " ") checks whether the substring is surrounded by spaces on both sides.
  2. Second condition: ((before in spec) or (before == " ")) and (after in spec) checks whether the substring is preceded by a space or a special character and followed by a special character.

Example 1:

to_be_found = 'going to school'
Output1: 3

Example 2:

to_be_found = 'going to'
Output2: 3

Example 3:

to_be_found = 'go'
Output3: 0

Example 4:

to_be_found = 'school'
Output4: 3