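How to search a string for words that are not separated by spaces

Time: 2019-12-01 09:51:10

Tags: python regex python-3.x

I'm trying to figure out how to read a string containing names that are not separated by spaces, e.g. robbybobby. I want to search the string and separate the names into their own groups.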

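def wordcount(filename, listwords):
    try:
        file = open(filename, "r")
        # readlines() returns a list of lines; readline() would return only the first line
        read = file.readlines()
        file.close()
        for word in listwords:
            lower = word.lower()
            count = 0
            for letter in read:
                # split() breaks each line on whitespace, so only standalone words are compared
                line = letter.split()
                for each in line:
                    line2 = each.lower()
                    line2 = line2.strip(".")
                    if lower == line2:
                        count += 1

            print(lower, ":", count)
    # open() raises FileNotFoundError, not FileExistsError, when the file is missing
    except FileNotFoundError:
        print("no")
wordcount("teststring.txt", ["robby"])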

With this code, robby is only found when it is followed by a space.

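2 Answers:

Answer 0: (score: 0)

There are several ways to do this. I'm posting two suggestions so that you can understand them and improve on them :)

Solution 1: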

def count_occurrences(line, word):
    # Normalize vars
    word = word.lower()
    line = line.lower()

    # Initialize vars
    start_index = 0
    total_count = 0
    word_len = len(word)

    # An empty word would never advance the search position, so bail out early
    if not word:
        return 0

    # Count matches anywhere in the line, whether or not spaces surround them
    while True:
        # str.find returns -1 once there are no more matches
        start_index = line.find(word, start_index)
        if start_index < 0:
            break

        # Skip past this match so occurrences are counted without overlapping
        start_index += word_len
        total_count += 1

    # Return total occurrences
    return total_count

print(count_occurrences('stackoverflow overflow overflowABC over', 'overflow'))

Output: 3
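
For the same case-insensitive, non-overlapping count, the built-in `str.count` should give an identical result in a single call. This is a minimal sketch, assuming overlapping matches do not need to be counted; the function name `count_occurrences_builtin` is made up for illustration and is not part of the original answer.

```python
def count_occurrences_builtin(line, word):
    # Hypothetical helper name, used only for this illustration
    if not word:
        # "abc".count("") returns len("abc") + 1, which is rarely what is wanted
        return 0
    # str.count counts non-overlapping occurrences of the substring
    return line.lower().count(word.lower())

print(count_occurrences_builtin('stackoverflow overflow overflowABC over', 'overflow'))
```

This prints 3 for the same input as above.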

Solution 2:

If you want to use regular expressions, these links may be useful:

  1. Count the occurrence of a word in a txt file in python

  2. Exact match for words
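
As a rough illustration of the two ideas those links cover, the sketch below counts a word once as a plain substring and once as a whole word only. The function names `count_substring` and `count_whole_word` and the sample text are assumptions made for this example, not code taken from the linked posts.

```python
import re

def count_substring(text, word):
    # re.escape keeps characters such as "." in the word from acting as regex syntax
    return len(re.findall(re.escape(word), text, re.IGNORECASE))

def count_whole_word(text, word):
    # \b matches a word boundary, so "robbybobby" does not count as "robby"
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

sample = "robbybobby met Robby."
print(count_substring(sample, "robby"))   # counts the name inside robbybobby too
print(count_whole_word(sample, "robby"))  # counts only the standalone Robby
```

For the case in the question, where names run together without spaces, the substring version is the one that finds them; the word-boundary version reproduces the behaviour the asker is trying to get away from.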

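Answer 1: (score: 0)

If I understand correctly, you want to count occurrences of a word regardless of whether it appears as part of another word or on its own.

You can use a simple regular expression for this: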

import re

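def count_line(counts, line, words):
    # counts maps each word to its running total across the lines seen so far
    for word in words:
        # re.escape treats the word as literal text; re.IGNORECASE ignores case
        counts[word] = len(re.findall(re.escape(word), line, re.IGNORECASE)) + counts.get(word, 0)
    return counts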

allLines="""
bobby robbubobby yo xyz\n
robson bobbyrobin abc\n
xyz bob amy oo\n
amybobson robson
"""

print(allLines)

words=["amy", "robby", "bobby", "jack"]
res={}
for line in allLines.split("\n"):
    res=count_line(res, line, words)
print(res)

Output:

bobby robbubobby yo xyz

robson bobbyrobin abc

xyz bob amy oo

amybobson robson

{'amy': 2, 'robby': 0, 'bobby': 3, 'jack': 0}
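
The question also asks for a string such as robbybobby to be separated into its own groups. One possible way to do that with a regular expression is sketched below, assuming the names to look for are known in advance; the helper name `split_known_names` is hypothetical, and any text that matches none of the names is simply skipped.

```python
import re

def split_known_names(text, names):
    # Hypothetical helper: returns the known names in the order they appear in text
    if not names:
        return []
    # Try longer names first so a longer name wins over a shorter one it starts with
    ordered = sorted(names, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in ordered)
    return re.findall(pattern, text, re.IGNORECASE)

print(split_known_names("robbybobby", ["robby", "bobby"]))
```

For this input the result is `['robby', 'bobby']`. Matches are returned as they appear in the text, so the original capitalisation is kept.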