Find the word at a given position in a text

Time: 2010-09-13 17:37:36

Tags: python regex

Is there a more elegant (Pythonic and efficient) way to find the word at a given position?

import re

FIRST_WORD = re.compile(r'^(\w+)', re.UNICODE)
LAST_WORD = re.compile(r'(\w+)$', re.UNICODE)

def _get_word(text, position):
    """
    Get word on given position
    """
    assert position >= 0
    assert position < len(text)

    # get second part of word
    # slice string and get first word
    match = FIRST_WORD.search(text[position:])
    assert match is not None
    postfix = match.group(1)

    # get first part of word, can be empty
    # slice text and get last word
    match2 = LAST_WORD.search(text[:position])
    prefix = match2.group(1) if match2 else ''

    return prefix + postfix


#                                  | 21.
>>> _get_word("Hello, my name is Earl.", 21)
'Earl'
>>> _get_word("Hello, my name is Earl.", 20)
'Earl'

Thanks.

3 answers:

Answer 0 (score: 1)

This is how I did it:

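s = "Hello, my name is Earl."
def get_word(text, position):
    # assumes words are separated by exactly one space
    words = text.split()
    characters = -1  # index of the last character consumed so far
    for word in words:
        characters += len(word)
        if characters >= position:
            return word
        characters += 1  # account for the space after the word
>>> get_word(s, 21)
'Earl.'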

The punctuation can be removed with ''.strip(), a regular expression, or something hacky like:

final = ''
for c in word:
    if c.lower() in 'abcdefghijklmnopqrstuvwxyz':
        final += c
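
The regular-expression option mentioned above is not shown in this answer. A minimal sketch of one way it could look, assuming a hypothetical helper named `strip_punctuation` (not part of the original answer) that trims non-word characters from both ends only:

```python
import re

def strip_punctuation(word):
    """Trim non-word characters from both ends of word (hypothetical helper)."""
    # ^\W+ matches a leading run of non-word characters,
    # \W+$ matches a trailing run; both are replaced with nothing.
    return re.sub(r'^\W+|\W+$', '', word)

print(strip_punctuation("Earl."))  # Earl
```

Unlike the character loop, this leaves characters inside the word alone, so something like "don't!" keeps its apostrophe.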

Answer 1 (score: 0)

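import string

s = "Hello, my name is Earl."
def get_word(text, position):
    # part of the word before position: everything after the last space
    _, _, start = text[:position].rpartition(' ')
    # part of the word from position on: everything up to the next space
    word, _, _ = text[position:].partition(' ')
    return start + word

print(get_word(s, 21).strip(string.punctuation))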

Answer 2 (score: 0)

The following solution collects the alphabetic characters around the given position:

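def get_word(text, position):
    if position < 0 or position >= len(text):
        return ''

    str_list = []

    # walk left from position, stopping at the start of the text
    i = position
    while i >= 0 and text[i].isalpha():
        str_list.insert(0, text[i])
        i -= 1

    # walk right from position + 1, stopping at the end of the text
    i = position + 1
    while i < len(text) and text[i].isalpha():
        str_list.append(text[i])
        i += 1

    return ''.join(str_list)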

Here are the test cases:

get_word("Hello, my name is Earl.", 21)  # 'Earl'
get_word("Hello, my name is Earl.", 20)  # 'Earl'

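I don't think splitting the text into words with split is a good idea, because the position is essential to this problem. If the text contains consecutive whitespace, split can cause trouble.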
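
To illustrate the point about consecutive whitespace, here is a small sketch. It also shows one possible alternative that keeps character offsets by scanning regex matches with `re.finditer`; the helper name `word_at` is hypothetical and this is not the code from any of the answers above.

```python
import re

WORD = re.compile(r'\w+')

def word_at(text, position):
    """Return the run of word characters covering position, or '' (hypothetical helper)."""
    for match in WORD.finditer(text):
        # match.start() is inclusive, match.end() is exclusive
        if match.start() <= position < match.end():
            return match.group()
        if match.start() > position:
            break  # later matches cannot cover position
    return ''

text = "Hello,   my name is Earl."   # three spaces after the comma
print(text.split())                  # the run of spaces is collapsed, so offsets are lost
print(word_at(text, text.index("Earl") + 2))  # Earl
```

Note one design choice that differs from the question's code: when the position falls on punctuation or whitespace, this sketch returns an empty string instead of failing an assertion.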