Question

I have a string, for example:

"This is my very boring string"

I also have the position of a character, counted as if the string had no spaces.

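For example:

Position 13 (counting from 0), which in this example is the o in the word boring.

What I need is to return the word (boring) given the index I have (13).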

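This code returns the character (o):

import re
s = "This is my very boring string"
re.findall(r'\S', s)[13]  # '\S' counts every non-space char; '[a-z]' would skip the capital T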

But for some reason I can't think of a good way to get the word boring back.

Any help would be appreciated.

Answer 1

You can use the regex \w+ to match words and keep adding up the lengths of the matches until the total exceeds the target position:

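import re

def get_word_at(string, position):
    length = 0  # running total of word characters seen so far
    for word in re.findall(r'\w+', string):
        length += len(word)
        if length > position:  # position falls inside this word
            return word
    return None  # position is past the last word character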

So get_word_at('This is my very boring string', 13) returns:

boring
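If the same string is queried at many positions, the running totals that get_word_at recomputes on every call could be built once and binary-searched instead. A sketch under that assumption; build_word_index and get_word_at_fast are hypothetical names, and bisect plus itertools.accumulate are swapped in, not part of the answer above:

```python
import re
from bisect import bisect_right
from itertools import accumulate

def build_word_index(string):
    words = re.findall(r'\w+', string)
    # ends[i] = number of word characters up to and including words[i]
    ends = list(accumulate(len(w) for w in words))
    return words, ends

def get_word_at_fast(words, ends, position):
    # First word whose cumulative end is > position: the same test as `length > position`
    i = bisect_right(ends, position)
    return words[i] if i < len(words) else None

words, ends = build_word_index('This is my very boring string')
print(get_word_at_fast(words, ends, 13))
```

The index is built in one pass; each lookup after that is O(log n) in the number of words.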

Answer 2

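Don't use a variable-length lookbehind; it's slow and ugly.
A simple lookahead combined with a capture group does the job.

This regex counts non-whitespace characters. The repetition count is 1-based, so {13} selects the word containing the 13th non-whitespace character.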

^(?:\s*(?=(?<!\S)(\S+))?\S){13}

demo 13th char
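A sketch of the same pattern driven from Python's re module, assuming the question's 0-based index: since {n} counts characters consumed, the index becomes index + 1 repetitions. word_at is a hypothetical helper name:

```python
import re

def word_at(s, index):
    # Each repetition consumes one non-space char; when that char starts a word,
    # the lookahead captures the whole word into group 1 (later failed attempts keep it).
    pattern = r'^(?:\s*(?=(?<!\S)(\S+))?\S){%d}' % (index + 1)
    m = re.match(pattern, s)
    return m.group(1) if m else None  # None when index is past the last non-space char

print(word_at('This is my very boring string', 13))
```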

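You can count word characters instead if you need to, but the class must be paired with its inverse.
The pattern has to be able to match every character in the string; with a mismatched pair
it stalls at the first character that belongs to neither class.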

Examples:

\w with \W
\s with \S
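A sketch of the \w / \W pairing, again assuming a 0-based index; word_at_w is a hypothetical name. The difference from the \S / \s version shows up with punctuation: "Hello," counts as one word there, while here the comma is skipped as a separator:

```python
import re

def word_at_w(s, index):
    # \w counts as a character and \W as a separator: the pair covers every char
    pattern = r'^(?:\W*(?=(?<!\w)(\w+))?\w){%d}' % (index + 1)
    m = re.match(pattern, s)
    return m.group(1) if m else None

print(word_at_w('Hello, world!', 5))  # 6th word character is the w of world
```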

demo 1st char

demo 18th char

Answer 3

A non-regex solution, aiming for the elegance the OP was looking for:

def word_out_of_string(string, character_index):
    words = string.split()

    while words and character_index >= len(words[0]):
        character_index -= len(words.pop(0))

    return words.pop(0) if words else None

print(word_out_of_string("This is my very boring string", 13))
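The same loop can walk the words without list.pop(0), which shifts the remaining list on every call. A sketch with the hypothetical name word_out_of_string_iter; the behaviour is meant to match the function above, including returning None past the end:

```python
def word_out_of_string_iter(string, character_index):
    for word in string.split():
        if character_index < len(word):  # index lands inside this word
            return word
        character_index -= len(word)     # skip this word's characters
    return None

print(word_out_of_string_iter("This is my very boring string", 13))
```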

Answer 4

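This function takes two arguments: the string and the index.

It converts the index into the corresponding index in the original string (spaces included).

It then returns the word in the original string that contains the character at the converted index.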

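def find(string, idx):
    # Find the index of the character relative to the original string
    # by counting non-space characters until we reach idx
    i1 = None
    count = 0
    for pos, char in enumerate(string):
        if char != ' ':
            if count == idx:
                i1 = pos
                break
            count += 1
    if i1 is None:
        return None  # idx is past the last non-space character

    # Find which word the index belongs to in the original string;
    # split(' ') keeps one entry per separator, so offsets stay aligned
    start = 0
    for word in string.split(' '):
        if start <= i1 < start + len(word):
            return word
        start += len(word) + 1  # skip the word and its single-space separator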

print(find("This is my very boring string", 13))

Output:

boring
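The same two steps (map the index to a position in the original string, then find the word covering that position) can also be expressed with re.finditer spans. This is a swapped-in technique rather than the answer's code, and find_with_spans is a hypothetical name:

```python
import re

def find_with_spans(string, idx):
    # Step 1: original-string positions of every non-space character
    positions = [m.start() for m in re.finditer(r'\S', string)]
    if not 0 <= idx < len(positions):
        return None
    i1 = positions[idx]
    # Step 2: the word whose span [start, end) covers that position
    for m in re.finditer(r'\S+', string):
        if m.start() <= i1 < m.end():
            return m.group()
    return None

print(find_with_spans("This is my very boring string", 13))
```

Because it works on match spans, repeated or leading spaces need no special handling.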

Answer 5

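You can install (pip install regex) and use the regex module, which supports variable-length lookbehind, so you can use a pattern like this one to assert that the required number of word characters, optionally surrounded by whitespace, precede the matched word: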

import regex
regex.search(r'\w*(?<=^\s*(\w\s*){13})\w+', 'This is my very boring string').group()

This returns:

boring

Answer 6

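If you use Python's alternative regex engine (the regex module; the standard re module only allows fixed-width lookbehinds), you can replace matches of the following regex with an empty string: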

r'^(?:\s*\S){0,13}\s|(?<=(?:\s*\S){13,})\s.*'

Regex demo ¯\_(ツ)_/¯ Python demo

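For the example string, after removing spaces, the 'o' in 'boring' is at index 13. If both 13s in the regex are changed to any number in the range 12-17, 'boring' is returned. If they are changed to 11, 'very' is returned; if they are changed to 18, 'string' is returned.

The regex engine performs the following operations.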

^            : match beginning of string
(?:\s*\S)    : match 0+ ws chars, then 1 non-ws char, in a non-capture group
{0,13}       : execute the non-capture group 0-13 times 
\s           : match a ws char
|            : or
(?<=         : begin a positive lookbehind
  (?:\s*\S)  : match 0+ ws chars, then 1 non-ws char, in a non-capture group 
  {13,}      : execute the non-capture group at least 13 times
)            : end positive lookbehind
\s           : match 1 ws char
.*           : match 0+ chars
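Only the lookbehind branch needs the regex module. A rough stdlib adaptation of the same "delete everything except the word" idea (an assumption, not the answer's code; word_at_by_deletion is a hypothetical name) applies the first alternative as written, and then, because the target word now sits at the front, replaces the lookbehind branch with a plain \s.* deletion:

```python
import re

def word_at_by_deletion(s, n):
    # First alternative, unchanged: strip up to n non-space chars from the start
    # plus the whitespace after them (no lookbehind, so stdlib re accepts it)
    rest = re.sub(r'^(?:\s*\S){0,%d}\s' % n, '', s, count=1)
    # Stand-in for the lookbehind alternative: the target word is now first,
    # so delete from its first trailing whitespace to the end
    return re.sub(r'\s.*', '', rest, count=1)

print(word_at_by_deletion('This is my very boring string', 13))
```

Like the original pattern, it assumes single spaces between words.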

Get the word in a string from a character's position
