Question

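I have adopted this pattern to get the URL slug of a blog post (the words in my site's URLs may be separated by hyphens, underscores, and so on) so that I can match it against the database and display the corresponding post. Whenever I append the matches to a list, all of them are re.Match objects. How can I get the matched words?

I have tried using search and match, but these do not return the individual words.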

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
matches = re.finditer(pattern, "this-is-a-sample-post")
matches_lst = [i for i in matches]

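So, suppose I have the string "this-is-a-sample-post"; I want to get "this is a sample post".

I want a list of the matched words so that I can use the " ".join() method and match the string against my database.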

Answer 1

Replace:

matches_lst = [i for i in matches]

with:

matches_lst = [i.group(0) for i in matches]

Or you can just use findall to get the list directly:

matches = re.findall(pattern, "this-is-a-sample-post")
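To make the difference concrete, here is a minimal sketch of what finditer yields versus what group(0) returns. The names `match_objects` and `words` are my own, not from the question. Note that the question's pattern still skips the single-letter word "a".

```python
import re

pattern = r"[a-zA-Z0-9]+[^-]+"
text = "this-is-a-sample-post"

# finditer yields match objects, not strings
match_objects = list(re.finditer(pattern, text))
first = match_objects[0]
print(first.group(0), first.span())  # this (0, 4)

# group(0) returns the full text of each match
words = [m.group(0) for m in match_objects]
print(words)  # ['this', 'is', 'sample', 'post']

# The list of strings can now be joined
joined = " ".join(words)
print(joined)  # this is sample post
```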

Answer 2

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
string = "this-is-a-sample-post"
matches = re.finditer(pattern, string)
matches_lst = [i.group(0) for i in matches]
print("Made with finditer:")
print(matches_lst)
print("Made with findall")
matches_lst = re.findall(pattern, string)
print(matches_lst)
print("Made with split")
print(string.split("-"))
print("Made with replace and split")
print(string.replace("-"," ").split())

Output:

Made with finditer:
['this', 'is', 'sample', 'post']
Made with findall
['this', 'is', 'sample', 'post']
Made with split
['this', 'is', 'a', 'sample', 'post']
Made with replace and split
['this', 'is', 'a', 'sample', 'post']
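Since the question mentions that slugs may be separated by hyphens or underscores, a split-based approach can be extended to both. The following is only a sketch under that assumption; the helper name `slug_to_words` is hypothetical.

```python
import re

def slug_to_words(slug):
    """Split a slug on runs of hyphens or underscores, dropping empty pieces."""
    return [part for part in re.split(r"[-_]+", slug) if part]

print(slug_to_words("this-is-a-sample-post"))   # ['this', 'is', 'a', 'sample', 'post']
print(slug_to_words("this_is-a__sample_post"))  # ['this', 'is', 'a', 'sample', 'post']
print(" ".join(slug_to_words("this-is-a-sample-post")))  # this is a sample post
```

Filtering out empty strings guards against leading, trailing, or doubled separators, which would otherwise leave empty entries in the result of re.split.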

Answer 3

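My guess is that, if we wish to capture the words rather than the dashes, we might also want to slightly modify the expression in the question, for example to ([a-zA-Z0-9]+).

Test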

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([a-zA-Z0-9]+)"

test_str = "this-is-a-sample-post"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
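If only the list of words is needed rather than the positions, the same expression can be used more compactly. This is a sketch, not the answer's original code: when a pattern contains exactly one capturing group, re.findall returns the text of that group for each match.

```python
import re

regex = r"([a-zA-Z0-9]+)"
test_str = "this-is-a-sample-post"

# With a single capturing group, findall returns the group's text per match
words = re.findall(regex, test_str)
print(words)            # ['this', 'is', 'a', 'sample', 'post']
print(" ".join(words))  # this is a sample post
```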

Answer 4

As suggested in the comments, re.sub is also a solution:

import re

s = 'this-is-example'
s = re.sub('-', ' ', s)

A plain str.replace works as well:

s = 'this-is-example'
s = s.replace('-', ' ')
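One case where re.sub may be preferable to str.replace is when the slug can contain several kinds of separators or repeated separators. The input string below is made up for illustration.

```python
import re

s = "this--is_example"

# str.replace swaps one literal separator at a time
replaced = s.replace("-", " ")
print(repr(replaced))  # 'this  is_example'

# re.sub with a character class handles both separators and collapses runs
substituted = re.sub(r"[-_]+", " ", s)
print(repr(substituted))  # 'this is example'
```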

Answer 5

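With the current regex pattern (r"[a-zA-Z0-9]+[^-]+"), you only get "this is sample post"; the "a" is missing. That is because the pattern needs at least two characters per match: one or more alphanumerics followed by one or more non-hyphen characters.

To get the full sentence, change the pattern to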

r'[a-zA-Z0-9]*[^-]'
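A quick side-by-side comparison of the two patterns, as a sketch (the variable names `original` and `modified` are mine). Keep in mind that [^-] accepts any character other than a hyphen, so this is only meant for purely hyphen-separated slugs.

```python
import re

text = "this-is-a-sample-post"

original = r"[a-zA-Z0-9]+[^-]+"   # needs at least two characters per match
modified = r"[a-zA-Z0-9]*[^-]"    # needs at least one character per match

original_words = re.findall(original, text)
modified_words = re.findall(modified, text)

print(original_words)  # ['this', 'is', 'sample', 'post']
print(modified_words)  # ['this', 'is', 'a', 'sample', 'post']
```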

You can do this in 3 ways:

Use re.sub to replace "-" with " " (a space)

>>> re.sub("-", " ", "this-is-a-sample-post")

O/P: 'this is a sample post'

Collect the output of finditer() into a list and join it.

>>> text = "this-is-a-sample-post"
>>> a = [m.group(0) for m in re.finditer(r'[a-zA-Z0-9]*[^-]', text)]
>>> " ".join(a)

O/P: 'this is a sample post'

Take the string and replace '-' with a space

>>> text = "this-is-a-sample-post"
>>> text.replace('-', ' ')

O/P: 'this is a sample post'

How to get the matched words from regex match objects after using finditer
