How to get the matched words from regex match objects after using finditer

Time: 2019-06-10 03:08:51

Tags: python regex

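I am using this pattern to parse the URL slug of a blog post (the words in my site's URLs can be separated by hyphens, underscores, and so on) so that I can match it against my database and display the corresponding post. Whenever I append the matches to a list, every item is an re.Match object. How do I get the matched words?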

I tried using search and match, but neither returns the individual words.

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
matches = re.finditer(pattern, "this-is-a-sample-post")
matches_lst = [i for i in matches]

So, given the string "this-is-a-sample-post", I want to get "this is a sample post".

I want a list of the matched words so that I can call " ".join() on it and match the resulting string against my database.

5 answers:

Answer 0: (score: 1)

Replace:

matches_lst = [i for i in matches]

with:

matches_lst = [i.group(0) for i in matches]

Or you can just use findall to get the list directly:

matches = re.findall(pattern, "this-is-a-sample-post")
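As a minimal sketch of the difference, the snippet below runs the question's pattern both ways. The variable names (slug, match_objects, words, also_words) are illustrative and not part of the original answer. Note that findall returns whole matches only because this pattern has no capture groups, and that the question's pattern skips the single-letter word "a".

```python
import re

pattern = r"[a-zA-Z0-9]+[^-]+"  # pattern from the question
slug = "this-is-a-sample-post"  # illustrative variable name

# finditer yields match objects; group(0) is the text of the whole match
match_objects = list(re.finditer(pattern, slug))
words = [m.group(0) for m in match_objects]

# findall returns the matched strings directly (no capture groups here)
also_words = re.findall(pattern, slug)

print(words)                # ['this', 'is', 'sample', 'post']
print(words == also_words)  # True
```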

Answer 1: (score: 1)

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
string = "this-is-a-sample-post"
matches = re.finditer(pattern, string)
matches_lst = [i.group(0) for i in matches]
print("Made with finditer:")
print(matches_lst)
print("Made with findall")
matches_lst = re.findall(pattern, string)
print(matches_lst)
print("Made with split")
print(string.split("-"))
print("Made with replace and split")
print(string.replace("-"," ").split())

Output:

Made with finditer:
['this', 'is', 'sample', 'post']
Made with findall
['this', 'is', 'sample', 'post']
Made with split
['this', 'is', 'a', 'sample', 'post']
Made with replace and split
['this', 'is', 'a', 'sample', 'post']
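Since the question mentions that slugs may use underscores as well as hyphens, str.split("-") alone may not be enough. Here is a rough sketch using re.split with a character class; the helper name split_slug and the choice of separators ("-" and "_") are assumptions made for illustration.

```python
import re

def split_slug(slug):
    """Split on runs of '-' or '_' and drop empty pieces.

    Hypothetical helper; the separator set is an assumption.
    """
    return [part for part in re.split(r"[-_]+", slug) if part]

print(split_slug("this-is-a-sample-post"))  # ['this', 'is', 'a', 'sample', 'post']
print(split_slug("-this_is--a-sample-"))    # ['this', 'is', 'a', 'sample']
```

The filter on empty pieces is there because re.split returns an empty string when the input starts or ends with a separator.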

Answer 2: (score: 0)

My guess is that if we want to capture the words and not the dashes, we might also want to modify the expression in the question slightly:


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([a-zA-Z0-9]+)"

test_str = "this-is-a-sample-post"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Group numbers start at 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
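If only the list of words is needed, the same expression can probably be used more compactly with findall. This is a sketch rather than the answerer's code: the function name slug_to_words is hypothetical, and the capture group is dropped so that findall returns plain strings.

```python
import re

def slug_to_words(slug):
    """Return every run of ASCII letters and digits in slug.

    Hypothetical helper built on the expression from this answer.
    """
    return re.findall(r"[a-zA-Z0-9]+", slug)

words = slug_to_words("this-is-a-sample-post")
print(words)            # ['this', 'is', 'a', 'sample', 'post']
print(" ".join(words))  # this is a sample post

# Underscores are not in the character class, so they also act as separators
print(slug_to_words("this_is-a_sample"))  # ['this', 'is', 'a', 'sample']
```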

Answer 3: (score: 0)

As suggested in the comments, re.sub is also a solution:

import re

s = 'this-is-example'
s = re.sub('-', ' ', s)

A plain str.replace works too:

s = 'this-is-example'
s = s.replace('-', ' ')
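Where re.sub may earn its keep over str.replace is when several separators need the same treatment. The sketch below assumes the separators are hyphens and underscores, as hinted in the question; the function name slug_to_title is made up for this example.

```python
import re

def slug_to_title(slug):
    """Replace each run of '-' or '_' with a single space.

    Hypothetical helper; the separator set is an assumption.
    """
    # strip() removes the spaces left by leading or trailing separators
    return re.sub(r"[-_]+", " ", slug).strip()

print(slug_to_title("this-is-a-sample-post"))  # this is a sample post
print(slug_to_title("this_is--a_sample"))      # this is a sample
```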

Answer 4: (score: 0)

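Your current regex pattern (r"[a-zA-Z0-9]+[^-]+") only fetches "this is sample post" and misses "a". That is because every match needs at least two characters: [a-zA-Z0-9]+ consumes one or more, and [^-]+ then requires at least one more.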

To get the complete sentence, change the pattern to

r'[a-zA-Z0-9]*[^-]'
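A quick side-by-side check of the two patterns, written as a sketch with illustrative variable names (text, original, revised). It also shows one caveat worth knowing: [^-] accepts any character other than a hyphen, so an underscore ends up inside a match.

```python
import re

text = "this-is-a-sample-post"

# The question's pattern needs at least two characters per match
original = re.findall(r"[a-zA-Z0-9]+[^-]+", text)
# The revised pattern accepts a single character
revised = re.findall(r"[a-zA-Z0-9]*[^-]", text)

print(original)  # ['this', 'is', 'sample', 'post']
print(revised)   # ['this', 'is', 'a', 'sample', 'post']

# Caveat: [^-] also matches '_', so it is kept in the result
print(re.findall(r"[a-zA-Z0-9]*[^-]", "a_b"))  # ['a_', 'b']
```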

You can do this in 3 ways:

  1. Use re.sub to replace "-" with " " (a space)
>>> re.sub("-", " ", "this-is-a-sample-post")

O/P: 'this is a sample post'

  2. Collect the output of finditer() into a list and join it.
>>> text = "this-is-a-sample-post"
>>> a = [m.group(0) for m in re.finditer(r'[a-zA-Z0-9]*[^-]', text)]
>>> " ".join(a)

O/P: 'this is a sample post'

  3. Call str.replace on the string to replace '-' with a space
>>> text = "this-is-a-sample-post"
>>> text.replace('-', ' ')

O/P: 'this is a sample post'