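I have adopted this pattern to extract the words from a blog post's URL slug (the words in my site's URLs may be separated by hyphens, underscores, etc.), so that I can match the slug against my database and display the corresponding post. Whenever I append the matches to a list, all of them are re match objects. How do I get the matched words?
I have tried using search and match, but neither returns the individual words.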
import re
pattern = r"[a-zA-Z0-9]+[^-]+"
matches = re.finditer(pattern, "this-is-a-sample-post")
matches_lst = [i for i in matches]
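So, given the string "this-is-a-sample-post", I want to get "this is a sample post".
I want a list of the matched words, so that I can use the "".join() method and match the resulting string against my database.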
Answer 0 (score: 1)
Replace:
matches_lst = [i for i in matches]
with:
matches_lst = [i.group(0) for i in matches]
Or you can just use findall to build your list:
matches = re.findall(pattern, "this-is-a-sample-post")
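One caveat worth knowing before switching to findall: it returns the full matched strings only when the pattern has no capture groups. A minimal sketch of the difference follows; the grouped pattern is a hypothetical variant made up for illustration and is not from the question.

```python
import re

text = "this-is-a-sample-post"
pattern = r"[a-zA-Z0-9]+[^-]+"

# finditer yields re.Match objects; group(0) is the full matched text.
from_finditer = [m.group(0) for m in re.finditer(pattern, text)]

# With no capture groups, findall returns the matched strings directly.
from_findall = re.findall(pattern, text)

# Hypothetical grouped variant (not from the question): with one capture
# group, findall returns only what that group captured.
grouped = re.findall(r"([a-zA-Z0-9])[a-zA-Z0-9]*", text)

print(from_finditer)
print(from_findall)
print(grouped)
```

If a pattern needs grouping but the whole match is still wanted, a non-capturing group `(?:...)` keeps findall returning full matches.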
Answer 1 (score: 1)
import re
pattern = r"[a-zA-Z0-9]+[^-]+"
string = "this-is-a-sample-post"
matches = re.finditer(pattern, string)
matches_lst = [i.group(0) for i in matches]
print("Made with finditer:")
print(matches_lst)
print("Made with findall")
matches_lst = re.findall(pattern, string)
print(matches_lst)
print("Made with split")
print(string.split("-"))
print("Made with replace and split")
print(string.replace("-"," ").split())
Output:
Made with finditer:
['this', 'is', 'sample', 'post']
Made with findall
['this', 'is', 'sample', 'post']
Made with split
['this', 'is', 'a', 'sample', 'post']
Made with replace and split
['this', 'is', 'a', 'sample', 'post']
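Answer 2 (score: 0)
My guess is that if we wish to capture the words rather than the dashes, we might also want to slightly modify the expression in the question: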
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([a-zA-Z0-9]+)"
test_str = "this-is-a-sample-post"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
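If only the words are needed and not their positions, the same character class can be used more compactly. This is a rough sketch rather than a replacement for the code above; the variable names `slug`, `words` and `title` are made up for the example.

```python
import re

slug = "this-is-a-sample-post"

# [a-zA-Z0-9]+ matches each run of letters and digits, so single-letter
# words such as "a" are kept.
words = re.findall(r"[a-zA-Z0-9]+", slug)

# Join the words with spaces to get the string to look up.
title = " ".join(words)

print(words)
print(title)
```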
Answer 3 (score: 0)
As suggested in the comments, re.sub is also a solution:
import re
s = 'this-is-example'
s = re.sub('-', ' ', s)
The naive str.replace also works:
s = 'this-is-example'
s = s.replace('-', ' ')
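Since the question mentions that slugs may use hyphens or underscores, a character class can cover both separators at once. Below is a minimal sketch, assuming those are the only two separators in play; `slug_to_words` is a hypothetical helper name, not something from the question.

```python
import re

def slug_to_words(slug):
    """Split a slug on runs of hyphens or underscores, dropping empty pieces."""
    return [part for part in re.split(r"[-_]+", slug) if part]

# Mixed separators are handled the same way.
print(slug_to_words("this-is_a-sample_post"))

# Joining with spaces gives the string to compare with the database.
print(" ".join(slug_to_words("this-is-example")))
```

Filtering out empty pieces means a leading, trailing or doubled separator does not produce empty strings in the list.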
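Answer 4 (score: 0)
With the current regex pattern (r"[a-zA-Z0-9]+[^-]+"), you will only get "this is sample post" and miss the "a". That is because the pattern needs at least two characters per match: one or more from [a-zA-Z0-9], followed by one or more non-hyphen characters.
To get the full sentence, change the pattern to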
r'[a-zA-Z0-9]*[^-]'
You can do this in 3 ways:
>>> re.sub("-", " ", "this-is-a-sample-post")
O/P: 'this is a sample post'
>>> text = "this-is-a-sample-post"
>>> a = [m.group(0) for m in re.finditer(r'[a-zA-Z0-9]*[^-]', text)]
>>> " ".join(a)
O/P: 'this is a sample post'
>>> text = "this-is-a-sample-post"
>>> text.replace('-', ' ')
O/P: 'this is a sample post'
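To see the difference between the two patterns side by side, here is a small sketch. The last input, "this_is-a", is a made-up example showing that [^-] accepts any non-hyphen character, so this pattern is only safe when hyphens are the sole separator.

```python
import re

text = "this-is-a-sample-post"

# Original pattern: needs at least two characters per match, so "a" is skipped.
original = re.findall(r"[a-zA-Z0-9]+[^-]+", text)

# Modified pattern: needs only one character per match, so "a" is kept.
modified = re.findall(r"[a-zA-Z0-9]*[^-]", text)

# Caveat: [^-] also matches an underscore, which ends up inside the word.
with_underscore = re.findall(r"[a-zA-Z0-9]*[^-]", "this_is-a")

print(original)
print(modified)
print(with_underscore)
```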