Question

I have this string here:

"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

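I have this regex: \w+\.\w+

I want the regex to capture the file name FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

But it breaks at the "&" character and returns _Cracks.mp4. What do I need to do to fix this? I'm new to regex.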
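A minimal reproduction in Python. It assumes the text is held in a variable `s` (a name chosen here for illustration) and contains the literal characters `\r\n` exactly as printed above:

```python
import re

# Assumption: the string contains literal backslash-r / backslash-n, as printed in the question.
s = "['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']"

# \w matches only letters, digits and underscore, so a match cannot cross the "&".
# The first spot where "\w+" is directly followed by ".\w+" is just after the "&".
match = re.search(r"\w+\.\w+", s)
print(match.group())  # _Cracks.mp4
```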

Answer 1

There are many options to choose from here, for example:

([^\s]+\.[a-z][a-z0-9]+)

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([^\s]+\.[a-z][a-z0-9]+)"

# The string literal must be closed on the same line it starts on.
test_str = "\"['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']\""

# finditer yields one match object per non-overlapping match.
matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for matchNum, match in enumerate(matches, start=1):

    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Capturing groups are numbered starting from 1.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Answer 2

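\w is shorthand for "word character", which means letters, digits, and the underscore. Note that the ampersand is not included. To include the "&", you can use the character class [\w&]. Your regex would then be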

[\w&]+\.\w+

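By the way, depending on which regex function you use (for example, one that returns all matches rather than just the first), it may also match 48.14.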
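A small Python sketch of that difference, reusing the question's string (assumed to be stored in a hypothetical variable `s` with literal `\r\n`): `re.search` returns only the first match, while `re.findall` returns every non-overlapping match, so the size is picked up too.

```python
import re

# Assumption: the question's string, with literal backslash-r / backslash-n.
s = "['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']"

# "&" is added to the word-character class so the file name is matched whole.
pattern = r"[\w&]+\.\w+"

# re.search stops at the first match, which is the file name.
first = re.search(pattern, s).group()
print(first)  # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

# re.findall collects all matches, so "48.14" from the size part shows up as well.
all_matches = re.findall(pattern, s)
print(all_matches)  # ['FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', '48.14']
```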

But maybe you want to allow more characters than just the ampersand. How about all non-whitespace characters?

\S+\.\w+

This uses \S, which is the inverse of the whitespace shorthand \s.
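A quick Python check of the \S+\.\w+ variant, again assuming the question's string in a hypothetical variable `s`. Because \S accepts any non-whitespace character, the "&" no longer has to be listed explicitly; as with [\w&], `re.findall` still picks up 48.14.

```python
import re

# Assumption: the question's string, with literal backslash-r / backslash-n.
s = "['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']"

# \S is the complement of \s: any single character that is not whitespace.
pattern = r"\S+\.\w+"

first = re.search(pattern, s).group()
print(first)  # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

all_matches = re.findall(pattern, s)
print(all_matches)  # ['FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', '48.14']
```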

Answer 3

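You can take advantage of the context: you don't need to know which characters a file name may contain (note that it can even contain spaces). You know it starts after File: and whitespace, and runs up to the next '.

So you can get what you need with

m = re.search(r"File:\s*([^']+)", s)
if m:
    print(m.group(1))

See the online Python demo.

See also the regex demo and regex graph.

Details

File: - a literal substring
\s* - 0 or more whitespace characters
([^']+) - Capturing group 1 (match_object.group(1)): 1 or more characters other than '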
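A self-contained version of this approach, assuming the question's string in a variable `s` as before. The second call uses a made-up (hypothetical) input to show that spaces are captured too, since the group is bounded by the closing quote rather than by a list of allowed characters.

```python
import re

# Assumption: the question's string, with literal backslash-r / backslash-n.
s = "['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']"

# "File:" anchors the match, \s* skips the whitespace after it, and
# ([^']+) captures everything up to, but not including, the next single quote.
m = re.search(r"File:\s*([^']+)", s)
if m:
    print(m.group(1))  # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

# Hypothetical input: a file name that contains spaces.
m2 = re.search(r"File:\s*([^']+)", "'File: my clip & cracks.mp4'")
if m2:
    print(m2.group(1))  # my clip & cracks.mp4
```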
