Question

My resources are in the following format:

{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, 
{"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}

I want to extract the file names 6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4 and a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4 from the URLs.

How do I write a regular expression that matches the string at this position?

Answer 1

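Based on the strings you provided, you can iterate over the dictionaries, take the "url" value of each one, and apply the following regular expression: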

([^\/]*)$

Explanation:

() - defines a capturing group
[^\/] - matches a single character that is not in the set after the ^ (here, any character except /)
\/ - matches the character / literally (case sensitive)
* - quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ - asserts position at the end of the string, or before the line terminator right at the end of the string (if any)

For example:

import re
# records: the list of dicts shown in the question
for record in records:
    print(re.search(r"([^/]*)$", record['url']).group(1))

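In this case we are taking advantage of the fact that the file name appears at the end of the string. The $ anchor means the only valid match is one that ends exactly at the end of the string.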
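To see what the $ anchor contributes, here is a minimal sketch comparing the same pattern with and without the anchor on the first URL from the question. The variable names are illustrative only.

```python
import re

url = "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4"

# Without $, the leftmost match wins: the run of non-slash characters at position 0.
unanchored = re.search(r"([^/]*)", url).group(1)

# With $, the run must end at the end of the string, so only the last path segment qualifies.
anchored = re.search(r"([^/]*)$", url).group(1)

print(unanchored)  # http:
print(anchored)    # 6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4
```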

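If you want to run the regex over the whole data converted to a single string, you can change the terminating condition instead, for example ([^\/]*?)\", where the match is now terminated by ", (note the \ used to escape the "). See https://regex101.com/r/k9VwC6/25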
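A minimal sketch of that idea in Python, assuming the data is available as one raw text block exactly as shown in the question (the variable `raw` is hypothetical). Inside a Python raw string the " needs no backslash, so the pattern becomes `([^/]*?)",`.

```python
import re

# Hypothetical: the two records from the question as one raw text block.
raw = (
    '{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/'
    '6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, \n'
    '{"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/'
    'a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}'
)

# [^/]*? cannot cross a slash, and the match must be followed by '",',
# so each match is the last path segment of a "url" value.
filenames = re.findall(r'([^/]*?)",', raw)
print(filenames)
```

This relies on "url" being followed by `",` and on the "name" values never being followed by `",`; with a different field order the pattern would need adjusting.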

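Finally, if we are not so lucky and the capture group is not at the end of the string (meaning we cannot use $), we could use a negative lookaround. You can read about it here.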
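As one possible form of this, here is a sketch using a negative lookahead `(?!.*/)`, which accepts a run of characters only if no further `/` follows it. It assumes a hypothetical URL with an invented query string appended, so the file name is no longer at the end of the string and `[^/]*$` would return `..._SD.mp4?start=30` instead.

```python
import re

# Hypothetical URL: the question's first URL with an invented query string appended.
url = ("http://res1.icourses.cn/share/process17//mp4/2017/3/17/"
       "6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4?start=30")

# [^/?]+   : a run of characters that are neither '/' nor '?'
# (?!.*/)  : negative lookahead, no '/' may appear anywhere after the run
match = re.search(r"[^/?]+(?!.*/)", url)
print(match.group())  # 6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4
```

This sketch assumes the query string itself contains no `/`.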

Answer 2

You can use re.findall:

import re
s = [{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}]
# Raw string so that \w, \- and \. reach the regex engine unchanged
filenames = [re.findall(r'(?<=/)[\w\-\_]+\.mp4', i['url'])[0] for i in s]
print(filenames)

Output:

['6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4', 'a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4']
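Note that `re.findall(...)[0]` raises `IndexError` for any URL that contains no `.mp4` segment. Here is a minimal sketch that uses `re.search` with an equivalent lookbehind pattern (`\w` already includes `_`) and returns `None` instead; the helper `mp4_name` and the second record are hypothetical.

```python
import re

# Equivalent to the lookbehind pattern above: word characters or '-' right after a '/',
# ending in .mp4 (\w already covers '_').
MP4_NAME = re.compile(r"(?<=/)[\w\-]+\.mp4")

def mp4_name(url):
    """Return the first .mp4 file name in url, or None if there is none."""
    m = MP4_NAME.search(url)
    return m.group() if m else None

records = [
    {"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4"},
    {"url": "http://example.com/no-video-here"},  # hypothetical record without an .mp4
]
print([mp4_name(r["url"]) for r in records])
```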

Answer 3

You can use the short regular expression [^/]*$

Code:

import re
s = [{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}]
filenames = [re.findall('[^/]*$', i['url'])[0] for i in s]
print(filenames)

Output:

['6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4', 'a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4']

Check the regex: https://regex101.com/r/k9VwC6/30
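One detail about `re.findall('[^/]*$', ...)`: because `*` also allows an empty match, findall returns the file name followed by an empty string matched at the very end of the URL, which is why the `[0]` is needed. A small sketch showing this, with `re.search` as an alternative that returns only the leftmost match (variable names are illustrative):

```python
import re

url = "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4"

# findall also reports the empty match at the end of the string.
all_matches = re.findall(r"[^/]*$", url)
print(all_matches)  # ['a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4', '']

# search stops at the first (leftmost) match, so no list indexing is needed.
name = re.search(r"[^/]*$", url).group()
print(name)
```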

Extract file name from URL using Regex - need to exclude some characters