I know this can be done easily with PHP's parse_url and parse_str functions:
$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);
But how do I achieve the same thing in Python? I can call urlparse, but what is the next step?
Answer 0 (score: 44)
I wrote a YouTube ID parser that does not use regular expressions:
from urllib.parse import urlparse, parse_qs  # Python 2: from urlparse import urlparse, parse_qs
def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None
Answer 1 (score: 41)
Python has a library for parsing URLs.
# Python 3; on Python 2 these functions live in the urlparse module
from urllib.parse import urlparse, parse_qs
url_data = urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = parse_qs(url_data.query)
video = query["v"][0]
Answer 2 (score: 9)
Here is a regular expression that covers these cases:
((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)
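A minimal sketch of how this pattern could be applied from Python with the standard re module. The names YOUTUBE_ID_RE and extract_id are hypothetical, and reading the ID from group 4 is an assumption based on counting the parentheses in the pattern, not something the answer states.

```python
import re

# The pattern from the answer above, unchanged.
YOUTUBE_ID_RE = re.compile(
    r"((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)"
)


def extract_id(url):
    """Return the first token that follows v/, be/, ?v=, &v= or embed/."""
    match = YOUTUBE_ID_RE.search(url)
    # Group 4 is the trailing ([\w-]+); groups 1-3 come from the lookbehind part.
    return match.group(4) if match else None


for url in (
    "http://youtu.be/SA2iWivDJiE",
    "http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu",
    "http://www.youtube.com/embed/SA2iWivDJiE",
    "http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US",
):
    print(extract_id(url))
```

Because the pattern is not anchored to a YouTube hostname, it may also match unrelated URLs that happen to contain one of these prefixes.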
Answer 3 (score: 4)
import re
match = re.search(r"youtube\.com/.*v=([^&]*)", "http://www.youtube.com/watch?v=z_AbfPXTKms&test=123")
if match:
    result = match.group(1)
else:
    result = ""
Untested.
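Answer 4 (score: 2)
No regular expression is needed. Split on ? and take the second part, split that on = and take the second part, then split that on & and take the first part.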
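The three splits described above could be sketched as follows. The function name video_id_by_splitting is hypothetical, and the sketch assumes the URL contains a query string whose first parameter is v.

```python
def video_id_by_splitting(url):
    """Apply the three splits literally; assumes v is the first query parameter."""
    query = url.split("?")[1]    # everything after the "?"
    value = query.split("=")[1]  # text between the first and second "="
    return value.split("&")[0]   # drop any parameters that follow


print(video_id_by_splitting("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))
# prints z_AbfPXTKms
```

When v is not the first parameter, as in https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs, the same steps return 9 instead of the ID, so this approach depends on parameter order.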
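Answer 5 (score: 2)
You can try using a regular expression to get the YouTube video ID:
import re
url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
# regex for the YouTube ID: "^[^v]+v=(.{11}).*"
result = re.match('^[^v]+v=(.{11}).*', url)
print(result.group(1))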
Answer 6 (score: 1)
You can use:
from urllib.parse import urlparse
url_data = urlparse("https://www.youtube.com/watch?v=RG9TMn1FJzc")
print(url_data.query[2:])  # strips the leading "v="; assumes v is the only query parameter
Answer 7 (score: 1)
Splitting the string is a very bad idea when the parameters can appear in any order. Stick with urlparse:
from urllib.parse import parse_qs, urlparse
vid = parse_qs(urlparse(url).query).get('v')
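To illustrate the point about parameter order, here is a small sketch built on the same two standard-library calls. The helper name video_id_from_query is hypothetical. Note that parse_qs maps each key to a list of values, so .get('v') yields a list (or None), and the sketch takes the first element.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_query(url):
    """Return the first value of the v parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


# The v parameter is found regardless of where it appears in the query string.
print(video_id_from_query("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))
print(video_id_from_query(
    "https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo"
))
```

This only covers URLs that carry the ID in the query string; path-based forms such as /embed/ would need separate handling.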
Answer 8 (score: 0)
I made Mikhail Kashkin's solution Python 3 friendly:
from urllib.parse import urlparse
def video_id(url):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    o = urlparse(url)
    if o.netloc == 'youtu.be':
        return o.path[1:]
    elif o.netloc in ('www.youtube.com', 'youtube.com'):
        if o.path == '/watch':
            id_index = o.query.index('v=')
            return o.query[id_index+2:id_index+13]
        elif o.path[:7] == '/embed/':
            return o.path.split('/')[2]
        elif o.path[:3] == '/v/':
            return o.path.split('/')[2]
    return None  # fail?
Answer 9 (score: 0)
Although this runs a search query, it will give you the id:
from youtube_search import YoutubeSearch
results = YoutubeSearch('search terms', max_results=10).to_json()
print(results)
Answer 10 (score: 0)
url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
parsed = url.split("?")
videoId = parsed[1]
print(videoId)
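This prints everything after the ? (here v=z_AbfPXTKms&NR=1), so the v= prefix and any other parameters still have to be stripped, and it only applies to YouTube links that contain a query string.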
Answer 11 (score: 0)
I use the excellent pytube package.
$ pip install pytube
#Examples
url1='http://youtu.be/SA2iWivDJiE'
url2='http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu'
url3='http://www.youtube.com/embed/SA2iWivDJiE'
url4='http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US'
url5='https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6'
url6='youtube.com/watch?v=_lOT2p_FCvA'
url7='youtu.be/watch?v=_lOT2p_FCvA'
url8='https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo'
urls=[url1,url2,url3,url4,url5,url6,url7,url8]
#Get youtube id
from pytube import extract
for url in urls:
    id=extract.video_id(url)
    print(id)
Output:
SA2iWivDJiE
_oPAwA_Udwc
SA2iWivDJiE
SA2iWivDJiE
rTHlyTphWP0
_lOT2p_FCvA
_lOT2p_FCvA
n0g-Y0oo5Qs
Answer 12 (score: 0)
I'm late to the party, but I use this snippet to get the video ID.
import re
def video_id(url: str) -> str:
    """Extract the ``video_id`` from a YouTube url.
    This function supports the following patterns:
    - :samp:`https://youtube.com/watch?v={video_id}`
    - :samp:`https://youtube.com/embed/{video_id}`
    - :samp:`https://youtu.be/{video_id}`
    :param str url:
        A YouTube url containing a video id.
    :rtype: str
    :returns:
        YouTube video id.
    """
    return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)
def regex_search(pattern: str, string: str, group: int):
    """Shortcut method to search a string for a given pattern.
    :param str pattern:
        A regular expression pattern.
    :param str string:
        A target string to search.
    :param int group:
        Index of group to return.
    :rtype:
        str or tuple
    :returns:
        Substring pattern matches.
    """
    regex = re.compile(pattern)
    results = regex.search(string)
    if not results:
        return False
    return results.group(group)