Question

I have been using the following function to fetch YouTube search results:

from urllib  import urlencode
from urllib2 import urlopen

def fetch(search_query):
    url = 'http://www.youtube.com/results?'
    args = urlencode({'search_query':search_query})
    conn = urlopen(url,args)
    data = conn.read()
    conn.close()
    return data

Recently it started returning "empty results" in some cases, which forced me to change my code:

from urllib  import urlencode
from urllib2 import urlopen

def fetch(search_query):
    url = 'http://www.youtube.com/results?'
    args = urlencode({'search_query':search_query})
    while True:
        conn = urlopen(url,args)
        data = conn.read()
        conn.close()
        if 'results?' in data:
            break
    return data

As you can see, I use 'results?' to distinguish valid search results from invalid ones.
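The loop above never terminates if a valid page never arrives. A bounded variant could look like the sketch below. The names is_valid_result, fetch_with_retries, fetch_once and max_attempts are hypothetical and not part of the original code, and the page content is assumed to be a text string (on Python 3, the bytes returned by read() would need decoding first).

```python
def is_valid_result(data):
    # Same marker check as the loop in the question:
    # a page counts as valid if it contains 'results?'.
    return 'results?' in data


def fetch_with_retries(fetch_once, search_query, max_attempts=5):
    # fetch_once is a hypothetical callable that performs ONE request
    # and returns the page text; max_attempts is an assumed limit.
    for _ in range(max_attempts):
        data = fetch_once(search_query)
        if is_valid_result(data):
            return data
    # Every attempt returned an "empty" page.
    return None
```

Passing the single-request function in as an argument keeps the retry logic separate from the network call, so it can be exercised with a stub.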

Another noticeable difference (there are many) appears near the beginning of the retrieved HTML:

Valid result: yt.www.masthead.sizing.runBeforeBodyIsReady(true,true,false);
Invalid result: yt.www.masthead.sizing.runBeforeBodyIsReady(true,true,true);

I used conn.getcode() to verify that the HTTP response code is always 200.

Does anyone know what recent change on YouTube might have caused this?

Thanks

Answer 1

It turns out the problem is with http://www.youtube.com/results?.

This URL is being redirected to https://www.youtube.com/results?.

If the redirect does not take place, the search results are "empty".

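I was able to verify this using conn.geturl():

When it returns the original URL (the one starting with http), the results are invalid.
When it returns the redirected URL (the one starting with https), the results are valid.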
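Given that observation, one possible workaround is to request the https URL directly, so that no redirect is needed, and to check geturl() afterwards. This is only a sketch under assumptions: BASE_URL and build_search_url are names introduced here, and the query is sent in the URL (a GET request) instead of as POST data as in the question's code. Whether YouTube returns usable results this way is not confirmed here.

```python
try:  # Python 3 module names
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import urlopen

# Assumption: start from https so the http -> https redirect never occurs.
BASE_URL = 'https://www.youtube.com/results'


def build_search_url(search_query):
    # Put the encoded query into the URL itself (GET request).
    return BASE_URL + '?' + urlencode({'search_query': search_query})


def fetch(search_query):
    conn = urlopen(build_search_url(search_query))
    try:
        final_url = conn.geturl()  # URL actually served, after any redirects
        data = conn.read()
    finally:
        conn.close()
    # Per the answer above, an http final URL went along with "empty" results.
    if not final_url.startswith('https://'):
        raise IOError('unexpected final URL: ' + final_url)
    return data
```

For example, build_search_url('cute cats') gives 'https://www.youtube.com/results?search_query=cute+cats'.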
