Question

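I'm trying to load websites and read their visible text with Python, but some sites on my list don't load correctly because they don't redirect to their main page. For example, the URL imfuna.com should redirect to imfuna.com/home-uk/, but it doesn't, so my code retrieves only 6 words instead of 64.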

import requests
from bs4 import BeautifulSoup

# silence the InsecureRequestWarning that verify=False would otherwise print

from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# settings

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

url = "http://imfuna.com"

response = requests.get(url, headers=headers, verify=False)

soup = BeautifulSoup(response.text, "lxml")

for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
text = '\n'.join(chunk for chunk in chunks if chunk)

front_text_count = len(text.split(" "))
print(front_text_count)
print(text)

If you run this, you get only 6 words:

6
Imfuna Property Inventory and Inspection Apps

But you should actually get 64 (the browser redirects to http://imfuna.com/home-uk/ and the content is there).
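`requests.get` follows HTTP redirects (3xx responses with a `Location` header) by default, so one way to tell whether imfuna.com sends an HTTP redirect at all is to inspect `response.history` and `response.url`. The following is a minimal sketch; the helper names `redirect_chain` and `show_redirects` are hypothetical and not part of `requests`.

```python
from typing import List, Tuple

import requests


def redirect_chain(response: requests.Response) -> List[Tuple[int, str]]:
    """Return (status_code, url) for each hop requests followed, ending with the final page."""
    # response.history holds the intermediate 3xx responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def show_redirects(url: str, **kwargs) -> None:
    """Fetch url and print every hop; a single 200 line means no HTTP redirect was sent."""
    # allow_redirects=True is already the default for requests.get; spelled out for clarity.
    response = requests.get(url, allow_redirects=True, **kwargs)
    for status, hop_url in redirect_chain(response):
        print(status, hop_url)
```

For example, `show_redirects("http://imfuna.com", headers=headers, verify=False, timeout=10)`. If it prints only a single `200` line, the server sent no HTTP redirect, and the browser is presumably redirecting on the client side (for example with a meta refresh tag or JavaScript), which `requests` does not execute.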

Does anyone know how I can set up the request to allow redirects, so that the page is parsed at http://imfuna.com/home-uk/ instead?

Thanks :)
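Because `requests` already follows HTTP redirects for GET by default, a redirect it misses is most likely done inside the page itself. A hedged sketch for one common case, assuming (not verified) that the site uses an HTML `<meta http-equiv="refresh" content="0; url=...">` tag: extract the tag's target, resolve it against the current URL, and fetch again. If the site redirects with JavaScript instead, `meta_refresh_target` returns `None` and a tool that actually runs JavaScript would be needed. Both helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Matches the "url=..." part of a refresh value such as "0; URL='/home-uk/'".
_REFRESH_URL = re.compile(r"""url\s*=\s*['"]?([^'"]+)""", re.IGNORECASE)


def meta_refresh_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of a <meta http-equiv="refresh"> redirect, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for meta in soup.find_all("meta"):
        # html.parser lowercases attribute names; the value itself may be "Refresh".
        if meta.get("http-equiv", "").lower() != "refresh":
            continue
        match = _REFRESH_URL.search(meta.get("content", ""))
        if match:
            # Relative targets such as "/home-uk/" are resolved against the page URL.
            return urljoin(base_url, match.group(1).strip())
    return None


def get_following_meta_refresh(url: str, max_hops: int = 5, **kwargs) -> requests.Response:
    """GET url (requests follows HTTP redirects), then follow up to max_hops meta refreshes."""
    response = requests.get(url, **kwargs)
    for _ in range(max_hops):
        target = meta_refresh_target(response.text, response.url)
        if target is None or target == response.url:
            break  # no meta refresh, or it points back at the same page
        response = requests.get(target, **kwargs)
    return response
```

With the question's settings this would be called as `response = get_following_meta_refresh("http://imfuna.com", headers=headers, verify=False, timeout=10)`, after which `soup` is built from `response.text` exactly as before. The `max_hops` limit guards against pages that refresh to each other in a loop.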

Allow redirects with Python requests to read a web page

0 answers: