Question

How can I get line x of a website's source code?

I need a function like this:

def source_code(url, line): ...

Answer 1

Use the requests module:

import requests as req

url = 'http://www.something.com'  # placeholder URL
resp = req.get(url)
print(resp.text)  # the full HTML source of the page

Answer 2

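Python has a standard-library module for this, urllib2 (Python 2 only; in Python 3 it became urllib.request). You can also look at python-requests. Then try the following: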

import urllib2  # Python 2 only

resp = urllib2.urlopen("The URL of the webpage whose source code you want")
html = resp.read()  # the page source as a byte string
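On Python 3 the same approach goes through `urllib.request`. Below is a minimal stdlib-only sketch that also picks out the requested line. It assumes the page is UTF-8 unless the server declares another charset, and it counts lines from 1. The helper names `fetch_source` and `get_line` are hypothetical; they are not part of the original answer.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Download a page and decode it to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header if present, else assume UTF-8
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


def get_line(text, line):
    """Return the 1-based `line` of `text`, or '' if it is out of range."""
    lines = text.splitlines()
    if 1 <= line <= len(lines):
        return lines[line - 1]
    return ''


def source_code(url, line):
    # Matches the signature asked for in the question
    return get_line(fetch_source(url), line)
```

Usage would be something like `print(source_code('http://www.something.com', 9))`. Keeping the line selection in `get_line` means it can be checked on a plain string without any network access.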

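Now look through https://www.crummy.com/software/BeautifulSoup/bs4/doc/, the documentation for BeautifulSoup, which you can use to parse the page. With it you can set conditions on which lines to retrieve.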
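BeautifulSoup is a third-party package. If all you need is the line on which a given tag starts, the standard library's `html.parser.HTMLParser` can also report that: its `getpos()` method returns a `(line, offset)` pair with 1-based line numbers. The sketch below uses that stdlib parser, not BeautifulSoup itself. The class `TagLineFinder` and the function `lines_with_tag` are hypothetical names.

```python
from html.parser import HTMLParser


class TagLineFinder(HTMLParser):
    """Record the 1-based source line of every start tag named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # getpos() gives (line, offset) of the tag currently being parsed
        if tag == self.tag:
            self.lines.append(self.getpos()[0])


def lines_with_tag(html, tag):
    finder = TagLineFinder(tag)
    finder.feed(html)
    finder.close()
    return finder.lines


page = "<html>\n<head>\n<title>Demo</title>\n</head>\n<body>\n<p>one</p>\n<p>two</p>\n</body>\n</html>"
print(lines_with_tag(page, "p"))  # [6, 7]
```

You can pass the returned line numbers to any of the line-picking functions in the answers to get the corresponding source lines.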

Answer 3

Well, you can save the page's HTML content to a file and then use file functions to go to the line:

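    # Read the saved page (assumed to be UTF-8) and print its 8th line
    # (index 7, since lists are 0-based), without the trailing newline
    with open('saved_html.html', 'r', encoding='utf-8') as file_awesome:
        content = file_awesome.readlines()
    print(content[7].rstrip('\n'))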
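`readlines()` loads the whole file into memory. The sketch below instead saves the HTML and then reads only up to the requested line, using `itertools.islice`. The helper names `save_html` and `read_line` are hypothetical, and UTF-8 is assumed as the file encoding.

```python
from itertools import islice


def save_html(html, path):
    """Write the page source to `path` as UTF-8, keeping its line endings."""
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(html)


def read_line(path, line):
    """Return the 1-based `line` of the file without its newline, or ''."""
    if line < 1:
        return ''
    with open(path, 'r', encoding='utf-8') as f:
        # islice skips the first line-1 lines and yields at most one more
        for text in islice(f, line - 1, line):
            return text.rstrip('\r\n')
    return ''
```

With this helper, the 8th line (`content[7]` above) would be `read_line('saved_html.html', 8)`.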

Answer 4

This should do it:

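import requests

def source_code(url, line):
    # Download the page source and split it into lines
    # (splitlines() also drops the '\r' of CRLF line endings)
    lines = requests.get(url).text.splitlines()

    # Return '' if the requested 1-based line does not exist
    # (line < 1 would otherwise index from the end of the list)
    if line < 1 or line > len(lines):
        return ''
    return lines[line - 1]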


print(source_code('somepageurl', 9))
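Splitting on `'\n'` and calling `str.splitlines()` give different results when a page uses Windows-style `\r\n` line endings. Splitting on `'\n'` leaves a trailing `'\r'` on each line and an empty string after the final newline. A small illustration on a made-up string:

```python
html = "<html>\r\n<head>\r\n</head>\r\n"

# split('\n') keeps the '\r' and yields a trailing empty string
by_newline = html.split('\n')
# splitlines() treats '\r\n' as one line break and drops the trailing empty piece
by_splitlines = html.splitlines()

print(by_newline)     # ['<html>\r', '<head>\r', '</head>\r', '']
print(by_splitlines)  # ['<html>', '<head>', '</head>']
```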
