Question

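I'm learning Python. Since I'm new, I'm not allowed to use Python libraries unless I'm told to. For this problem I have to find the individual href locations in "http://news.ycombinator.com/". I've looked at several answers here, and most people are told to use BeautifulSoup. I can't use it. I was told to use lists and then append to those lists to find them. Here is my code so far. It's bad for many reasons, not least of which is that I don't know what I'm doing.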

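import urllib.request

page_list = []
local_filename, headers = urllib.request.urlretrieve("http://news.ycombinator.com/")
# Read the downloaded page into a list, one entry per line.
with open(local_filename, encoding="utf-8") as html:
    for line in html:
        page_list.append(line)
# Remember the index of the LAST line that contains a link.
href_line = None
for k in range(len(page_list)):
    if '<a href' in page_list[k]:
        href_line = k
if href_line is not None:
    href_start = page_list[href_line].find('<a href')
    href_end = page_list[href_line].find('</a')
    # Slice the matching line itself (page_list[href_line]), not page_list[href_start].
    print(page_list[href_line][href_start:href_end])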
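`str.find` only reports the first match after its start position, and a single line of HTML can contain several links. Below is a minimal sketch of scanning one line repeatedly by passing a start offset to `str.find`. It assumes links are written as `<a href="...">` with double quotes (single-quoted or unquoted attributes are not handled), and `hrefs_in_line` is a hypothetical name.

```python
def hrefs_in_line(line):
    """Return every double-quoted href value found in one line of HTML.

    Assumption: links look like <a href="...">; single-quoted or
    unquoted attribute values are not handled.
    """
    found = []
    marker = '<a href="'
    pos = line.find(marker)
    while pos != -1:
        value_start = pos + len(marker)
        value_end = line.find('"', value_start)
        if value_end == -1:
            break  # the value continues past this line; stop scanning it
        found.append(line[value_start:value_end])
        # Resume the search just after the closing quote of this href.
        pos = line.find(marker, value_end)
    return found


sample = '<a href="item?id=1">One</a> | <a href="https://example.com/">Two</a>'
print(hrefs_in_line(sample))  # ['item?id=1', 'https://example.com/']
```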

I know I need two lists to do the appending, but since I'm new to Python I don't know how. I'd be glad for any help.
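One possible reading of the "two lists" hint: the first list holds the page's lines (like `page_list` in the question), and a second list collects each link as it is found via `append`. The sketch below assumes that reading and keeps the question's own `find('<a href')` / `find('</a')` slicing, but applies it to every line and every link on a line instead of only the last matching line. `fetch_lines` and `collect_links` are hypothetical names, and links whose `</a` falls on a later line are skipped.

```python
import urllib.request


def fetch_lines(url):
    """Download a page and return it as a list of text lines (the first list)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return text.splitlines()


def collect_links(page_list):
    """Append every '<a href ...' chunk, up to its '</a', to a second list.

    Mirrors the question's find()/slice approach, but checks every line
    and every link within a line.
    """
    link_list = []
    for line in page_list:
        start = line.find('<a href')
        while start != -1:
            end = line.find('</a', start)
            if end == -1:
                break  # closing tag is not on this line; skip the rest of it
            link_list.append(line[start:end])
            start = line.find('<a href', end)
    return link_list
```

With network access, `collect_links(fetch_lines("http://news.ycombinator.com/"))` would return the second list for the live page; passing a hand-written list of lines is an easy way to check the logic offline.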

Trouble finding hrefs in a web page's HTML
