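I want the following script to fetch the date from each address in this range, but I can't seem to get it to run more than once. I'm using Python 3. As you can see below, i is appended to the site URL so that it reads http://zinc.docking.org/substance/10, http://zinc.docking.org/substance/11, and so on. Here is the code: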
import bs4 as bs
import urllib.request
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
sauce = urllib.request.urlopen(site1).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
table1 = soup.find("table", attrs={"class": "substance-properties"})
for row in table1.findAll('tr'):
    row1 = row.findAll('td')
ate = row1[0].getText()
print(ate)
This is my output:
$python3 Date.py
November 11th, 2005
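However, the script should give me 3 dates. This code works, so I know row1[0] actually contains a value. I suspect there is some simple formatting mistake, but I don't know where to start troubleshooting. When I format it "correctly", this is the code: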
import bs4 as bs
import urllib.request
import pandas as pd
import csv
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    table2 = soup.find("table", attrs={"class": "protomers"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
        ate = row1[0].getText()
        print(ate)
The error I get is:
Traceback (most recent call last):
  File "Stack.py", line 11, in <module>
    ate = row1[1].getText()
IndexError: list index out of range
The first version works, so I know row1[0] does contain a value. Any ideas?
Answer 0 (score: 1)
You may want to fix the indentation:
import bs4 as bs
import urllib.request
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    # Everything that depends on i must be indented inside the loop body.
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
    # row1 now holds the cells of the last row visited above.
    Date = row1[0].getText()
    print(Date)
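To see why the indentation matters, here is a minimal sketch that needs no network access. It only builds the URLs and records which ones would be fetched; the names fetched_a and fetched_b are made up for illustration and are not part of the original script.

```python
base = "http://zinc.docking.org/substance/"
ids = range(10, 16)  # 10, 11, 12, 13, 14, 15

# Version A: only the URL assignment is inside the loop body.
fetched_a = []
for i in ids:
    url = "%s%i" % (base, i)
fetched_a.append(url)  # runs once, after the loop, with the last URL

# Version B: the "fetch" step is indented into the loop body.
fetched_b = []
for i in ids:
    url = "%s%i" % (base, i)
    fetched_b.append(url)  # runs once per iteration

print(len(fetched_a))  # 1
print(len(fetched_b))  # 6
```

In version A the loop finishes before anything is "fetched", so only the page for the last value of i is ever processed. That would match a script that prints a single date.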
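Edit: you should rename the Date variable. It is not a reserved word in Python, but capitalized names conventionally denote classes, and by convention Python variable names are lowercase.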
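As for the IndexError in the question: it means findAll('td') returned a list that was too short for the index used. One possible cause (an assumption, since the real page is not shown here) is a header row that contains only th cells. The sketch below uses a made-up HTML snippet in place of the real page, the built-in html.parser instead of lxml, and a hypothetical helper first_cells that skips rows without td cells.

```python
import bs4 as bs

# Made-up HTML standing in for one substance page; the real layout may differ.
html = """
<table class="substance-properties">
  <tr><th>Added</th><th>Availability</th></tr>
  <tr><td>November 11th, 2005</td><td>In stock</td></tr>
</table>
"""

def first_cells(markup):
    """Return the text of the first <td> of every row that has one."""
    soup = bs.BeautifulSoup(markup, "html.parser")
    table = soup.find("table", attrs={"class": "substance-properties"})
    if table is None:  # page without the expected table
        return []
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # e.g. a header row with only <th> cells
            continue
        values.append(cells[0].get_text(strip=True))
    return values

print(first_cells(html))
```

Checking that the list is non-empty before indexing lets the extraction stay inside the row loop without raising on rows that have no data cells.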