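I can't get the county list below to hold the results of my loop. When I print the result of each iteration along with the item's index in the list, I get an index of 0 every time, which suggests the data isn't persisting in the list from one loop to the next. So when I try to index the county list after the loop finishes, there is of course no data at all, and I get a 'list index out of range' error.
I've researched the 'list index out of range' error I keep getting, and I understand that I get it because the county list is empty, but why is it empty?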
The HTML source for one entry in the target_divs list looks like this:
<div class="school-type-list-text">
  <div class="table_cell_county"><a href='/alabama/autauga-county'>Autauga County</a></div>
  <div class="change_div"></div>
  <div class="table_cell_other">7<span> Schools</span></div>
  <div class="table_cell_other">1,587<span> Students</span></div>
  <div class="table_cell_other">8%<span> Minority</span></div>
  <div class="break"></div>
</div>
Here is my script:
import urllib2
from bs4 import BeautifulSoup
import pandas
import csv

page1 = 'https://www.privateschoolreview.com/alabama'
alabama = urllib2.urlopen(page1)
soup = BeautifulSoup(alabama, "lxml")

target_divs = soup.find_all("div", class_="school-type-list-text")
for i in target_divs:
    county = i.find_all("div", class_="table_cell_county")
    for i in county:
        print i.text
        print county.index(i)

print county
print county[0]
Update: after @Software2 suggested changing the loop variable, I still get the same error:
import urllib2
from bs4 import BeautifulSoup
import pandas
import csv

page1 = 'https://www.privateschoolreview.com/alabama'
alabama = urllib2.urlopen(page1)
soup = BeautifulSoup(alabama, "lxml")

target_divs = soup.find_all("div", class_="school-type-list-text")
for div in target_divs:
    counties = div.find_all("div", class_="table_cell_county")
    for county in counties:
        print county.text
        print counties.index(county)

print counties
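To see why both versions behave the same way, here is a minimal reproduction with plain lists standing in for the BeautifulSoup results. It is written for Python 3, and the per_div_results data is made up: each inner list plays the role of one div.find_all(...) call, and the empty last entry is an assumption (a final matching div with no county cell), not something confirmed by the page.

```python
# Hypothetical stand-ins for div.find_all("div", class_="table_cell_county"),
# one inner list per target div. The empty last entry is an assumption.
per_div_results = [["Autauga County"], ["Baldwin County"], ["Barbour County"], []]

# Same shape as the question's loop: the name is rebound on every outer pass,
# so each pass throws away the previous div's list instead of adding to it.
for result in per_div_results:
    counties = result
    for county in counties:
        # Each list holds a single county, so its index is always 0.
        print(county, counties.index(county))

# After the loop only the last div's (empty) result is left.
print(counties)

try:
    print(counties[0])
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range
```

Renaming the loop variables does not change this, because the list itself is still replaced on every pass of the outer loop.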
Answer 0 (score: 0)
I could be wrong, but can you try this? It looks like you are using the same i in both nested loops.
for i in target_divs:
    county = i.find_all("div", class_="table_cell_county")
    for j in county:
        print j.text
        print county.index(j)
Answer 1 (score: 0)
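You are using the same variable i for two different things in the nested loop, so the first one gets overwritten. Change the name of the second variable.
Names like i aren't very descriptive, which makes this kind of mistake easy to make. Try something like: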
for div in target_divs:
    counties = div.find_all("div", class_="table_cell_county")
    for county in counties:
        print county.text
        print counties.index(county)
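Renaming is good practice, but shadowing alone does not make the outer loop skip or repeat divs. A Python for loop takes its next value from the iterator, not from the variable, so rebinding i inside the body only affects the rest of that pass. A quick check in Python 3 with made-up stand-in values (outer_items and visited are hypothetical names):

```python
outer_items = ["div-a", "div-b", "div-c"]  # hypothetical stand-ins for target_divs
visited = []

for i in outer_items:
    visited.append(i)     # record what the outer loop actually handed us
    for i in ["x", "y"]:  # rebinds i, shadowing the outer loop variable
        pass
    # Here i == "y", but the next outer pass still gets the next div.

print(visited)  # ['div-a', 'div-b', 'div-c']
print(i)        # y
```

So in the original script, county = i.find_all(...) always runs before the inner loop rebinds i, and every div is still visited.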
Answer 2 (score: 0)
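I assume you want the list of counties in counties. As I see it, the problem is the return value of div.find_all(), which returns a list with at most one county. Because county is reassigned on every pass through the outer loop, it only ever holds the result from the last div. To populate the counties list, try the following: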
counties = []
for div in target_divs:
    county = div.find_all('div', class_='table_cell_county')
    for c in county:
        counties.append(c.text.encode('utf-8'))
print counties # Returns: ['Autauga County', 'Baldwin County', 'Barbour County', 'Bibb County', 'Blount County', 'Bullock County', 'Butler County', 'Calhoun County', 'Chambers County', 'Chilton County', 'Choctaw County', 'Clarke County', 'Clay County', 'Coffee County', 'Colbert County', 'Conecuh County', 'Covington County', 'Crenshaw County', 'Cullman County', 'Dale County', 'Dallas County', 'Dekalb County', 'Elmore County', 'Escambia County', 'Etowah County', 'Greene County', 'Hale County', 'Henry County', 'Houston County', 'Jackson County', 'Jefferson County', 'Lauderdale County', 'Lee County', 'Limestone County', 'Lowndes County', 'Macon County', 'Madison County', 'Marengo County', 'Marion County', 'Marshall County', 'Mobile County', 'Monroe County', 'Montgomery County', 'Morgan County', 'Perry County', 'Pickens County', 'Pike County', 'Randolph County', 'Russell County', 'Saint Clair County', 'Shelby County', 'Sumter County', 'Talladega County', 'Tallapoosa County', 'Tuscaloosa County', 'Walker County', 'Wilcox County', 'Winston County']
print counties[0] # Returns: 'Autauga County'
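The key change above is that counties is created once, before the loop, and appended to, instead of being rebound on every pass. For readers without BeautifulSoup, here is a sketch of the same accumulate-into-one-list idea using only Python 3's standard html.parser. This is a swapped-in technique, not what the question uses; the CountyCollector class and the two-entry SAMPLE HTML are hypothetical, modeled on the entry shown in the question.

```python
from html.parser import HTMLParser


class CountyCollector(HTMLParser):
    """Collect the text of every <div class="table_cell_county"> into one list."""

    def __init__(self):
        super().__init__()
        self.counties = []       # created once, then appended to for every match
        self._in_county = False  # True while inside a county cell
        self._buffer = []        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "table_cell_county" in classes:
            self._in_county = True
            self._buffer = []

    def handle_endtag(self, tag):
        # The county cell holds only an <a>, so the first </div> seen
        # while inside it closes the cell.
        if tag == "div" and self._in_county:
            self.counties.append("".join(self._buffer).strip())
            self._in_county = False

    def handle_data(self, data):
        if self._in_county:
            self._buffer.append(data)


# Hypothetical two-entry sample modeled on the HTML in the question.
SAMPLE = """
<div class="school-type-list-text">
  <div class="table_cell_county"><a href='/alabama/autauga-county'>Autauga County</a></div>
  <div class="table_cell_other">7<span> Schools</span></div>
</div>
<div class="school-type-list-text">
  <div class="table_cell_county"><a>Baldwin County</a></div>
</div>
"""

parser = CountyCollector()
parser.feed(SAMPLE)
parser.close()

print(parser.counties)     # ['Autauga County', 'Baldwin County']
print(parser.counties[0])  # Autauga County
```

With BeautifulSoup itself, a single soup.select("div.school-type-list-text div.table_cell_county") call should likewise return every matching cell in one list, which avoids the nested loops entirely.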