I am scraping this Wikipedia page:
https://en.wikipedia.org/wiki/List_of_shopping_malls_in_the_South_Florida_metropolitan_area
and pulling the data out of the table like this:
Location = response.xpath('//*[@id="mw-content-text"]/table/tr/td[2]/a/text()').extract()
Name = response.xpath('//*[@id="mw-content-text"]/table/tr/td[1]/a/text()').extract()
Once I have them, the plan is to add these lists to a DataFrame. The problem I run into is:
len(Name)
40
and
len(Location)
47
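This happens because some rows in the Location column contain several elements, as in this one: Coconut Grove, Miami. There I get several elements instead of one.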
Answer 0 (score: 2)
You can use pandas read_html, which returns a list of DataFrames; df is the first DataFrame in that list.
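The code for this answer did not survive, so here is a minimal sketch of the read_html approach rather than the answerer's original snippet. It assumes pandas plus an HTML parser such as lxml are installed, and it uses a small hypothetical table in place of the live Wikipedia page so it runs offline.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Wikipedia table, so the sketch runs offline.
html = """
<table class="wikitable">
  <tr><th>Name</th><th>Location</th></tr>
  <tr><td><a>Aventura Mall</a></td><td><a>Aventura</a></td></tr>
  <tr><td><a>CocoWalk</a></td><td><a>Coconut Grove</a>, <a>Miami</a></td></tr>
  <tr><td><a>Dadeland Mall</a></td><td><a>Kendall</a></td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))
df = tables[0]

# Each table cell becomes one value, even when it holds several links.
print(df)
```

For the real page you would pass the URL (or the downloaded HTML) instead of the inline string, and the index of the table you want may not be 0.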
Answer 1 (score: 1)
You just need the right XPath:
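rows = response.xpath('//table[@class="wikitable"]//tr[not(./th)]')
for row in rows:
    # Join every text node in the cell so multi-link cells stay in one row
    name = ''.join(row.xpath('.//td[1]//text()').extract())
    location = ''.join(row.xpath('.//td[2]//text()').extract())
    print(name + ' | ' + location)
Output: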
Aventura Mall | Aventura
Bal Harbour Shops | Bal Harbour
Bayside Marketplace | Downtown Miami
Boynton Beach Mall | Boynton Beach
CityPlace | West Palm Beach
CocoWalk | Coconut Grove, Miami
Coral Square | Coral Springs
Dadeland Mall | Kendall
Dolphin Mall | Sweetwater
Downtown at the Gardens | Palm Beach Gardens
The Falls | Kendall
Galeria International Mall | Downtown Miami
The Galleria at Fort Lauderdale | Fort Lauderdale
The Gardens Mall | Palm Beach Gardens
The Grand Doubletree Shops | Downtown Miami
Las Olas Riverfront | Fort Lauderdale
Las Olas Shops | Fort Lauderdale
Lincoln Road Mall | Miami Beach
Loehmann's Fashion Island | Aventura
Mall of the Americas | Miami
The Mall at 163rd Street | North Miami Beach
The Mall at Wellington Green | Wellington
Miami International Mall | Doral
Miracle Marketplace | Miami
Metrofare Shops & Cafe | Government Center, Downtown Miami
Pembroke Lakes Mall | Pembroke Pines
Pompano Citi Centre | Pompano Beach
Sawgrass Mills | Sunrise
Seminole Paradise | Hollywood
The Shops at Fontainebleau | Miami Beach
The Shops at Mary Brickell Village | Brickell, Miami
The Shops at Midtown Miami | Midtown Miami
The Shops at Pembroke Gardens | Pembroke Pines
The Shops at Sunset Place | South Miami
Southland Mall | Cutler Bay
Town Center at Boca Raton | Boca Raton
The Village at Gulfstream Park | Hallandale Beach
Village of Merrick Park | Coral Gables
Westfield Broward | Plantation
Westland Mall | Hialeah
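Answer 2 (score: 0)
If what you want is to treat the two words as a single value, you can run a string replacement over each entry, replacing the comma with an empty string: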
location = [loc.replace(',', '') for loc in location]
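A small sketch of what that replacement does, using hypothetical sample values. Note the assumption: each location is already a single string. The replacement does not merge entries that were extracted as separate list items.

```python
# Hypothetical sample values; each location is already one string.
locations = ['Aventura', 'Coconut Grove, Miami', 'Brickell, Miami']

# Remove the comma from every entry; the list length stays the same.
cleaned = [loc.replace(',', '') for loc in locations]

print(cleaned)
```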