Question

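I am trying to scrape the text details of store locations and write them to a CSV file using BeautifulSoup. The 2 stores in Alabama are inside one LocationSecContent class, and the 17 stores in Arizona are inside another LocationSecContent class. For Georgia, the first store, Airport, sits in a single class inside LocationSecContent, while the remaining 4 are in another class, location, which is also inside LocationSecContent. I want to scrape the text and write the store details (name, location, street, phone, fax, hours content, and everything else) to a CSV file. I am using Firebug in Firefox. Sorry if there are any mistakes; I am a beginner with BeautifulSoup.

This is what I have tried: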

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.text, 'html.parser')
d={}
for table in soup.find_all("div", {"class":"content freshvites-location"}):
    table
for col in table.find_all("td"):

    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    Location=col.find_all("div",{'class':'location'})


dt="LocationSecHdr:%s,Location: %s" %(LocationSecHdr, Location)
zx=BeautifulSoup(dt, "html.parser")

print zx.get_text()

I am unable to iterate over the rows and scrape the text.

Approach 2:

from bs4 import BeautifulSoup

import requests


page = requests.get('http://freshvites.com/store-locator/')
#print page


soup = BeautifulSoup(page.text, 'html.parser')
#print soup.find_all('a')

for table in soup.find_all("div",{'class':'content freshvites-location'}):
    table


LocationSecHdr=''
LocationSecContent=''
Location=''
LocationTitle=''
Phone=''
Fax=''
HoursTitle=''
HoursContent=''


for col in table.find_all("td"):      
    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    #LocationSecContent= col.find_all("div",{'class':'LocactionSecContent'})
    #Location= col.find_all("div",{'class':'location'})
    LocationTitle= col.find_all("div",{'class':'locationTitle'})
    Phone= col.find_all("div",{'class':'Phone'})
    Fax= col.find_all("div",{'class':'Fax'})
    HoursContent=col.find_all("div",{'class':'HoursContent'})

    data="LocationSecHdr: %s, LocationSecContent: %s, Location:%s, LocationTitle : %s, Phone:%s, Fax :%s, HoursContent:%s " %(LocationSecHdr, LocationSecContent, Location, LocationTitle, Phone, Fax, HoursContent)
    zax=BeautifulSoup(data,"html.parser")

print zax.get_text()

With this code I cannot get the store addresses, and I also don't know how to get these details as a dictionary.

Answer 1

I think I have enough information now to answer your question.

You are looking for the wrong tag/class combination. All the information for a location is contained in a <div class="location">. Here is a sample:

<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br> 
13802 N. 32nd St #11<br> 
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>

As you can see in the sample, there is no <tr> or <td>, so searching for those tags returns nothing useful.

Here's a short Python script I put together to find all locations:
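A quick check against a shortened copy of the sample shows the difference between the two searches. This is only a sketch run on the fragment above, not on the live page; the variable names `sample`, `td_cells` and `location_divs` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample fragment (an assumption: the real page has more markup).
sample = """<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br>
<div class="Fax">877.935.6902</div>
</div>"""

soup = BeautifulSoup(sample, "html.parser")

# Searching for table cells finds nothing in this fragment...
td_cells = soup.find_all("td")
# ...while searching for the "location" div finds the store block.
location_divs = soup.find_all("div", {"class": "location"})

print(len(td_cells), len(location_divs))
```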

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.content, 'html.parser')

for div in soup.find_all("div", {"class":"location"}):
    print(div)

Now you just need to extract the fields you need from each div. Everything you need for that should be easy to find on Stack Overflow.
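As a rough sketch of that extraction step, the snippet below turns each location div into a dictionary and writes the result as CSV. It runs against the sample fragment rather than the live page, so it assumes every store follows the same layout as the sample. The helper names (`child_text`, `parse_location`, `locations_to_csv`) and the "Address" column are invented for illustration; the class names come from the sample.

```python
import csv
import io

from bs4 import BeautifulSoup, NavigableString

# The sample fragment from above; the live page is assumed to repeat this layout.
SAMPLE_HTML = """
<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br>
13802 N. 32nd St #11<br>
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>
"""

# Child divs that are copied into the record under their class name.
FIELDS = ["locationTitle", "Phone", "Fax", "HoursContent"]


def child_text(div, class_name):
    """Return the text of the child div with this class, or '' if it is missing."""
    tag = div.find("div", class_=class_name)
    if tag is None:
        return ""
    # Pieces separated by <br> are joined with ", "; blank pieces are dropped.
    return tag.get_text(", ", strip=True)


def parse_location(div):
    """Build one dictionary per store from a <div class="location">."""
    record = {name: child_text(div, name) for name in FIELDS}
    # The address lines are bare text nodes directly inside the location div.
    lines = [str(c).strip() for c in div.children if isinstance(c, NavigableString)]
    record["Address"] = ", ".join(line for line in lines if line)
    return record


def locations_to_csv(html, out):
    """Parse every location div in html and write the records to out as CSV."""
    soup = BeautifulSoup(html, "html.parser")
    records = [parse_location(d) for d in soup.find_all("div", class_="location")]
    writer = csv.DictWriter(out, fieldnames=FIELDS + ["Address"])
    writer.writeheader()
    writer.writerows(records)
    return records


buffer = io.StringIO()
records = locations_to_csv(SAMPLE_HTML, buffer)
print(records[0])
print(buffer.getvalue())
```

To use this on the real page, pass `page.content` instead of `SAMPLE_HTML` and replace the `StringIO` buffer with a file opened with `newline=""`. The state headings (`LocationSecHdr`) sit outside the location div, so they would need separate handling.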
