I have a .txt file (named test_1.txt) formatted as follows:
https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Scranton,PA&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Baltimore,MD&destination=Charlotte,NC&sensor=false&mode=walking
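If you open one of the links above, you will see the output in XML. With the code below, I managed to get it to the second directions request (Miami to Mobile), but it prints seemingly random data, which is not what I want. When I access one URL at a time directly from the code, rather than through the .txt file, it works and prints exactly the data I need. Is there any reason it would only go to the second URL and print the wrong information? The Python code is below: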
import urllib2
from bs4 import BeautifulSoup

with open('test_1.txt', 'r') as f:
    f.readline()
    mapcalc = f.readline()
    response = urllib2.urlopen(mapcalc)
    soup = BeautifulSoup(response)
    for leg in soup.select('route > leg'):
        duration = leg.duration.text.strip()
        distance = leg.distance.text.strip()
        start = leg.start_address.text.strip()
        end = leg.end_address.text.strip()
        print duration
        print distance
        print start
        print end
Edit:
Here is the output of the Python code in the shell:
56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA
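The extra numbers in this output are not random. In the Directions XML, each `<duration>` and `<distance>` element has two children: `<value>` (seconds or metres) and `<text>` (the human-readable form). BeautifulSoup's `.text` joins the text of every descendant, so `leg.duration.text.strip()` prints 56 and then 1 min, and 77 (metres) is the same distance as 253 ft. The sketch below swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup (so nothing needs installing) and runs it on an abridged `<duration>` element built from the numbers above, contrasting the joined text with reading one child at a time (Python 3).

```python
import xml.etree.ElementTree as ET

# Abridged <duration> element, using the numbers printed above.
DURATION_XML = """<duration>
<value>56</value>
<text>1 min</text>
</duration>"""

duration = ET.fromstring(DURATION_XML)

# Roughly what BeautifulSoup's .text does: join every text node in the element.
all_text = "".join(duration.itertext()).strip()
print(all_text)  # two lines: "56" and "1 min"

# Read the individual children instead.
seconds = int(duration.findtext("value"))  # machine-readable value (seconds)
label = duration.findtext("text")          # human-readable label
print(seconds, label)                      # 56 1 min
```

In BeautifulSoup the equivalent would be `leg.duration.value.text` for the number and `leg.duration.find('text').text` for the label; `.text` on its own is BeautifulSoup's get-all-text property, not a lookup of the `<text>` child.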
Answer (score: 1)
Here is a link that gives a clearer picture of the behavior you get when opening files, reading lines, and so on (related to Lev Levitsky's comment).
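In the question's code, the first `f.readline()` reads line 1 and throws it away, the second returns line 2 with its trailing newline, and there is no loop, so lines 3 and 4 are never read. Iterating over the file object yields every line in turn. A minimal Python 3 sketch, using `io.StringIO` as an in-memory stand-in for test_1.txt; `question_style` and `read_urls` are hypothetical helper names:

```python
import io

# In-memory stand-in for test_1.txt, holding the same four lines.
TEST_FILE = (
    "https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking\n"
    "https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking\n"
    "https://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Scranton,PA&sensor=false&mode=walking\n"
    "https://maps.googleapis.com/maps/api/directions/xml?origin=Baltimore,MD&destination=Charlotte,NC&sensor=false&mode=walking\n"
)


def question_style(f):
    """Mimic the question's code: two readline() calls and no loop."""
    f.readline()         # line 1 is read and discarded
    return f.readline()  # line 2, trailing newline included


def read_urls(f):
    """Hypothetical helper: every non-blank line with whitespace stripped."""
    return [line.strip() for line in f if line.strip()]


print(repr(question_style(io.StringIO(TEST_FILE))))  # only the Miami URL
for url in read_urls(io.StringIO(TEST_FILE)):        # all four URLs
    print(url)
```

Iterating the file is what `for mapcalc in f` does in the code below; calling `.strip()` on each line additionally keeps the trailing newline out of the request URL.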
One way:
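import httplib2
from bs4 import BeautifulSoup

http = httplib2.Http()
with open('test_1.txt', 'r') as f:
    for mapcalc in f:               # one iteration per line of the file
        url = mapcalc.strip()       # drop the trailing newline
        if not url:
            continue                # skip blank lines
        # request() returns (response status/headers, body)
        resp, content = http.request(url)
        soup = BeautifulSoup(content)
        for leg in soup.select('route > leg'):
            duration = leg.duration.text.strip()
            distance = leg.distance.text.strip()
            start = leg.start_address.text.strip()
            end = leg.end_address.text.strip()
            print duration
            print distance
            print start
            print end
# No f.close() needed: the with block closes the file.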
I'm not familiar with this kind of thing, but I got the code above working, with the following output:
4877
1 hour 21 mins
6582
4.1 mi
Bethesda, MD, USA
Washington, DC, USA
56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA
190
3 mins
269
0.2 mi
Chicago, IL, USA
Scranton, PA, USA
12
1 min
15
49 ft
Baltimore, MD, USA
Charlotte, NC, USA
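Every URL is now requested, but most figures are still implausible for walking between cities (49 ft from Baltimore to Charlotte). A likely cause, assuming the responses follow the layout of Google's documented XML sample: inside each `<leg>`, every `<step>` carries its own `<duration>` and `<distance>`, and the leg's totals appear only after the steps. BeautifulSoup's `leg.duration` returns the first matching descendant, so it would pick up the first step's figures. The Python 3 sketch below uses `xml.etree.ElementTree` on a hypothetical, heavily abridged response (all values and addresses are invented; only the nesting matters) to contrast the first descendant with the leg's direct child. `leg_summaries` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged response: one leg with two steps, then the leg totals.
RESPONSE_XML = """<DirectionsResponse>
<status>OK</status>
<route>
<leg>
<step><duration><value>12</value><text>1 min</text></duration>
<distance><value>15</value><text>49 ft</text></distance></step>
<step><duration><value>600</value><text>10 mins</text></duration>
<distance><value>800</value><text>0.5 mi</text></distance></step>
<duration><value>612</value><text>10 mins</text></duration>
<distance><value>815</value><text>0.5 mi</text></distance>
<start_address>Origin</start_address>
<end_address>Destination</end_address>
</leg>
</route>
</DirectionsResponse>"""


def leg_summaries(xml_text):
    """Return (duration_s, distance_m, start, end) for each leg, read from
    the leg's own children rather than from its first step."""
    root = ET.fromstring(xml_text)
    results = []
    for leg in root.findall("route/leg"):
        # find("duration") looks only at direct children of <leg>,
        # so the step-level <duration> elements are skipped.
        results.append((
            int(leg.find("duration").findtext("value")),
            int(leg.find("distance").findtext("value")),
            leg.findtext("start_address"),
            leg.findtext("end_address"),
        ))
    return results


leg = ET.fromstring(RESPONSE_XML).find("route/leg")
print(leg.find(".//duration").findtext("text"))  # first descendant: the first step's "1 min"
print(leg_summaries(RESPONSE_XML))               # [(612, 815, 'Origin', 'Destination')]
```

With BeautifulSoup, the corresponding change would be `leg.find('duration', recursive=False)` (and likewise for `distance`), which searches only the leg's direct children instead of the whole subtree.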