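I'm trying to write a program that pulls property IDs from an Excel spreadsheet, navigates to a webpage for each of those IDs, scrapes the associated property values, and writes them back into the same spreadsheet. I apologize in advance: I'm a very novice Python (or any language, tbh) coder. Here's the code so far: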
import xlrd
from lxml import html
import requests


class Estimate:
    def importo(self):
        # access the Excel spreadsheet (raw string so the backslashes in the
        # Windows path are not treated as escape sequences)
        file_location = r"S:\Powerdel\Transmission Engineering\Miscellaneous\Estimates\Auto_Estimator\Estimate_Output.xls"
        workbook = xlrd.open_workbook(file_location)
        sheet = workbook.sheet_by_index(0)
        # number of rows in the sheet (including the header row)
        n = int(sheet.nrows)
        # initialize one slot per data row (the header row is skipped)
        id = [0] * (n - 1)
        width = [0] * (n - 1)
        cost = [0] * (n - 1)
        size = [0] * (n - 1)
        # import values from the spreadsheet: column index 3 holds the
        # property ID, column index 1 the width
        for row in range(n - 1):
            id[row] = sheet.cell_value(row + 1, 3)
            width[row] = sheet.cell_value(row + 1, 1)
        # grab cost from the webpage (only the first ID for now)
        # for row in range(n - 1):
        name = "http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id={0}".format(id[0])
        page = requests.get(name)
        tree = html.fromstring(page.text)
        # note: browsers insert <tbody> into tables in their DOM; if the raw
        # HTML has no <tbody>, this XPath returns [] even on the right page
        cost[0] = tree.xpath('//div[@id="landDetails"]/table/tbody/tr[2]/td[5]/text()')
        print(id[0])
        print(width[4])
        print(n)
        print(cost[0])
        print(name)
        print(tree.text_content().encode('utf-8'))


Estimate().importo()
Result:
337776
492.0
63
[]
http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=337776
Travis Property Search
body { text-align: center; padding: 150px; }
h1 { font-size: 50px; }
body { font: 20px Helvetica, sans-serif; color: #333; }
#article { display: block; text-align: left; width: 650px; margin: 0 auto; }
a { color: #dc8100; text-decoration: none; }
a:hover { color: #333; text-decoration: none; }
Please try again
Sorry for the inconvenience but your session has either timed out or the server is busy handling other requests. You may visit us on the the following website for information, otherwise please retry your search again shortly:Travis Central Appraisal District Website
Click here to reload the property search to try again
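The empty list printed for `cost[0]` is what you would expect if the XPath ran against the apology page above rather than the property page. Detecting that page explicitly turns a silent `[]` into a clear error. A minimal sketch, assuming the wording of the apology page stays stable; `TIMEOUT_MARKER`, `SessionTimeoutError`, and `require_property_page` are hypothetical names, not part of any library:

```python
# Phrase copied from the apology page in the output above (assumed stable).
TIMEOUT_MARKER = "your session has either timed out"


class SessionTimeoutError(RuntimeError):
    """Raised when the site serves its 'Please try again' page instead of data."""


def is_session_timeout_page(page_text):
    """Return True if the page text looks like the session-timeout apology page."""
    return TIMEOUT_MARKER in page_text.lower()


def require_property_page(page_text):
    """Return page_text unchanged, or raise if it is the apology page."""
    if is_session_timeout_page(page_text):
        raise SessionTimeoutError("server returned the session-timeout page")
    return page_text
```

In the question's code this would sit between the request and the parse, e.g. `tree = html.fromstring(require_property_page(page.text))`.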
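My problem (for now) is that my `requests.get` gets redirected away from the target page. Interestingly, if I follow the link my program prints out after running it, I'm redirected to the same apology page. Buuut, if I navigate to the intended page through the menu items on the traviscad.org website first and then follow my printed link, boom, correct page.
Like I said, I'm brand new, so I have no idea why I'm being redirected or how to prevent it. If you have any suggestions, please let me know!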
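The fact that the printed link only works after first going through the site's own menus points to a server-side session: the menu pages most likely set a session cookie that the property page then expects. That is an inference, not something the question confirms. Below is a minimal Python 3 sketch of browser-like cookie handling using only the standard library (`http.cookiejar` with `urllib.request`, swapped in here instead of `requests`). `entry_url` stands for whichever search page you normally pass through first, which the question does not give, and the User-Agent value is an arbitrary assumption.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener(user_agent="Mozilla/5.0"):
    """Build a urllib opener that stores cookies and resends them, like a browser."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumption: some sites treat the default "Python-urllib" agent differently.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_session(opener, entry_url, target_url, timeout=30):
    """Visit entry_url first (to pick up session cookies), then fetch target_url."""
    with opener.open(entry_url, timeout=timeout) as resp:
        resp.read()  # body unused; only the Set-Cookie headers matter here
    with opener.open(target_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `fetch_with_session(opener, entry_url, name)` with `name` being the property URL built in the question. With `requests`, creating one `requests.Session()` and calling its `get` for the entry page and then for the property page has the same effect, because a session object keeps cookies between calls.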
Answer 0 (score: 0)
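Use mechanize (https://pypi.python.org/pypi/mechanize/) to emulate real browser requests. You can then move through the pages just as if you were browsing them directly, with the session details and so on carried along.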
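Once a session is in place, the commented-out `for row in range (n-1):` loop in the question can visit every ID through that same session. A minimal sketch, assuming the ID column may hold numbers, which xlrd returns as floats (the `492.0` printed for `width[4]` shows this), so `337776.0` has to become `337776` before it goes into the URL. `fetch` is a hypothetical stand-in for whatever session-aware function retrieves a page (a mechanize browser, a cookie-aware opener, or a `requests.Session`); `normalize_id`, `property_url`, and `collect_pages` are illustrative names.

```python
BASE_URL = "http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id={0}"


def normalize_id(raw):
    """xlrd returns numeric cells as floats (337776 -> 337776.0); drop the '.0'."""
    if isinstance(raw, float) and raw.is_integer():
        return str(int(raw))
    return str(raw).strip()


def property_url(raw_id):
    """Build the property page URL for one spreadsheet ID."""
    return BASE_URL.format(normalize_id(raw_id))


def collect_pages(ids, fetch):
    """Fetch one page per ID with the same `fetch` callable (one shared session).

    `fetch` takes a URL and returns the page text.
    Returns a dict mapping the normalized ID to its page text.
    """
    pages = {}
    for raw_id in ids:
        if raw_id in ("", None):  # skip blank spreadsheet cells
            continue
        pages[normalize_id(raw_id)] = fetch(property_url(raw_id))
    return pages
```

Passing the ID list read from the sheet, e.g. `collect_pages(id, fetch)`, keeps every request on one session instead of starting a fresh one per property, which is what the menu-first browsing does by hand.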