Python requests returns different data from what I see in the browser

Date: 2016-11-05 08:47:16

Tags: python curl python-requests

I am trying to fetch a Zillow URL. The page I get in the browser is different from the one my code receives. Details are below.

CURL

curl 'http://www.zillow.com/homes/KY_rb/' -H 'Host: www.zillow.com' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: http://www.zillow.com/homes/fsbo/featured_sort/47.368594,-68.686523,28.110749,-124.936523_rect/3_zm/' -H 'Cookie: JSESSIONID=D9BF4E280B16431893C3A11A8FC3F825; abtest=3|DO8RElLJuj2felZqqw; zguid=23|%24b42a26dc-8387-4086-b000-cc49ddfbc450; search=6|1480915840720%7Crect%3D47.368594%252C-68.686523%252C28.110749%252C-124.936523%26zm%3D3%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26sort%3Dfeatured%26z%3D1%26lt%3Dfsbo%26fs%3D1%26fr%3D0%26mmm%3D1%26rs%3D0%26ah%3D0%26singlestory%3D0%09%01%09%09%09%092%090%09US_%09; F5P=3005270026.0.0000; _ga=GA1.2.1136269898.1478324471; _gat=1; __gads=ID=3f2f3e2d6e19b149:T=1478323799:S=ALNI_Mava6ZGjT_MrRhAVG7ndewcDCN60A; ipe_s=fbc57b01-3937-f803-5da1-5c4887cc949d; _bizo_bzid=aa621351-3627-408d-8838-440c1bd3f163; _bizo_cksm=EE838E07FF3AF15E; ipe.29115.pageViewedCount=1; _bizo_np_stats=14%3D1028%2C' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1'

curl gives the correct result.

Fetch.py

import requests
from bs4 import BeautifulSoup
from time import sleep
import xmltodict

state = 'KY'
url = 'http://www.zillow.com/homes/' + state + '_rb/'
property_urls = []
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
    'upgrade-insecure-requests': '1',  # header values must be strings, not ints
    'accept-language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}

try:
    session = requests.session()
    r = session.get(url, headers=headers, timeout=5)
    sleep(2)
    html = r.text
    soup = BeautifulSoup(html, 'lxml')
    print(html)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
except KeyboardInterrupt:
    print("Someone closed the program")

finally:
    print("Total Properties = " + str(len(property_urls)))
    try:
        # file to store state based URLs
        state_file = open(state + '_file.txt', 'a+')
        state_file.write("\n".join(property_urls))
        state_file.close()
    except Exception as ex:
        print("Unable to store records in text file. Technical details below.\n")
        print(str(ex))  # use this handler's own exception variable

1 answer:

Answer 0 (score: 0)

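I'm not sure what you mean by "different data" (it could mean anything: slightly different, completely different, and so on). Your curl command uses --compressed, which in effect sends the request header Accept-Encoding: deflate, gzip. Try adding that header in your Python code.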
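A minimal sketch of that suggestion, assuming the goal is to make the Python request look like the curl one. The names `CURL_HEADERS`, `build_headers`, and `fetch` are hypothetical helpers, not part of the original script, and the cookies from the curl command are left out. Note that requests already sends an Accept-Encoding header of its own by default, so if adding it changes nothing, the difference may instead come from the User-Agent, Referer, or cookies, which also differ between the two requests.

```python
# Headers copied from the curl command in the question (cookies omitted).
CURL_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) "
                  "Gecko/20100101 Firefox/49.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "deflate, gzip",  # what curl's --compressed asks for
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",    # header values must be strings
}


def build_headers(extra=None):
    """Return a copy of the curl-like headers, optionally merged with extras."""
    headers = dict(CURL_HEADERS)
    if extra:
        headers.update(extra)
    return headers


def fetch(url, extra=None, timeout=5):
    """Hypothetical helper: GET the page using the curl-like headers."""
    import requests  # imported lazily so build_headers() works without it

    response = requests.get(url, headers=build_headers(extra), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.text
```

Calling `fetch('http://www.zillow.com/homes/KY_rb/')` would then send the same headers as curl, and passing `extra={'Referer': ...}` or a `Cookie` value lets you add the remaining curl headers one at a time to see which one changes the response.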