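Parsing 1track ... information about the parcel's location points

Time: 2019-01-31 07:26:25

Tags: python selenium parsing beautifulsoup

I am using Selenium and Beautiful Soup. I wait 30 seconds for the pop-up to load, close it, and then read the page again. But the information about the points where the parcel has been seems to be hidden. I was advised to try setting a User-Agent, but that did not help either.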

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the local ChromeDriver binary
driver = webdriver.Chrome("C:\\Users\\Yres\\Downloads\\chromedriver_win32\\chromedriver.exe")

driver.get('https://1track.ru/tracking/RU341961010HK')

# Give the pop-up time to load, then close it
time.sleep(30)
driver.find_element(By.CLASS_NAME, "close").click()

# Reload the page and grab the rendered HTML
driver.get('https://1track.ru/tracking/RU341961010HK')
source = driver.page_source
driver.quit()

soup = BeautifulSoup(source, "html.parser")
print(soup)

1 answer:

Answer 0 (score: 1)

You can use requests.post to get all the data in a nice JSON format.
import requests
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

url = 'https://1track.ru/ajax/tracking2'

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
payload = {'tracks[0][track]': 'RU341961010HK'}

data = requests.post(url, headers=headers, data=payload).json()

jsonObj = data['JSON']

# you can do whatever you'd like here. But I just threw it into a dataframe
df = json_normalize(jsonObj['data']['events']['data'])

Output:

A dataframe with the following columns:

['attribute', 'attribute.name', 'attribute.name_orig', 'courier.country',
   'courier.country_code', 'courier.image', 'courier.name',
   'courier.track', 'courier.uri', 'date', 'date_format.F',
   'date_format.H', 'date_format.M', 'date_format.Y', 'date_format.d',
   'date_format.i', 'date_format.m', 'date_format.s', 'days', 'daysclass',
   'details', 'details.name', 'details.name_orig', 'kk', 'payment',
   'place', 'place.name', 'place.name_orig', 'placeto', 'status', 'time',
   'value', 'weight', 'zip', 'zip.code', 'zip.country.code',
   'zip.country.name', 'zip.fullcode', 'zip.location.address',
   'zip.location.lat', 'zip.location.long', 'zip.name', 'zip.type']
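
The dotted column names come from flattening nested objects in each event. As a rough illustration of that idea (not pandas' actual implementation), here is a minimal standard-library sketch. The `flatten` helper and the two sample events are hypothetical, shaped after the rows in this answer. It also suggests why both `attribute` and `attribute.name` can appear: the field seems to be a plain value in some events and a nested object in others.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical events, modelled on the dataframe rows (not real API output)
events = [
    {"attribute": {"name": "Hand over to airline"}, "place": False, "status": "transit"},
    {"attribute": False, "place": {"name": "Гонконг HKHKGA"}, "status": "transit"},
]

rows = [flatten(event) for event in events]
columns = sorted({key for row in rows for key in row})
print(columns)
```

A key that is missing from a given row would show up as NaN once the rows are combined into a dataframe.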

And the dataframe itself (showing only the first three rows):

print(df.head(3).to_string())
  attribute                           attribute.name                      attribute.name_orig courier.country courier.country_code     courier.image  courier.name  courier.track   courier.uri        date date_format.F date_format.H date_format.M date_format.Y date_format.d date_format.i date_format.m date_format.s    days daysclass details                 details.name            details.name_orig  kk payment  place      place.name place.name_orig  placeto   status      time value  weight  zip zip.code zip.country.code zip.country.name zip.fullcode  zip.location.address  zip.location.lat  zip.location.long        zip.name zip.type
0       NaN                  Передача в авиакомпании                     Hand over to airline           Китай                   CN       cainiao.png       CAINIAO  RU341961010HK       cainiao  29.01.2019       January            14           Jan          2019            29            59            01            00  4 день   success   False                          NaN                          NaN   0    None  False             NaN             NaN    False  transit  14:59:00  None       0  NaN      NaN              NaN              NaN          NaN                   NaN               NaN                NaN             NaN      NaN
1       NaN  Airline departure from original country  Airline departure from original country           Китай                   CN       cainiao.png       CAINIAO  RU341961010HK       cainiao  29.01.2019       January            13           Jan          2019            29            34            01            00  4 день   success   False                          NaN                          NaN   1    None  False             NaN             NaN    False  transit  13:34:00  None       0  NaN      NaN              NaN              NaN          NaN                   NaN               NaN                NaN             NaN      NaN
2     False                                      NaN                                      NaN          Россия                   RU  russian-post.png  Почта России  RU341961010HK  russian-post  29.01.2019       January            10           Jan          2019            29            46            01            00  4 день   success     NaN  Экспорт международной почты  Экспорт международной почты   2    None    NaN  Гонконг HKHKGA  Гонконг HKHKGA    False  transit  10:46:00  None       0  NaN   HKHKGA               HK        Hong Kong    HK_HKHKGA                   NaN               0.0                0.0  Гонконг HKHKGA      int

This call fetches the data:

data = requests.post(url, headers=headers, data=payload).json()
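
The form key `tracks[0][track]` looks like an indexed list, so the payload could be built by a small helper instead of being hard-coded. The sketch below is only an assumption: `build_payload` is a hypothetical name, and only index 0 is confirmed by this answer. Whether the endpoint accepts `tracks[1][track]` and beyond has not been verified.

```python
def build_payload(track_numbers):
    """Build the form payload for one or more tracking numbers.

    Assumption: further indices follow the same key pattern as index 0.
    """
    payload = {}
    for index, number in enumerate(track_numbers):
        cleaned = number.strip()
        if not cleaned:
            raise ValueError("empty tracking number")
        payload[f"tracks[{index}][track]"] = cleaned
    return payload


single = build_payload(["RU341961010HK"])
print(single)
```

For a single number this produces the same dictionary as the hand-written `payload` above.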


Then I keep only the JSON part of it:

jsonObj = data['JSON']
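
Chained lookups such as `data['JSON']['data']['events']['data']` raise an error if any level is missing, which may happen when the response has a different shape. A minimal sketch of a more forgiving lookup follows. `dig` is a hypothetical helper, and both sample responses are made up for illustration, not captured from the site.

```python
def dig(obj, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical response shapes (assumptions, not real API output)
ok = {"JSON": {"data": {"events": {"data": [{"status": "transit"}]}}}}
empty = {"JSON": {"data": False}}

found = dig(ok, "JSON", "data", "events", "data")
missing = dig(empty, "JSON", "data", "events", "data", default=[])
print(found)
print(missing)
```

Passing `default=[]` means the result can still be iterated over when no events are present.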


Then I can "unpack" the data I need and store it as a dataframe:

df = json_normalize(jsonObj['data']['events']['data'])
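
Since the question is about the points the parcel passed through, the relevant fields appear to be `date`, `time`, `place.name` and the event description. The sketch below pulls them out of the raw events without pandas. The helper names are hypothetical, and the two sample events are reconstructed from rows 0 and 2 of the output in this answer, assuming the nested layout implied by the dotted column names.

```python
def nested_name(event, key):
    """Return event[key]['name'] when the field is an object, else None."""
    value = event.get(key)
    if isinstance(value, dict):
        return value.get("name")
    return None


def summarize(event):
    """Reduce one tracking event to when, who, where and what."""
    return {
        "when": f"{event.get('date', '')} {event.get('time', '')}".strip(),
        "courier": nested_name(event, "courier"),
        "place": nested_name(event, "place"),
        # The description seems to live in 'attribute' or in 'details'
        "what": nested_name(event, "attribute") or nested_name(event, "details"),
    }


# Sample events reconstructed from the dataframe rows (assumed raw layout)
events = [
    {"date": "29.01.2019", "time": "14:59:00", "courier": {"name": "CAINIAO"},
     "attribute": {"name": "Передача в авиакомпании"}, "details": False, "place": False},
    {"date": "29.01.2019", "time": "10:46:00", "courier": {"name": "Почта России"},
     "attribute": False, "details": {"name": "Экспорт международной почты"},
     "place": {"name": "Гонконг HKHKGA"}},
]

summaries = [summarize(event) for event in events]
for row in summaries:
    print(row)
```

With the dataframe, selecting `df[['date', 'time', 'place.name', 'status']]` should give a comparable view.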


The result is the dataframe shown in the output above.