Question

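I'm using Selenium to get the "dynamic content" that appears after searching for the letter 'a', and I then want to save that table to a JSON file.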

I tried calling json.loads(html) directly, which didn't work. I then tried encode('utf-8').decode('ascii','ignore') on the HTML, but that didn't work either.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import json
driver=webdriver.Chrome(executable_path="chromedriver")
driver.get("http://example.webscraping.com/places/default/index")
driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a').click()
elem=driver.find_element_by_xpath('//*[@id="search_term"]')
elem.send_keys("a")
elem.send_keys(Keys.RETURN)
html=driver.page_source.encode('utf-8').decode('ascii','ignore')
driver.close()
print json.loads(html)

This is the output I want, so that I can save it to a text file.

{"records": [{"pretty_link": "<div><a href=\"/places/default/view/Afghanistan-1\"><img src=\"/places/static/images/flags/af.png\" /> Afghanistan</a></div>", "country": "Afghanistan", "id": 3506077}, {"pretty_link": "<div><a href=\"/places/default/view/Aland-Islands-2\"...

This is the error I get:

raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Answer 1

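JSON cannot be detected because the page you requested (and therefore driver.page_source) returns HTML, whereas you need to request the JSON itself. Try this code to get the desired output: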

import requests

# The search form loads its results from this JSON endpoint.
print(requests.get('http://example.webscraping.com/places/ajax/search.json?&search_term=a&page_size=10&page=0').json())
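
To see why json.loads fails on the page source, here is a minimal sketch (Python 3). The two strings are made-up stand-ins, not real responses from the site: one looks like what driver.page_source gives you, the other like the body of the search.json endpoint. try_parse is a hypothetical helper name used only for this illustration.

```python
import json

# Hypothetical stand-ins for the two kinds of response body.
html_body = '<html><body><div id="results"></div></body></html>'
json_body = '{"records": [{"country": "Afghanistan", "id": 3506077}]}'


def try_parse(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError in Python 3
        return None


print(try_parse(html_body))  # HTML is not JSON, so decoding fails
print(try_parse(json_body)["records"][0]["country"])
```

Stripping non-ASCII characters with encode/decode does not help, because the text is still HTML markup rather than a JSON document.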

If you only need the records:

# Decode the JSON body once, then pick out the list of records.
response = requests.get('http://example.webscraping.com/places/ajax/search.json?&search_term=a&page_size=10&page=0').json()
print(response['records'])

To get the country names:

# Each record is a dict; 'country' holds the plain country name.
for item in response['records']:
    print(item['country'])

Output:

'Afghanistan'
'Aland Islands'
'Albania'
'Algeria'
'American Samoa'
'Andorra'
'Angola'
'Anguilla'
'Antarctica'
'Antigua and Barbuda'
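
Since the goal in the question was to save the result to a file, here is a rough sketch of that last step (Python 3). The function names fetch_search_results and save_records, and the file name used below, are my own choices for illustration. The URL and query parameters are the ones from the request above.

```python
import json


def fetch_search_results(term, page_size=10, page=0):
    """Request the search endpoint used above and decode its JSON body."""
    import requests  # third-party; imported here so save_records works without it

    url = "http://example.webscraping.com/places/ajax/search.json"
    params = {"search_term": term, "page_size": page_size, "page": page}
    response = requests.get(url, params=params)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def save_records(payload, path):
    """Write the decoded payload to path as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)
```

Calling save_records(fetch_search_results("a"), "results.json") should then write the search results for 'a' to results.json, assuming the site is reachable.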

Unable to detect JSON in the page source after opening the page with Selenium
