Parsing JSON data with BeautifulSoup and the errors it outputs

Time: 2019-06-09 23:25:44

Tags: python json python-requests

When I run the following code, it produces the errors shown below:

import requests
import json
from bs4 import BeautifulSoup

JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = JSONDATA.json()

for line in JSONDATA['posts']:
    soup = BeautifulSoup(line['episodeNumber'])
    soup = BeautifulSoup(line['title'])
    soup = BeautifulSoup(line['audioSource'])
    soup = BeautifulSoup(line['large'])
    soup = BeautifulSoup(line['long'])
    print soup.prettify()

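It produces the following errors (I have tried various forms of the suggested LXML fix):

  • An LXML problem
  • A complaint about the .mp3 link, which should not be a problem, since the link is correct?
  • A problem finding the "large" thumbnail. Using title, audioSource, etc. for the equivalent fields does not produce the same error, yet looking at the site data, the field seems to be in the right place?

Error output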

python ./test2.py
./test2.py:14: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 14 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup("features=lxml")(line['episodeNumber'])
./test2.py:16: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 16 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['title'])
./test2.py:18: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 18 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['audioSource'])

/home/leo/.local/lib/python2.7/site-packages/bs4/__init__.py:335: UserWarning: "https://dts.podtrac.com/redirect.mp3/dovetail.prxu.org/criminal/85cd4e4d-fa8b-4df2-8a8c-78ad0e800574/Episode_116_190504_audition_mix_neg18_part_1.mp3" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
Traceback (most recent call last):
  File "./test2.py", line 20, in <module>
    soup = BeautifulSoup(line['large'])
KeyError: 'large'

1 Answer:

Answer 0: (score: 1)

If you are only trying to get the data out of the JSON, this will work:

import pandas as pd

import requests
import json
from bs4 import BeautifulSoup

JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = JSONDATA.json()

# Load the posts from the JSON into a DataFrame
# (on pandas 1.0 and later, use pd.json_normalize instead)
df = pd.io.json.json_normalize(JSONDATA['posts'])
df.to_csv('posts.csv')

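The lxml warning goes away if you name the parser explicitly:

BeautifulSoup(line['episodeNumber'], 'lxml')

BeautifulSoup needs an HTML parser to build the soup object, and it warns when it has to choose one on its own. If you do not have lxml installed: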

pip install lxml

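The second warning appears because you are passing a URL to create the soup object. That does not do what you want: as the warning says, Beautiful Soup is not an HTTP client and does not know how to download the link.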
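
To illustrate the point about the URL warning: the values in the JSON are already plain strings, so a link such as audioSource can be used as-is without building a soup object. Below is a minimal sketch in Python 3. The helper looks_like_url and the sample post are hypothetical and only for illustration; the field values are made up, not taken from the real API.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Made-up sample entry shaped like one item of JSONDATA['posts'] (assumption).
sample_post = {
    "title": "Example episode",
    "audioSource": "https://example.com/audio/episode.mp3",
}

for key, value in sample_post.items():
    if looks_like_url(value):
        # A link is just data: keep the string, or download it with requests.
        print(key, "is a URL:", value)
    else:
        print(key, "is plain text:", value)
```

If the document behind a link is actually needed, it would have to be fetched first (for example with requests.get) and the response text passed to BeautifulSoup.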

Finally, your last error occurs because the entries in the linked JSON have no key named "large".

You will need a try/except block there.
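
One way such an exception block might look is sketched below in Python 3. The function name pick_fields and the sample post are hypothetical; the sample deliberately lacks a 'large' key to reproduce the situation from the question.

```python
def pick_fields(post, keys):
    """Collect the requested keys from one post, using None for missing ones."""
    result = {}
    for key in keys:
        try:
            result[key] = post[key]
        except KeyError:
            # The key is absent from this post; record that instead of crashing.
            result[key] = None
    return result


# Made-up sample entry; note that it has no 'large' key (assumption).
sample_post = {"episodeNumber": "116", "title": "Example episode"}

row = pick_fields(sample_post, ["episodeNumber", "title", "large"])
print(row)
```

Calling post.get(key) would give the same result in a single call; the explicit try/except is shown here because that is what the answer suggests.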