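I'm new to web scraping. I want to parse some image galleries on a website, and I need the title, the URL, and the images (gallery) from each page. But I keep getting the error "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)" on this line: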
load = json.loads(data)
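I figured this was a JSON formatting problem, so I googled it. All the JSON I have seen looks like "data": (without backslashes). I really don't know JSON. Does anyone know how to deal with this?
Thanks in advance :))
After BeautifulSoup, I have to decode the HTML as UTF-8 so that I can run a regular expression over it.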
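import json
import re

from bs4 import BeautifulSoup as bs


def get_pic(html, url):
    # html is the raw response body (bytes); url is the page address.
    soup = bs(html, 'lxml')
    title = soup.find_all('title')
    if title is not None and len(title) > 0:
        print(title[0].get_text())
    else:
        print('error finding title')
    decode = html.decode('utf-8')
    # Capture the argument passed to JSON.parse("...") in the page script.
    gallery = re.compile(r'gallery: JSON\.parse\("(.+?)\),', re.S)
    result = re.search(gallery, decode)
    if result:
        data = result.group(1)
        load = json.loads(data)  # the error is raised here
        if load and 'sub_images' in load.keys():
            sub_images = load.get('sub_images')
            # Collect each image's 'url' field (the key is the string 'url').
            image = [item.get('url') for item in sub_images]
            return {
                'title': title,
                'url': url,
                'image': image
            }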
This is the gallery code from the site, as shown when the page is opened in a web browser:
gallery: JSON.parse("{\"count\":8,\"sub_images\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3e0000220732136457\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3e0000220732136457\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a3e0000220732136457\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a3e0000220732136457\"}],\"uri\":\"origin\\/2a3e0000220732136457\",\"height\":1080},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/2a380004fdda35e40777\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/2a380004fdda35e40777\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a380004fdda35e40777\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/2a380004fdda35e40777\"}],\"uri\":\"origin\\/2a380004fdda35e40777\",\"height\":1080},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a380004fdcd98e1fd47\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a380004fdcd98e1fd47\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a380004fdcd98e1fd47\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a380004fdcd98e1fd47\"}],\"uri\":\"origin\\/2a380004fdcd98e1fd47\",\"height\":1080},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3d0001eeb8c2028db5\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3d0001eeb8c2028db5\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a3d0001eeb8c2028db5\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a3d0001eeb8c2028db5\"}],\"uri\":\"origin\\/2a3d0001eeb8c2028db5\",\"height\":1080},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/2a3e000021fedb5f2ed1\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/2a3e000021fedb5f2ed1\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/2a3e000021fedb5f2ed1\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a3e000021fedb5f2ed1\"}],\"uri\":\"origin\\/2a3e000021fedb5f2ed1\",\"height\":1080},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a380004fddbc0401cc2\",\"width\":
1920,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a380004fddbc0401cc2\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a380004fddbc0401cc2\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a380004fddbc0401cc2\"}],\"uri\":\"origin\\/2a380004fddbc0401cc2\",\"height\":1080},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3d0001eec594a6d70d\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/2a3d0001eec594a6d70d\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/2a3d0001eec594a6d70d\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a3d0001eec594a6d70d\"}],\"uri\":\"origin\\/2a3d0001eec594a6d70d\",\"height\":1080},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/2a3d0001eeb7d396b175\",\"width\":1920,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/2a3d0001eeb7d396b175\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/2a3d0001eeb7d396b175\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/2a3d0001eeb7d396b175\"}],\"uri\":\"origin\\/2a3d0001eeb7d396b175\",\"height\":1080}],\"max_img_width\":1920,\"labels\":[\"\\u72d7\",\"\\u5ba0\\u7269\"],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" 
\"],\"sub_titles\":[\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\",\"\\u53ef\\u7231\\u5446\\u840c\\u72d7\\u72d7\\u552f\\u7f8e\\u5199\\u771f\\u56fe\\u7247\\u58c1\\u7eb8\\uff01\"]}"),
This is an example of the JSON I get for the gallery:
{\"count\":6,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a600053c0d1fc8e138\",\"width\":640,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a600053c0d1fc8e138\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66a600053c0d1fc8e138\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/66a600053c0d1fc8e138\"}],\"uri\":\"origin\\/66a600053c0d1fc8e138\",\"height\":417},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/66ab00014d177f404d15\",\"width\":640,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/66ab00014d177f404d15\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/66ab00014d177f404d15\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66ab00014d177f404d15\"}],\"uri\":\"origin\\/66ab00014d177f404d15\",\"height\":450},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a90001dd4851c26006\",\"width\":640,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a90001dd4851c26006\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66a90001dd4851c26006\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/66a90001dd4851c26006\"}],\"uri\":\"origin\\/66a90001dd4851c26006\",\"height\":454},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66ab00014d190b580957\",\"width\":640,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66ab00014d190b580957\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66ab00014d190b580957\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/66ab00014d190b580957\"}],\"uri\":\"origin\\/66ab00014d190b580957\",\"height\":448},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a80001e8907e149341\",\"width\":640,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a80001e8907e149341\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66a80001e8907e149341\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/66a80001e8907e149341\"}],\"uri\":\"origin\\/66a80001e8907e149341\",\"height\":450},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/66a600053c102403a97b\",\"width\":640,\"url_list\":[{\"url\":\"ht
tp:\\/\\/p3.pstatp.com\\/origin\\/66a600053c102403a97b\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/66a600053c102403a97b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/66a600053c102403a97b\"}],\"uri\":\"origin\\/66a600053c102403a97b\",\"height\":466}],\"max_img_width\":640,\"labels\":[\"\\u52a8\\u7269\",\"\\u793e\\u4f1a\"],\"sub_abstracts\":[\"\\u51e0\\u4e2a\\u6e38\\u5ba2\\u5230\\u6237\\u5916\\u6e38\\u73a9\\u7684\\u65f6\\u5019\\u770b\\u5230\\u4e86\\u4e00\\u5934\\u4e0d\\u77e5\\u540d\\u52a8\\u7269\\uff0c\\u5b83\\u7684\\u5934\\u50cf\\u732a\\u5934\\u9f3b\\u5b50\\u5374\\u5f88\\u5c0f\\uff0c\\u51e0\\u4e2a\\u6e38\\u5ba2\\u7eb7\\u7eb7\\u8868\\u793a\\u6ca1\\u6709\\u89c1\\u8fc7\\u8fd9\\u79cd\\u52a8\\u7269\\u3002\",\"\\u8fd9\\u4e2a\\u52a8\\u7269\\u770b\\u5230\\u6e38\\u5ba2\\u540e\\u5e76\\u6ca1\\u6709\\u5bb3\\u6015\\uff0c\\u53cd\\u800c\\u4e3b\\u52a8\\u9760\\u8fd1\\u4ed6\\u4eec\\uff0c\\u6e38\\u5ba2\\u731c\\u60f3\\u5b83\\u5e94\\u8be5\\u662f\\u997f\\u4e86\\uff0c\\u4f46\\u662f\\u4e0d\\u77e5\\u9053\\u5b83\\u6709\\u6ca1\\u6709\\u653b\\u51fb\\u6027\\u8c01\\u4e5f\\u4e0d\\u6562\\u9760\\u8fd1\\u3002\",\"\\u6709\\u4e00\\u4e2a\\u6e38\\u5ba2\\u649e\\u7740\\u80c6\\u5b50\\u6162\\u6162\\u9760\\u8fd1\\uff0c\\u5e76\\u628a\\u624b\\u91cc\\u7684\\u997c\\u5e72\\u5582\\u7ed9\\u5b83\\u5403\\uff0c\\u6ca1\\u60f3\\u5230\\u5b83\\u4e24\\u53e3\\u5c31\\u5403\\u5b8c\\u4e86\\uff0c\\u5927\\u5bb6\\u8d76\\u7d27\\u53c8\\u62ff\\u51fa\\u4e00\\u4e9b\\u98df\\u7269\\u5582\\u7ed9\\u5b83\\u3002\",\"\\u56e0\\u4e3a\\u4e0d\\u77e5\\u9053\\u5b83\\u662f\\u4ec0\\u4e48\\u52a8\\u7269\\uff0c\\u4f17\\u4eba\\u51b3\\u5b9a\\u5148\\u5c06\\u5b83\\u6293\\u8d77\\u6765\\uff0c\\u88c5\\u5230\\u888b\\u5b50\\u91cc\\u540e\\uff0c\\u6e38\\u5ba2\\u53d1\\u73b0\\u5b83\\u7684\\u5934\\u9aa8\\u5e76\\u4e0d\\u662f\\u5f88\\u5927\\uff0c\\u53ea\\u662f\\u6574\\u4e2a\\u4e0b\\u5df4\\u662f\\u80bf\\u7684\\uff0c\\u5927\\u5bb6\\u9001\\u5230\\u52a8\\u7269\\u6551\\u52a9\\u7ad9\\u68c0\\u67e5\\u4e00\\u4e0b\\u518d\\u505a\\u6253\\u7b97\\u3002\",\"\\u5230\\u4e86
\\u6551\\u52a9\\u7ad9\\u7ecf\\u8fc7\\u517d\\u533b\\u68c0\\u67e5\\u53d1\\u73b0\\u5b83\\u662f\\u4e00\\u6761\\u6d41\\u6d6a\\u72d7\\uff0c\\u4e0b\\u5df4\\u80bf\\u80c0\\u662f\\u56e0\\u4e3a\\u7ec6\\u83cc\\u611f\\u67d3\\u5bfc\\u81f4\\u4e0b\\u5df4\\u91cc\\u8fb9\\u53d1\\u708e\\u5bfc\\u81f4\\u7684\\u3002\\u517d\\u533b\\u8d76\\u5feb\\u5582\\u5b83\\u8fdb\\u884c\\u4e86\\u624b\\u672f\\u6392\\u8113\\u3002\",\"\\u7ecf\\u8fc7\\u5728\\u6551\\u52a9\\u7ad9\\u8c03\\u517b\\uff0c\\u6d41\\u6d6a\\u72d7\\u5f88\\u5feb\\u6062\\u590d\\u4e86\\u5065\\u5eb7\\uff0c\\u62c5\\u5fc3\\u5b83\\u5728\\u5916\\u8fb9\\u518d\\u6b21\\u53d7\\u5230\\u4f24\\u5bb3\\uff0c\\u6551\\u52a9\\u4eba\\u5458\\u51b3\\u5b9a\\u4e3a\\u5b83\\u627e\\u4e2a\\u9886\\u517b\\u4eba\\u3002\\u5e0c\\u671b\\u5b83\\u80fd\\u5c3d\\u5feb\\u7684\\u597d\\u8d77\\u6765\\u3002\"],\"sub_titles\":[\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 \\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\",\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 \\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\",\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 \\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\",\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 \\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\",\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 \\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\",\"\\u53d1\\u73b0\\u4e0d\\u77e5\\u540d\\u52a8\\u7269 \\u770b\\u5230\\u4eba\\u4e3b\\u52a8\\u4eb2\\u8fd1 
\\u9001\\u53bb\\u6551\\u52a9\\u53d1\\u73b0\\u662f\\u6700\\u5fe0\\u8bda\\u7684\\u4f19\\u4f34\"]}"
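For reference, a minimal sketch that should reproduce the same kind of error on a much shorter string. The `captured` value and the example.com URL are made up for illustration; the only thing carried over from the dump above is its shape, namely that the backslashes and the trailing double quote are literally part of the captured text.

```python
import json

# Hypothetical, shortened stand-in for the captured text: the backslashes and
# the trailing quote are literal characters, as in the dump above.
captured = r'{\"count\":1,\"sub_images\":[{\"url\":\"http:\\/\\/example.com\\/a\"}]}"'

try:
    json.loads(captured)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser sees a backslash right after '{' where a quoted key should start.
    error_message = str(exc)

print(error_message)
```

The character at index 1 is a backslash rather than a double quote, which is why the parser complains about the property name at "char 1".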
P.S. I noticed an extra " at the end, so I tried this:
data = result.group(1)
revised = data.replace('\"', '"').replace('\\','').replace('^[\"]*','').replace('[\"]*$','')
load = json.loads(revised)
In principle this should remove the ", but it doesn't. I also get "Extra data: line 1 column 4 (char 3)".
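One likely reason the quote survives: `str.replace` treats its first argument as literal text, not as a regular expression, so patterns such as `'^[\"]*'` and `'[\"]*$'` never match anything. A small sketch of the difference, using a made-up `sample` string rather than the real page data:

```python
import re

# Hypothetical sample with stray double quotes at both ends.
sample = '"{"count":6}"'

# str.replace searches for the literal characters [ " ] * $, which are not
# present, so the string comes back unchanged.
unchanged = sample.replace('[\"]*$', '')

# Stripping the quote characters from both ends works without any regex...
stripped = sample.strip('"')

# ...and re.sub is the tool when a real regular expression is wanted.
trimmed = re.sub(r'^"+|"+$', '', sample)

print(unchanged == sample, stripped, trimmed)
```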
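Answer 0: (score: 0)
Yes, the problem is the structure of your JSON: the captured text still contains backslashes, which should not be there, so the solution is to remove them (along with the trailing quote) before loading:
load = json.loads(data.replace('\\', '').replace(']}"', ']}'))
Hope this helps.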
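One caveat with stripping every backslash is that escapes such as `\\u72d7` are reduced to the plain text `u72d7`, so non-ASCII titles and labels come out garbled. A possible alternative, sketched below, is to treat the argument of `JSON.parse(...)` as a quoted string literal and decode it twice. This is only a sketch under assumptions: the `page` fragment and the example.com URL are invented, the regex is changed from the question's so that it captures both surrounding double quotes, and it assumes the literal uses only escapes that are also valid in JSON strings.

```python
import json
import re

# Hypothetical, shortened page fragment in the same shape as the question's dump.
page = r'gallery: JSON.parse("{\"count\":1,\"sub_images\":[{\"url\":\"http:\\/\\/example.com\\/a\"}],\"labels\":[\"\\u72d7\"]}"),'

# Capture the whole string literal, including both double quotes.
match = re.search(r'gallery: JSON\.parse\((".+?")\),', page, re.S)

if match:
    # Pass 1: decode the quoted literal, which removes one level of escaping.
    inner_text = json.loads(match.group(1))
    # Pass 2: parse the resulting text as the actual JSON document.
    gallery = json.loads(inner_text)
    image_urls = [item['url'] for item in gallery.get('sub_images', [])]
    print(image_urls, gallery['labels'])
```

With this approach the `\/` and `\uXXXX` escapes are handled by the JSON parser itself, so no manual `replace` calls are needed.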