Question

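I wrote some code that gets usernames from Instagram. Sometimes my algorithm doesn't work correctly and gives me the name "p" instead. I'm trying to write code for this exception (the if head == 'p': part of the code). First, I use soup.select to get this block of information: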

{"@context":"http:\/\/schema.org","@type":"ImageObject","caption":"I think I'm getting better at editing these.... meaning it's getting more and more blurred, I don't think people can tell what's in the upcoming ones... that's not really what I was going for, but oh well.\n-\n\u2022September 3 2018\u2022\n-\n-\nThis is the update I mentioned when I cut on my leg. I finally cleaned them two days later. I don't usually wait that long, but didn't really have the right circumstances *shrugs*\n-\n-\n-\n#selfharm #selfharmo","representativeOfPage":"http:\/\/schema.org\/True","uploadDate":"2018-09-04T06:27:24","author":{"@type":"Person",**"alternateName":"@alittlereddrop"**,"mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.instagram.com\/alittlereddrop\/"}},"commentCount":"0","interactionStatistic":{"@type":"InteractionCounter","interactionType":{"@type":"LikeAction"},"userInteractionCount":"2"},"mainEntityofPage":{"@type":"ItemPage","@id":"https:\/\/www.instagram.com\/p\/BnS0sdDlsmP\/?tagged=selfharmo"},"description":"2 Likes, 0 Comments - No one cares (@alittlereddrop) on Instagram: \u201cI think I'm getting better at editing these.... by that I mean it's getting more and more\u2026\u201d","name":"No one cares on Instagram: \u201cI think I'm getting better at editing these.... by that I mean it's getting more and more blurred. Don't think\u2026\u201d"}

There is a part, "alternateName":, that contains the name. But I can't get it, even with json.loads. Do you have any ideas?

import json

import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

file = open('users.txt', 'r', encoding="ISO-8859-1")
urls = file.readlines()
for url in urls:
    url = url.strip('\n')
    try:
        req = requests.get(url)
        req.raise_for_status()
    except HTTPError as http_err:
        output = open('output2.txt', 'a')
        # Message text: "Unfortunately, the page is unavailable."
        output.write('К сожалению страница недоступна.\n')
    except Exception as err:
        output = open('output2.txt', 'a')
        output.write('К сожалению страница недоступна2\n')
    else:
        output = open('output2.txt', 'a')
        soup = BeautifulSoup(req.text, "lxml")
        # The canonical link holds the profile or post URL
        the_url = soup.select("[rel='canonical']")[0]['href']
        the_url2 = the_url.replace('https://www.instagram.com/', '')
        head, sep, tail = the_url2.partition('/')
        if head == 'p':
            # Post page: the username has to come from the JSON-LD block
            data = soup.select("[type='application/ld+json']")[0]
            # This is the lookup that does not work
            oJson2 = json.loads(data.text)["alternateName"]
            str(oJson2)
            output.write(oJson2 + '\n')
        else:
            output.write(head + '\n')

Answer 1

There is a syntax problem in your JSON. Double asterisks have been wrongly placed in two positions:

**"alternateName":"@alittlereddrop"**,
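To see why the asterisks matter, here is a minimal sketch using a trimmed, hypothetical sample modelled on the blob in the question (the variable names are illustrative only). It shows that json.loads rejects the text while the ** markers are present and accepts it once they are stripped:

```python
import json

# Trimmed sample modelled on the question's blob; the ** markers are kept on purpose.
raw = '{"author": {"@type": "Person", **"alternateName": "@alittlereddrop"**}}'

try:
    json.loads(raw)
    parsed_with_markers = True
except json.JSONDecodeError:
    # The parser expects a quoted key after the comma and finds "*" instead.
    parsed_with_markers = False

# Strip the markers, then parse again.
cleaned = raw.replace('**', '')
data = json.loads(cleaned)

print(parsed_with_markers)              # False
print(data['author']['alternateName'])  # @alittlereddrop
```

Note that replace('**', '') would also alter a caption that legitimately contains two asterisks in a row, so it is only a quick workaround for text that has been marked up this way.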

If you are opening the JSON from a file, do the following:

import json

with open('yourfilename.json') as fo:
    jsn = json.loads(fo.read().replace('**', ''))
print(jsn['author']['alternateName'])
# prints: @alittlereddrop

In your case, try replacing this line:

oJson2 = json.loads(data.text)["alternateName"]

with this:

oJson2 = json.loads(data.text.replace('**', ''))['author']["alternateName"]
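The other half of the fix is the extra ['author'] step: in the blob from the question, "alternateName" sits inside the "author" object, not at the top level, so a top-level lookup cannot find it. A small sketch of a more defensive version follows. The helper name extract_alternate_name and the trimmed sample are assumptions made for illustration, not part of the original code:

```python
import json
from typing import Optional


def extract_alternate_name(raw: str) -> Optional[str]:
    """Return author.alternateName from a JSON-LD string, or None if absent."""
    data = json.loads(raw.replace('**', ''))
    author = data.get('author')
    if not isinstance(author, dict):
        return None
    return author.get('alternateName')


# Trimmed sample with the same nesting as the blob in the question.
sample = (
    '{"@type": "ImageObject", '
    '"author": {"@type": "Person", "alternateName": "@alittlereddrop"}, '
    '"commentCount": "0"}'
)

top_level = json.loads(sample)
print('alternateName' in top_level)                     # False
print(extract_alternate_name(sample))                   # @alittlereddrop
print(extract_alternate_name('{"commentCount": "0"}'))  # None
```

Returning None instead of raising KeyError lets the calling loop decide what to write to the output file when a page has no author block.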

How to get text with json.loads in Python
