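I am scraping a website. For each URL I scrape, I write the results to a dictionary. What I want is to write each dictionary to a JSON file. When I run the loop below, the file is not saved as a list but as an unreadable run of separate objects, {} {}.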
import json

# data, driver and get_elements are defined elsewhere in my script (not shown)
df_info = {}
with open(r"C:\Users\USER\Desktop\diploma\information.json", 'w', encoding='utf8') as fout:
    row = 0
    for url in data:
        row += 1
        driver.get(url)
        user_name_xpath = "//h1[@itemprop='name' and @data-shmid='profilePrepName']"
        user_name = get_elements(user_name_xpath)
        user_about_xpath = "//*[@class='desktop-profile-page__about-text']"
        user_about = get_elements(user_about_xpath)
        df_info['id'] = url
        df_info['user_name'] = user_name[0]
        df_info['user_about'] = user_about[0]
        # Called once per URL, so the file receives one object after another
        json.dump(df_info, fout, ensure_ascii=False)
I get the following JSON:
{"id": "www.aina.com", "user_name": "Aina Nurma", "user_about": "I am a student"}
{"id": "www.aina.ru", "user_name": "Aina Nur", "user_about": "I am a teacher"}
Answer 0 (score: 0)
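It looks like some of your code is missing, but I would suggest collecting all the data in a list of dictionaries and dumping that list once at the end, rather than dumping to the file after each individual URL has been processed.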
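
A minimal sketch of that approach, under some assumptions: scrape_profile is a hypothetical stand-in for the driver.get and get_elements calls in the question (it returns placeholder values here so the example runs without Selenium), and the output goes to a temporary directory rather than the original path. The point is that a fresh dictionary is built per URL, appended to a list, and json.dump is called once on the whole list.

```python
import json
import os
import tempfile


def scrape_profile(url):
    """Hypothetical stand-in for the Selenium lookups in the question."""
    # In the real script this would call driver.get(url) and get_elements(...).
    return {
        "id": url,
        "user_name": "placeholder name",
        "user_about": "placeholder text",
    }


def scrape_all(urls, out_path):
    """Collect one dict per URL, then write them to out_path as a JSON list."""
    records = []
    for url in urls:
        # Build a new dict on every iteration; reusing a single dict would
        # make every list entry point at the same (last) record.
        records.append(scrape_profile(url))

    # A single dump of the list yields one valid JSON document: [{...}, {...}]
    with open(out_path, "w", encoding="utf8") as fout:
        json.dump(records, fout, ensure_ascii=False, indent=2)
    return records


# Usage: write two records, then read the file back as an ordinary list.
out_path = os.path.join(tempfile.mkdtemp(), "information.json")
records = scrape_all(["www.aina.com", "www.aina.ru"], out_path)

with open(out_path, encoding="utf8") as fin:
    loaded = json.load(fin)
```

Because the file now holds a single top-level list, json.load can read it back in one call, which is not possible with several objects written back to back.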