How do I get information from a link and save it in a CSV file?

Time: 2019-07-13 02:28:37

Tags: python web-scraping beautifulsoup

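I am trying to scrape information about a journalist from a link and save each variable (name, country, type of death, etc.) as a column in a CSV file. How can I do this? The link is https://cpj.org/data/people/abadullah-hananzai/index.php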

1 answer:

Answer 0: (score: 0)

The site uses AJAX to load its data, in JSON format, from a different URL. This should get you started:

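import csv
from operator import itemgetter

import requests

# JSON endpoint that the page loads via AJAX
data_url = 'https://cpj.org/api/datamanager/killed?source=http://cpj.org/data/people/abadullah_hananzai/'

response = requests.get(data_url)
response.raise_for_status()  # fail early on an HTTP error

# Each JSON item has a 'tag' and a 'value'; transpose the pairs into
# one row of tags (the header) and one row of values
data = [*zip(*map(itemgetter('tag', 'value'), response.json()))]

with open('out.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerows(data)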

Output of out.csv:

currentStatus,type,lastStatus,motiveConfirmed,fullName,localOrForeign,mediums,jobs,coverages,gender,employedAs,typeOfDeath,captive,sourcesOfFire,tortured,impunity,threatened,status,organizations,freelance,country,locality,date,subheading
Killed,Journalist,Killed,Confirmed,Abadullah Hananzai,Local,"Radio, Internet",Producer,"Crime, Politics, War",Male,Staff,Murder,No,Political Group,No,Complete Impunity,No,Killed,"Radio Azadi,Radio Free Europe/Radio Liberty",No,Afghanistan,Kabul,"April 30, 2018","Radio Azadi,Radio Free Europe/Radio Liberty | Killed in Kabul, Afghanistan | April 30, 2018"
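
The header row and the value row above come from transposing the (tag, value) pairs. Here is a minimal offline sketch of that step, assuming the endpoint returns a JSON list of objects that each have a `tag` and a `value` key. The three sample records are made up for illustration and are only shaped like the output shown here:

```python
from operator import itemgetter

# Hypothetical sample, shaped like the JSON the endpoint is assumed to return
records = [
    {"tag": "fullName", "value": "Abadullah Hananzai"},
    {"tag": "country", "value": "Afghanistan"},
    {"tag": "typeOfDeath", "value": "Murder"},
]

# itemgetter('tag', 'value') turns each dict into a (tag, value) tuple
pairs = list(map(itemgetter("tag", "value"), records))

# zip(*pairs) transposes the pairs: first tuple is all tags, second is all values
data = [*zip(*pairs)]

print(data[0])  # header row
print(data[1])  # value row
```

Passing `data` to `csv_writer.writerows` then writes the tags as the first line and the values as the second.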

In LibreOffice, out.csv looks like this:

[Image: screenshot of out.csv opened in LibreOffice]
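
If you later want several people in one file, with one row per person, `csv.DictWriter` may be a better fit than transposing. This is only a sketch under assumptions: `rows_to_csv` is a hypothetical helper, the sample records are invented, and it assumes each person's data is a list of tag/value objects like the one above. Fetching each person's JSON is left out.

```python
import csv
import io


def rows_to_csv(people, out):
    """Write one CSV row per person; `people` is a list of tag/value record lists."""
    rows = [{r["tag"]: r["value"] for r in records} for records in people]
    # Union of all tags, kept in first-seen order, becomes the header
    fieldnames = list(dict.fromkeys(tag for row in rows for tag in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)  # missing tags are filled with restval


# Hypothetical sample data; the second person is invented for illustration
people = [
    [{"tag": "fullName", "value": "Abadullah Hananzai"},
     {"tag": "country", "value": "Afghanistan"}],
    [{"tag": "fullName", "value": "Example Person"},
     {"tag": "typeOfDeath", "value": "Murder"}],
]

buf = io.StringIO()
rows_to_csv(people, buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with `open('out.csv', 'w', newline='')` in place of `buf`.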