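I am trying to scrape information about journalists from a link and save each variable (name, country, type of death, etc.) as a column in a CSV file. How can I do this? The link is https://cpj.org/data/people/abadullah-hananzai/index.php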
Answer 0 (score: 0)
The site uses AJAX to load its data, in JSON format, from a different URL. This should get you started:
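import csv
from operator import itemgetter

import requests

# JSON endpoint that the page calls via AJAX; the response is a list of
# records, each carrying a 'tag' (field name) and a 'value'
data_url = 'https://cpj.org/api/datamanager/killed?source=http://cpj.org/data/people/abadullah_hananzai/'
records = requests.get(data_url).json()

# Transpose the (tag, value) pairs into two rows: one of tags, one of values
data = [*zip(*map(itemgetter('tag', 'value'), records))]

with open('out.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerows(data)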
Contents of out.csv:
currentStatus,type,lastStatus,motiveConfirmed,fullName,localOrForeign,mediums,jobs,coverages,gender,employedAs,typeOfDeath,captive,sourcesOfFire,tortured,impunity,threatened,status,organizations,freelance,country,locality,date,subheading
Killed,Journalist,Killed,Confirmed,Abadullah Hananzai,Local,"Radio, Internet",Producer,"Crime, Politics, War",Male,Staff,Murder,No,Political Group,No,Complete Impunity,No,Killed,"Radio Azadi,Radio Free Europe/Radio Liberty",No,Afghanistan,Kabul,"April 30, 2018","Radio Azadi,Radio Free Europe/Radio Liberty | Killed in Kabul, Afghanistan | April 30, 2018"
Opened in LibreOffice, the file shows one column per field, with the tags in the first row and the values in the second.
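If you need more than one journalist in the same file, one possible extension is to turn each response into a dict and let csv.DictWriter line up the columns. This is only a sketch: the helpers records_to_row and write_rows are names made up here, the second sample record is invented for illustration, and it assumes every profile returns the same list of tag/value records as the one above.

```python
import csv
import io


def records_to_row(records):
    """Collapse a list of {'tag': ..., 'value': ...} dicts into one flat dict."""
    return {rec['tag']: rec['value'] for rec in records}


def write_rows(rows, csvfile):
    """Write dict rows to an open file; columns are the union of all keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in fields that a given record does not have
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)


# Stand-ins for two parsed API responses (the second one is hypothetical)
sample_a = [{'tag': 'fullName', 'value': 'Abadullah Hananzai'},
            {'tag': 'country', 'value': 'Afghanistan'},
            {'tag': 'typeOfDeath', 'value': 'Murder'}]
sample_b = [{'tag': 'fullName', 'value': 'Example Person'},
            {'tag': 'country', 'value': 'Exampleland'}]

buffer = io.StringIO()
write_rows([records_to_row(sample_a), records_to_row(sample_b)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With real data you would pass requests.get(url).json() for each profile to records_to_row, and replace the StringIO buffer with open('out.csv', 'w', newline='').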