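I have a script that successfully queries an API, but it is very slow: fetching all the resources takes around 16 hours. I looked into how to optimise it and thought that GNU parallel (installed on macOS via Brew, version 20180522) would do the trick. But even with 90 jobs (the API endpoint authorises at most 100 connections), my script is no faster, and I'm not sure why.
I call my script like this: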
bash script.sh | parallel -j90
The script is as follows:
#!/bin/bash
# This script downloads the list of French MPs who contributed to a specific amendment.
# The script is initialised with a file containing a list of API URLs, each pointing to a resource describing an amendment
# The main function loops over 3 actions:
# 1. assign to $sign the API url that points to the list of amendment authors
# 2. run the functions auteur and cosignataires and save them in their respective variables
# 3. merge the variable contents and append them as a new line into a csv file
main(){
    local file="${1}"
    local line
    local sign
    local auteur_clean
    local cosign_clean
    while read line
    do
        sign="${line}/signataires"
        auteur_clean=$(auteur $sign)
        cosign_clean=$(cosignataires $sign)
        echo "${auteur_clean}","${cosign_clean}" >> signataires_15.csv
    done < "${file}"
}
# The auteur function takes the $sign variable as an input and
# 1. filters the json returned by the API to get only the author's ID
# 2. uses the ID stored in $auteur to query the full author resource and capture the key info, which is then assigned to $auteur_nom
# 3. echo a cleaned version of the info stored in $auteur_nom
auteur(){
    local url="${1}"
    local auteur
    local auteur_nom
    auteur=$(curl -s "${url}" | jq '.signataires[] | select(.relation=="auteur") | .id') \
    && auteur_nom=$(curl -s "https://www.parlapi.fr/rest/an/acteurs_amendements/${auteur}" \
        | jq -r --arg url "https://www.parlapi.fr/rest/an/acteurs_amendements/${auteur}" '$url, .amendement.id, .acteur.id, (.acteur.prenom + " " + .acteur.nom)') \
    && echo "${auteur_nom}" | tr '\n' ',' | sed 's/,$//'
}
# The cosignataires function takes the $sign variable as an input and
# 1. filter the json returned by the API to produce a space separated list of co-authors
# 2. iterates over list of coauthors to get their name and surname, and assign the resulting list to $cosign_nom
# 3. echo a semi-colon separated list of the co-author names
cosignataires(){
    local url="${1}"
    local cosign
    local cosign_nom
    local i
    cosign=$(curl -s "${url}" | jq '.signataires[] | select(.relation=="cosignataire") | .id' | tr '\n' ' ') \
    && cosign_nom=$(for i in ${cosign}; do curl -s "https://www.parlapi.fr/rest/an/acteurs_amendements/${i}" | jq -r '(.acteur.prenom + " " + .acteur.nom)'; done) \
    && echo "${cosign_nom}" | tr '\n' ';' | sed 's/;$//'  # strip the trailing semicolon left by tr
}
main "url_amendements_15.txt"
The contents of url_amendements_15.txt are as follows:
https://www.parlapi.fr/rest/an/amendements/AMANR5L15SEA717460BTC0174P0D1N7
https://www.parlapi.fr/rest/an/amendements/AMANR5L15PO59051B0490P0D1N90
https://www.parlapi.fr/rest/an/amendements/AMANR5L15PO59051B0490P0D1N134
https://www.parlapi.fr/rest/an/amendements/AMANR5L15PO59051B0490P0D1N187
https://www.parlapi.fr/rest/an/amendements/AMANR5L15PO59051B0490P0D1N161
Answer 0 (score: 3)
Your script loops over the list of URLs and queries them sequentially. You need to break it up so that each API query is done separately; that way parallel will have commands it can execute in parallel.
Change the script so that it takes a single URL. Get rid of the main while loop.
main() {
    local url=$1
    local sign
    local auteur_clean
    local cosign_clean
    sign=$url/signataires
    auteur_clean=$(auteur "$sign")
    cosign_clean=$(cosignataires "$sign")
    echo "$auteur_clean,$cosign_clean" >> signataires_15.csv
}
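For the script to act on the URL it is given, main also has to be called with the script's first argument. A minimal sketch of how the pieces might fit together, assuming the auteur and cosignataires functions from the question are defined in the same file. They are replaced here by placeholder stubs so the fragment runs without network access, and OUT is a hypothetical variable added only so the output file can be overridden:

```bash
#!/bin/bash
# Placeholder stubs (assumption): in the real script these are the auteur
# and cosignataires functions from the question, which call curl and jq.
auteur()        { echo "author-of:${1}"; }
cosignataires() { echo "cosigners-of:${1}"; }

# OUT is a hypothetical override for the output file; it defaults to the
# file name used in the question.
OUT="${OUT:-signataires_15.csv}"

main() {
    local url=$1
    local sign auteur_clean cosign_clean
    sign=$url/signataires
    auteur_clean=$(auteur "$sign")
    cosign_clean=$(cosignataires "$sign")
    echo "$auteur_clean,$cosign_clean" >> "$OUT"
}

# Call main only when a URL was passed on the command line.
if [ "$#" -ge 1 ]; then
    main "$1"
fi
```

Saved as script.sh, it would then be run once per amendment, for example as bash script.sh followed by one of the URLs from url_amendements_15.txt.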
Then pass url_amendements_15.txt to parallel. This gives it a list of URLs that it can process in parallel.
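A minimal sketch of what such an invocation could look like, assuming the single-URL version of the script is saved as script.sh and GNU parallel is on the PATH. The wrapper name run_all is hypothetical; {} is parallel's replacement string for one line of input, and the default of 90 jobs is the figure from the question, which stays under the API's 100-connection limit:

```bash
#!/bin/bash
# run_all is a hypothetical wrapper name, used only for illustration.
# Usage: run_all URL_FILE [JOBS]
run_all() {
    local url_file=$1
    local jobs=${2:-90}
    # parallel reads one URL per line from standard input and starts
    # "bash script.sh URL" for each, running up to $jobs at a time.
    parallel -j"$jobs" bash script.sh {} < "$url_file"
}
```

Calling run_all url_amendements_15.txt would then start up to 90 copies of script.sh at once, each handling one amendment URL.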