How can I manipulate this CSV faster in bash?

Date: 2017-10-11 21:08:38

Tags: bash csv

I have a CSV file named brands_url with data like this:

"relative/url","brand"
"relative/url1","brand"

I want to use the value in the second column, the brand, to look up that brand's domain name with this command line:

curl url.json | jq -r '.[] | select(.slug=="brand") | .domain.production' # this would produce >> www.domain.com

I want to prepend that result to the first column, so that the final result looks like this:

"www.domain.com/relative/url"
"www.domain.com/relative/url1"

The problem with my script right now is that it is slow:

BRAND_JSON=$(curl url.json)
while IFS= read -r line
do
    BRAND=$(echo $line | awk -F',' '{print $2}' | sed "s/\"//g")
    URI=$(echo $line | awk -F',' '{print $1}' | sed "s/\"//g")
    echo $BRAND
    DOMAIN=$(echo $BRAND_JSON | jq -r ".[] | select(.slug==\"$BRAND\") | .domain.production")
    echo $DOMAIN
    echo $URI
    echo "https://$DOMAIN/$URI" >> urls
done < "brand_urls"

$BRAND_JSON holds the JSON array fetched from url.json; each element has a slug field and a domain.production field.

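2 answers:

Answer 0 (score: 2)

You can eliminate roughly 80% of the subshell overhead simply by using parameter expansion with substring removal. Letting bash parse the lines itself replaces the 4 calls to awk and sed (and the subshell each '|' requires), for example: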

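while IFS= read -r line
do
    # Brand: drop the trailing quote, then everything up to the last remaining quote.
    BRAND=${line%\"}
    BRAND=${BRAND##*\"}
    # URI: drop the leading quote, then everything from the next quote onward.
    URI=${line#\"}
    URI=${URI%%\"*}
    echo "$BRAND"
    # Quoting $BRAND_JSON keeps the shell from word-splitting or globbing the JSON.
    DOMAIN=$(echo "$BRAND_JSON" | jq -r ".[] | select(.slug==\"$BRAND\") | .domain.production")
    echo "$DOMAIN"
    echo "$URI"
    echo "https://$DOMAIN/$URI" >> urls
done < "brand_urls"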

Give it a try and let me know. What remains is the curl retrieval, which bash can do nothing about, plus one jq invocation per line.
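The loop above still starts jq once per row. One possible way around that, sketched here under assumptions rather than taken from the answer, is to load the slug-to-domain pairs into a bash associative array once and do every lookup in memory. The function names load_domains and build_urls and the array DOMAINS are made up for illustration, the input is assumed to be tab-separated "slug, domain" lines, and associative arrays need bash 4 or later.

```shell
# Sketch only: in-memory lookup instead of one jq call per row (bash 4+).
declare -A DOMAINS   # hypothetical table: slug -> production domain

# Read "slug<TAB>domain" lines from stdin into DOMAINS.
# Call it with a redirection, not at the end of a pipe, so the array
# is filled in the current shell rather than in a subshell.
load_domains() {
    local slug domain
    while IFS=$'\t' read -r slug domain; do
        DOMAINS[$slug]=$domain
    done
}

# Read '"uri","brand"' rows from stdin; print one URL per known brand.
build_urls() {
    local line brand uri
    while IFS= read -r line; do
        brand=${line%\"}; brand=${brand##*\"}
        uri=${line#\"};   uri=${uri%%\"*}
        # Skip rows with an empty or unknown brand.
        [[ -n $brand && -n ${DOMAINS[$brand]+set} ]] || continue
        printf 'https://%s/%s\n' "${DOMAINS[$brand]}" "$uri"
    done
}

# Possible wiring, assuming url.json has the shape used in this thread:
#   load_domains < <(jq -r '.[] | [.slug, .domain.production] | @tsv' url.json)
#   build_urls < brand_urls > urls
```

With this layout jq runs once in total, and the per-row work is only parameter expansion and an array lookup.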

Answer 1 (score: 1)

A short combination of the jq + awk tools:

Sample url.json (it should be valid JSON):

[
  {
    "slug": "brand",
    "domain": {
      "production": "www.domain.com"
    }
  },
  {
    "slug": "brand1",
    "domain": {
      "production": "www.domain1.com"
    }
  }
]

Sample brands_urls.csv contents:

"relative/url","brand"
"relative/url1","brand1"

The job:

awk -F, 'NR==FNR{ gsub(/"/,""); a[$2]=$1;next }
         $2 in a{ printf "https://%s/%s\n",$1,a[$2] }' brands_urls.csv \
         FS='\t' <(jq -r '.[] | [.domain.production,.slug] | @tsv' url.json)

Output (the backslash before \domain was added deliberately, because SO does not allow pasting www.domain.com verbatim; the actual output will not contain it):

https://www.\domain.com/relative/url
https://www.\domain1.com/relative/url1
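One caveat with the awk command above: it indexes the CSV by brand (a[$2]=$1), so when several rows share a brand, as in the question's sample where both rows say "brand", only the last URL for that brand survives. A possible variant, sketched here and not part of the original answer, reads the slug/domain pairs first and then streams the CSV, so every row is kept. The function name join_urls is hypothetical; it assumes its first argument is a TSV of "slug, domain" lines and its second is the CSV.

```shell
# Sketch only: $1 = TSV file of "slug<TAB>domain", $2 = CSV of "uri","brand".
join_urls() {
    awk -F'\t' '
        NR == FNR { domain[$1] = $2; next }   # first file: slug -> domain
        {
            gsub(/"/, "")                     # strip quotes; awk re-splits $0 on FS
            if ($2 in domain)
                printf "https://%s/%s\n", domain[$2], $1
        }
    ' "$1" FS=',' "$2"                        # FS switches to "," before the CSV is read
}

# Possible wiring, assuming the url.json shape shown above:
#   join_urls <(jq -r '.[] | [.slug, .domain.production] | @tsv' url.json) brands_urls.csv
```

The NR==FNR idiom assumes the first input is not empty; with an empty domain list the CSV rows would be treated as lookup data and nothing would be printed.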