Question

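Background: I have an XML file that my supplier uploads every night with new products, updated stock quantities, and so on. But they've stitched me up: they don't include the descriptions in the XML file. Instead, they include a link to their website, where the description is available as raw text.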

What I need is a script that loops through the document we download from them and replaces each URL with the contents of that URL.

For example, if I have

<DescriptionLink>http://www.leadersystems.com.au/DataFeed/ProductDetails/AT-CHARGERSTATION-45</DescriptionLink>

I want it to end up as

<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>

I have tried a few things, but I'm not very skilled with scripting or loops. So far I have:

#!/bin/bash
LINKGET=`awk -F '|' '{ print $2 }' products-daily.txt`

wget -O products-daily.txt http://www.suppliers-site-url.com
sed 's/<DescriptionLink>*/<DescriptionLink>$(wget -S -O- $LINKGET/g' products-daily.txt

But again, I'm not sure how all of this really works, so it has been trial and error. Any help is appreciated!!!

Updated to include an example URL.

Answer 1

You'd want something like this (using GNU awk for the 3rd arg to match()):

$ cat tst.awk
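{
    head = ""
    tail = encode($0)    # tags become single chars: { = open, } = close
    # Peel off one <DescriptionLink> element per iteration:
    # a[1] = text up to and including "{", a[2] = the URL, a[3] = the rest
    while ( match(tail,/^([^{]*[{])([^}]+)(.*)/,a) ) {
        desc = ""
        cmd = "curl -s \047" a[2] "\047"    # \047 is a single quote
        # Read the fetched page line by line, joining lines with ORS
        while ( (cmd | getline line) > 0 ) {
            desc = (desc=="" ? "" : desc ORS) line
        }
        close(cmd)
        head = head decode(a[1]) desc
        tail = a[3]
    }
    print head decode(tail)
}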
function encode(str) {
    gsub(/@/,"@A",str)
    gsub(/{/,"@B",str)
    gsub(/}/,"@C",str)
    gsub(/<DescriptionLink>/,"{",str)
    gsub(/<\/DescriptionLink>/,"}",str)
    return str
}
function decode(str) {
    gsub(/}/,"</DescriptionLink>",str)
    gsub(/{/,"<DescriptionLink>",str)
    gsub(/@C/,"}",str)
    gsub(/@B/,"{",str)
    gsub(/@A/,"@",str)
    return str
}

$ awk -f tst.awk file
<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>
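
One thing the sample output above does not exercise: if a fetched description ever contained `&`, `<`, or `>`, pasting it verbatim between the tags would leave the XML ill-formed. The question does not say whether the supplier's descriptions can contain such characters, so the following is only a precautionary sketch. `xml_escape` is a hypothetical helper name, not something from the answer, and the sample string is made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): escape XML special characters.
# "&" must be handled first, otherwise the "&" in "&lt;" would be re-escaped.
xml_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g'
}

# Made-up sample description, for illustration only.
printf '%s\n' 'Cable <1.5m> & Charger' | xml_escape
# prints: Cable &lt;1.5m&gt; &amp; Charger
```

If escaping turns out to be needed, the same three sed expressions could presumably be appended to the curl pipeline that the awk script runs.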

For information on what the encode/decode functions are doing and why, see https://stackoverflow.com/a/40512703/1745001.
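
As a rough illustration of the idea (a sketch using sed rather than the awk functions themselves; the sample input is made up): each multi-character tag is mapped to a single character, `{` or `}`, so that a negated bracket expression such as `[^}]` can mean "everything up to the closing tag". Any literal `@`, `{`, or `}` already in the data is escaped first so the mapping can be reversed exactly.

```shell
#!/bin/sh
# Hypothetical sed versions of the awk encode()/decode() functions.

# Escape literal @ { } first, then collapse each tag to a single character.
encode() {
    sed -e 's/@/@A/g' \
        -e 's/{/@B/g' \
        -e 's/}/@C/g' \
        -e 's/<DescriptionLink>/{/g' \
        -e 's/<\/DescriptionLink>/}/g'
}

# Undo the same steps in reverse order.
decode() {
    sed -e 's/}/<\/DescriptionLink>/g' \
        -e 's/{/<DescriptionLink>/g' \
        -e 's/@C/}/g' \
        -e 's/@B/{/g' \
        -e 's/@A/@/g'
}

# Made-up input containing literal braces and an @ inside the element.
sample='<DescriptionLink>http://example.com/a{1}@b</DescriptionLink>'
printf '%s\n' "$sample" | encode
# prints: {http://example.com/a@B1@C@Ab}
printf '%s\n' "$sample" | encode | decode
# prints the original sample line unchanged
```

The order matters: `@` has to be escaped before `{` and `}` are, and decoding has to restore the tags before it restores the escaped characters. Otherwise a literal brace in the data would be mistaken for a tag.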

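Note that this is one of the few situations where using getline is appropriate. If you ever consider using getline in the future, make sure you first read and fully understand all of the caveats and use cases discussed at http://awk.freeshell.org/AllAboutGetline.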
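
The `cmd | getline` idiom used in the script can be tried in isolation. This is a minimal sketch in which `collect_output` is a hypothetical helper name and `echo` stands in for `curl`, so nothing is fetched over the network:

```shell
#!/bin/sh
# Hypothetical helper: run a command from awk and collect all of its output.
collect_output() {
    awk -v cmd="$1" 'BEGIN {
        out = ""
        # getline returns 1 per line read, 0 at end of input, and -1 on error.
        # Testing "> 0" ends the loop in both of the latter cases, whereas a
        # bare while (cmd | getline line) would treat -1 as true and never stop.
        while ( (cmd | getline line) > 0 ) {
            out = (out == "" ? "" : out ORS) line
        }
        close(cmd)   # close the pipe so the same command string can be run again
        print out
    }'
}

# echo stands in for curl here.
collect_output 'echo one; echo two'
```

The `close(cmd)` call matters in the real script because one command is opened per URL. Without it, the pipes would stay open and a long file could exhaust the available file descriptors.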

Find and replace a URL with the content at that URL
