Match a pattern, save it in a variable, and append it to the end of the line using sed / awk / grep

Time: 2014-12-13 12:01:09

Tags: awk sed

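I have been struggling with this problem for the past 4 days. I read tutorials on Google and Stack Overflow, but none of them helped. I am posting it as a question so that others can try it and help me solve it. I have already solved it in a crude way, but I wonder whether there is a smarter approach. There is a file containing a list of ball bearings and their properties. It looks like this: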

<li class="odd  first">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003030&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=1">33030</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003030&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=1&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 59 mm
        |<strong>Bore diameter: </strong> 150 mm
        |<strong>Outside diameter: </strong> 225 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 2600 r/min
        |<strong>Reference speed: </strong> 2000 r/min


</li>
<li class="even ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310000230&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=2">30230</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310000230&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=2&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 49 mm
        |<strong>Bore diameter: </strong> 150 mm
        |<strong>Outside diameter: </strong> 270 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 2400 r/min
        |<strong>Reference speed: </strong> 1800 r/min


</li>
<li class="odd  ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003024&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=3">33024</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003024&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=3&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 48 mm
        |<strong>Bore diameter: </strong> 120 mm
        |<strong>Outside diameter: </strong> 180 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3400 r/min
        |<strong>Reference speed: </strong> 2600 r/min


</li>
<li class="even ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003022&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=4">33022</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003022&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=4&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 47 mm
        |<strong>Bore diameter: </strong> 110 mm
        |<strong>Outside diameter: </strong> 170 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3600 r/min
        |<strong>Reference speed: </strong> 2600 r/min


</li>
<li class="odd  ">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003220&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=5">33220</a>
    |<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&amp;imperial=false&amp;prodid=1310003220&amp;pubid=21&amp;WT.oss=&amp;WT.z_oss_boost=0&amp;WT.z_oss_ref=ProductSearch&amp;WT.z_oss_rank=5&amp;isTableView=true" class="product-table-link">Tapered roller bearings single row</a>

        |<strong>Width: </strong> 63 mm
        |<strong>Bore diameter: </strong> 100 mm
        |<strong>Outside diameter: </strong> 180 mm
        |<strong>Source: </strong> -


        |<strong>Limiting speed: </strong> 3600 r/min
        |<strong>Reference speed: </strong> 2400 r/min              
</li>

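Now, consider the rendered output of this HTML rather than the markup itself. I want to parse it and extract a parameter from the href link (in the first entry, the href link contains the prodid parameter, prodid=1310003030). If possible, I would also like to append the entire link to the end of every line.

I want to extract it and append it to the end of EACH line, so that the entries look like this: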

33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030 
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230 
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024 
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022

2 answers:

Answer 0: (score: 0)

Here is a sed version. I must admit that rearranging the order of words spread across different lines is not easy with sed:

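sed -nE '
# Start a record at the line holding the first <a ...> link.
/^ *<a/{
    # Hold space gets " | <prodid>"; pattern space keeps only the part number.
    h;s/^.*prodid=([0-9]+).*$/ | \1/;x;s_^.*>([0-9]+)</a.*$_\1_
    :back
    N
    # Fold each appended line into " |Name: value".
    s/\n.*(Product category:).*">(.*)<.*$/ |\1 \2/
    s_\n.*strong>(.*)</strong>(.*)$_ |\1 \2_
    /<\/li>$/ !bback
    # At </li>: append the held prodid, join the lines, squeeze spaces, print.
    /<\/li>$/ {
        s/<\/li>$//;G;s/\n//g;s/  */ /g;p
    }
}
' file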
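
The script above relies on sed's hold space (the h, x and G commands) to carry the prodid from the first line of a record to its end. As a minimal sketch of just that mechanism, here it is on a hypothetical two-line input; the `id=42 name` text and the `input` and `result` variable names are made up for illustration and are not part of the original data.

```shell
# Hypothetical two-line input: an id on the first line, a value on the second.
input='id=42 name
value'

result=$(printf '%s\n' "$input" | sed -nE '
/^id=/{
    # Copy the line to the hold space, then reduce the pattern space to the id.
    h
    s/^id=([0-9]+).*$/\1/
    # Swap: the id is now held, the original line is back in the pattern space.
    x
    s/^id=[0-9]+ //
    # Append the next input line and join it with a space.
    N
    s/\n/ /
    # Append the held id after a newline, then turn that newline into " | ".
    G
    s/\n/ | /
    p
}
')
printf '%s\n' "$result"
```

This should print `name value | 42`: the id seen first ends up last, which is the same reordering the full script performs on each <li> record.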

Answer 1: (score: 0)

The UNIX tool for general-purpose text manipulation is awk:

$ cat tst.awk
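BEGIN {
    # Treat every HTML tag, plus any whitespace around it, as a field separator.
    FS = "[[:space:]]*<[^>]+>[[:space:]]*"
    OFS = " |"
}

# First link of a record: grab the prodid parameter and the part number.
/^[[:space:]]*<a href/{
    split($0,a,/.*prodid=|&.*/)
    prodid = a[2]
    prodnr = $(NF-1)
}

# Property line: store the value by name and remember first-seen name order.
/<strong>/ {
    name  = $2
    value = ($NF == "" ? $(NF-1) : $NF)
    sub(/[[:space:]]+$/,"",value)
    n2v[name] = value
    if (!seen[name]++) {
        names[++numNames] = name
    }
}

# End of record: print the part number, each property, then the prodid.
/<\/li>/ {
    printf "%s%s", prodnr, OFS
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name  = names[nameNr]
        value = n2v[name]
        printf "%s %s%s", name, value, OFS
    }
    print " " prodid
    delete n2v    # forget this record's values so they cannot leak into the next
}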

$ awk -f tst.awk file
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
33220 |Product category: Tapered roller bearings single row |Width: 63 mm |Bore diameter: 100 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2400 r/min | 1310003220
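
The question also asks, if possible, for the entire link at the end of each line. What follows is a rough sketch of one way to pull the href out with awk's match() and substr(), not part of either answer. It assumes the first link of each record keeps its href in double quotes on a single line; the sample record is a trimmed, hypothetical version of the question's data, and the `sample`, `result` and `href` names are illustrative.

```shell
# Hypothetical sample: one trimmed <li> record in the same shape as the input file.
sample='<li class="odd  first">
     <a href="/productcatalogue/prodlink.html?lang=en&amp;prodid=1310003030&amp;pubid=21">33030</a>
        |<strong>Width: </strong> 59 mm
        |<strong>Bore diameter: </strong> 150 mm
</li>'

result=$(printf '%s\n' "$sample" | awk '
# First link of a record: remember its href and the prodid inside it.
/^[[:space:]]*<a href/ {
    href = ""
    if (match($0, /href="[^"]*"/)) {
        # Skip the 6 characters of href=" and drop the closing quote.
        href = substr($0, RSTART + 6, RLENGTH - 7)
        # Decode the &amp; entity; "\\&" yields a literal ampersand in gsub.
        gsub(/&amp;/, "\\&", href)
    }
    prodid = href
    sub(/.*prodid=/, "", prodid)
    sub(/&.*/, "", prodid)
}
# End of record: print the prodid followed by the decoded link.
/<\/li>/ { print prodid " | " href }
')
printf '%s\n' "$result"
```

On this sample it should print `1310003030 | /productcatalogue/prodlink.html?lang=en&prodid=1310003030&pubid=21`. In tst.awk the same href variable could be printed after prodid in the final print statement.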