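This is a problem I've been struggling with for the past 4 days. I've read tutorials found through Google and on SO, but none of them helped. I'm posting it as a question so that others can try it and help me solve it. I've already solved it in a crude way, but I'm wondering whether there is a smarter approach. I have a file containing a list of ball bearings and their attributes. It looks like this: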
<li class="odd first">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1">33030</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 59 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 225 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2600 r/min
|<strong>Reference speed: </strong> 2000 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2">30230</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 49 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 270 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2400 r/min
|<strong>Reference speed: </strong> 1800 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3">33024</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 48 mm
|<strong>Bore diameter: </strong> 120 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3400 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4">33022</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 47 mm
|<strong>Bore diameter: </strong> 110 mm
|<strong>Outside diameter: </strong> 170 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5">33220</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 63 mm
|<strong>Bore diameter: </strong> 100 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2400 r/min
</li>
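Now consider the HTML as it is rendered (rather than the HTML source itself). I want to parse it and extract a parameter from the href link: in the first entry, the href link contains the prodid parameter, prodid=1310003030. If possible, I'd also like to append the whole link to the end of each line.
I want to extract the prodid and append it to the end of EACH line, so that the entries look like this: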
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
Answer 0 (score: 0)
Here is a sed version. I have to admit that reordering words that sit on different lines isn't easy with sed:
sed -nre '
/^ *<a/{
h;s/^.*prodid=([0-9]+).*$/ |\1/;x;s_^.*>([0-9]+)</a.*$_\1_
:back
N
s/\n.*(Product category:).*\">(.*)<.*$/ |\1 \2/
s_\n.*strong>(.*)</strong>(.*)$_ |\1 \2_
/<\/li>$/ !bback
/<\/li>$/ {
s/<\/li>$//;G;s/\n//g;s/ +/ /g;p
}
}
' file
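The core trick in the script above is the hold space: `h` saves a copy of the link line, `x` swaps the two buffers so each can be reduced to one piece, and `G` glues the saved piece back on at the end. The sketch below isolates just that step on a single link line. The function name `link_to_pair` is hypothetical, and it assumes a sed that accepts `-E` for extended regular expressions (the answer's `-r` is the GNU spelling of the same option).

```shell
# Hypothetical helper: print "designation | prodid" for each <a href=...> line on stdin.
link_to_pair() {
  sed -nE '
    /<a href/{
      # Save a copy of the whole line in the hold space.
      h
      # Reduce the pattern space to the prodid digits.
      s/.*prodid=([0-9]+).*/\1/
      # Swap buffers: pattern space = original line, hold space = prodid.
      x
      # Reduce the pattern space to the link text.
      s/.*>([^<]+)<\/a>.*/\1/
      # Append a newline plus the hold space, then join the two pieces.
      G
      s/\n/ | /
      p
    }'
}

printf '%s\n' '<a href="/productcatalogue/prodlink.html?lang=en&prodid=1310003030&pubid=21">33030</a>' | link_to_pair
# prints: 33030 | 1310003030
```

The full script does the same thing, except that between `x` and `G` it keeps pulling in the attribute lines with `N` until it reaches `</li>`.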
Answer 1 (score: 0)
The UNIX tool for general-purpose text manipulation is awk:
$ cat tst.awk
BEGIN {
FS = "[[:space:]]*<[^>]+>[[:space:]]*"
OFS = " |"
}
/^[[:space:]]*<a href/{
split($0,a,/.*prodid=|&.*/)
prodid = a[2]
prodnr = $(NF-1)
}
/<strong>/ {
name = $2
value = ($NF == "" ? $(NF-1) : $NF)
sub(/[[:space:]]+$/,"",value)
n2v[name] = value
if (!seen[name]++) {
names[++numNames] = name
}
}
/<\/li>/ {
printf "%s%s", prodnr, OFS
for (nameNr=1; nameNr<=numNames; nameNr++) {
name = names[nameNr]
value = n2v[name]
printf "%s %s%s", name, value, OFS
}
print " " prodid
}
$ awk -f tst.awk file
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
33220 |Product category: Tapered roller bearings single row |Width: 63 mm |Bore diameter: 100 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2400 r/min | 1310003220
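The question also asks whether the whole link could be appended instead of only the prodid. A minimal sketch of that variant follows, assuming the product link is always the first `<a href=...>` inside each `<li>` item, as in the sample. The function name `first_links` is hypothetical, and for brevity it prints only the designation and the link rather than every attribute.

```shell
# Hypothetical helper: print "designation | href" for the first link of each <li> item.
first_links() {
  awk '
    # A new item starts: the next link seen is the product link.
    /<li/ { want = 1 }
    want && match($0, /href="[^"]*"/) {
      # RSTART points at the h of href; skip the 6 characters of href=" and drop the closing quote.
      href = substr($0, RSTART + 6, RLENGTH - 7)
      name = $0
      # Drop the closing tag and anything after it.
      sub(/<\/a>.*/, "", name)
      # Drop everything up to the end of the opening tag.
      sub(/.*>/, "", name)
      want = 0
    }
    /<\/li>/ { print name " | " href }
  '
}

# Usage, with the sample saved as "file" as in the answers above:
#   first_links < file
```

The same `match()`/`substr()` pair could replace the `split()` call in tst.awk if the full link is wanted in place of `prodid`.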