I need to extract item-name, item-manufacturer, and item-actual from the outerHTML below in PowerShell.
<DIV class=row>
<DIV class="col-sm-5 col-xs-8"><A class=item-name href="/details/drugs/39467/spasmonil-20mg">Spasmonil (20mg)</A>
<DIV class=text-small>2 ml</DIV>
<DIV class="item-manufacturer visible-xs">Cipla Limited</DIV></DIV>
<DIV class="col-sm-5 hidden-xs"><SPAN class=item-manufacturer>Cipla Limited</SPAN></DIV>
<DIV class="col-sm-2 col-xs-4 text-right">
<DIV class=item-actual>Rs. 6</DIV>
<DIV class=item-price>Rs. 6</DIV></DIV></DIV></LI>
<LI class="list-item item js-drug">
<DIV class=row>
<DIV class="col-sm-5 col-xs-8"><A class=item-name href="/details/drugs/40759/sprintas-75mg">Sprintas (75mg)</A>
<DIV class=text-small>28 Tablets</DIV>
<DIV class="item-manufacturer visible-xs">Intas Laboratories Pvt Ltd</DIV></DIV>
<DIV class="col-sm-5 hidden-xs"><SPAN class=item-manufacturer>Intas Laboratories Pvt Ltd</SPAN></DIV>
<DIV class="col-sm-2 col-xs-4 text-right">
<DIV class=item-actual>Rs. 5.72</DIV>
<DIV class=item-price>Rs. 5.72</DIV></DIV></DIV></LI>
<LI class="list-item item js-drug">
The desired output looks like this:
Spasmonil (20mg) - Cipla Limited - Rs. 6
Sprintas (75mg) - Intas Laboratories Pvt - Rs. 5.72
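I am currently doing this in a very inefficient way: I write four outputs to separate txt files (drugsname, drugsquan, drugspric, drugsmanu) and then combine them manually. Can someone help me do this in a more elegant way?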
$regex1 = 'item-name.*?>(.*?)</A>'
$regex2 = 'text-small>(.*?)</DIV>'
$regex3 = '"item-manufacturer visible-xs">(.*?)</DIV>'
$regex4 = 'item-actual>(.*?)</DIV>'

$drugsname = $ie.Document.body.outerHTML -split "`r`n" |
    ForEach-Object {
        If ($_ -match $regex1) {
            $matches[1]
        }
    }

$drugsquan = $ie.Document.body.outerHTML -split "`r`n" |
    ForEach-Object {
        If ($_ -match $regex2) {
            $matches[1]
        }
    }

$drugsmanu = $ie.Document.body.outerHTML -split "`r`n" |
    ForEach-Object {
        If ($_ -match $regex3) {
            $matches[1]
        }
    }

$drugspric = $ie.Document.body.outerHTML -split "`r`n" |
    ForEach-Object {
        If ($_ -match $regex4) {
            $matches[1]
        }
    }

$drugsname > "d:\users\desktop\HKD\($control)drugsname.txt"
$drugsquan > "d:\users\desktop\HKD\($control)drugsquan.txt"
$drugsmanu > "d:\users\desktop\HKD\($control)drugsmanu.txt"
$drugspric > "d:\users\desktop\HKD\($control)drugspric.txt"
Answer 0 (score: 2)
Using a multi-line/single-line regex in a here-string (aka "jumbo shrimp in a can"):
$data =
@'
<DIV class=row>
<DIV class="col-sm-5 col-xs-8"><A class=item-name href="/details/drugs/39467/spasmonil-20mg">Spasmonil (20mg)</A>
<DIV class=text-small>2 ml</DIV>
<DIV class="item-manufacturer visible-xs">Cipla Limited</DIV></DIV>
<DIV class="col-sm-5 hidden-xs"><SPAN class=item-manufacturer>Cipla Limited</SPAN></DIV>
<DIV class="col-sm-2 col-xs-4 text-right">
<DIV class=item-actual>Rs. 6</DIV>
<DIV class=item-price>Rs. 6</DIV></DIV></DIV></LI>
<LI class="list-item item js-drug">
<DIV class=row>
<DIV class="col-sm-5 col-xs-8"><A class=item-name href="/details/drugs/40759/sprintas-75mg">Sprintas (75mg)</A>
<DIV class=text-small>28 Tablets</DIV>
<DIV class="item-manufacturer visible-xs">Intas Laboratories Pvt Ltd</DIV></DIV>
<DIV class="col-sm-5 hidden-xs"><SPAN class=item-manufacturer>Intas Laboratories Pvt Ltd</SPAN></DIV>
<DIV class="col-sm-2 col-xs-4 text-right">
<DIV class=item-actual>Rs. 5.72</DIV>
<DIV class=item-price>Rs. 5.72</DIV></DIV></DIV></LI>
<LI class="list-item item js-drug">
'@
[regex]$regex =
@'
(?ms).*?<DIV class=row>.*?
.+?item-name href=".+?>(.+?)</A>.*?
.+?text-small>(.+?)</DIV>.*?
.+?item-manufacturer.+?>(.+?)</DIV></DIV>.*?
.+?item-actual>(.+?)</DIV>
'@
$regex.Matches($data) |
    foreach {
        [PSCustomObject]@{
            Name         = $_.Groups[1].Value
            Quantity     = $_.Groups[2].Value
            Manufacturer = $_.Groups[3].Value
            Price        = $_.Groups[4].Value
        }
    }

Name             Quantity   Manufacturer               Price
----             --------   ------------               -----
Spasmonil (20mg) 2 ml       Cipla Limited              Rs. 6
Sprintas (75mg)  28 Tablets Intas Laboratories Pvt Ltd Rs. 5.72

Now that you have a collection of objects, you can sort, filter, format, and export them to suit your needs.
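As a rough sketch of what that can look like, the snippet below builds the "Name - Manufacturer - Price" lines requested in the question, sorts by numeric price, and writes a single CSV instead of four text files. The `$drugs` array is a hand-typed stand-in for the objects emitted by the pipeline above, and `$csvPath` (a file in the temp folder) is a hypothetical location chosen only for illustration; the sort assumes every price has the form `Rs. <number>`.

```powershell
# Stand-in for the objects produced by the regex pipeline (assumption:
# in real use you would capture that pipeline's output into $drugs instead).
$drugs = @(
    [PSCustomObject]@{ Name = 'Spasmonil (20mg)'; Quantity = '2 ml'; Manufacturer = 'Cipla Limited'; Price = 'Rs. 6' }
    [PSCustomObject]@{ Name = 'Sprintas (75mg)'; Quantity = '28 Tablets'; Manufacturer = 'Intas Laboratories Pvt Ltd'; Price = 'Rs. 5.72' }
)

# Format each object as "Name - Manufacturer - Price".
$lines = $drugs | ForEach-Object {
    '{0} - {1} - {2}' -f $_.Name, $_.Manufacturer, $_.Price
}
$lines

# Sort by numeric price: strip the leading "Rs. " and cast the rest to decimal.
$sorted = $drugs | Sort-Object { [decimal]($_.Price -replace '^Rs\.\s*') }

# Export everything to one CSV file (hypothetical path, adjust as needed).
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'drugs.csv'
$sorted | Export-Csv -Path $csvPath -NoTypeInformation
```

Keeping the fields together on one object is what removes the manual merge step: each row carries its own name, quantity, manufacturer, and price, so the four lists can no longer drift out of alignment.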