My file urls-list.html contains multiple URL paths, in this format:
<body contenteditable="true">
<h1>File: <a href="https://test.com/Config.js" target="_blank" rel="nofollow noopener noreferrer">https://test.com/Config.js</a></h1>
<div>
<a href='/common/assets/locale/language_en.props' class='text'>/common/assets/locale/language_en.props</a>
<div class='container'> urls: [e.get("app.content.domain") + "<span style='background-color:yellow'>/common/assets/locale/language_en.props</span>"]</div>
</div>
<div>
<a href='/common/assets/locale/language_en1.props' class='text'>/common/assets/locale/language_en1.props</a>
<div class='container'> remote: a + n + brandSuffix + "<span style='background-color:yellow'>>/common/assets/locale/language_en1.props</span>",</div>
</div>
<div>
<a href='/common/assets/locale/language_en2.props' class='text'>/common/assets/locale/language_en2.props</a>
<div class='container'> remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en2.props</span>",</div>
</div>
<div>
<a href='/common/assets/locale/language_en2.props' class='text'>/common/assets/locale/language_en2.props</a>
<div class='container'> remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en3.props</span>",</div>
</div>
<div>
<a href='/common/assets/locale/language_en3.props' class='text'>/common/assets/locale/language_en3.props</a>
<div class='container'> remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en4.props</span>",</div>
</div>
<div>
<a href='/main' class='text'>/main</a>
<div class='container'> versionedAssets.isEnabled() && (i = versionedAssets.getJSAsset("dashboard/boot"), r = versionedAssets.getJSAsset("dashboard<span style='background-color:yellow'>/main</span>"), l = versionedAssets.getJSAsset("appkit-utilities<span style='background-color:yellow'>/main</span>"), hybrid && (i = versionedAssets.getHybridAsset("dashboard/boot"), r = versionedAssets.getHybridAsset("dashboard<span style='background-color:yellow'>/main</span>"))), envProps.get("app.blueJSVersion.enabled") ? (n.push([envProps.get("app.blueVendor.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blue.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blueApp.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blueView.version") + "<span style='background-color:yellow'>/main</span>", "blue-ui/dist/blue-ui/js<span style='background-color:yellow'>/main</span>", l, i, r]), n.push([{</div>
</div>
I would like help extracting all the URL paths that appear inside span tags in the file urls-list.html.
To be clearer, this is the output I need:
Command: ./extra-path.sh urls-list.html (or similar)
Result:
/common/assets/locale/language_en.props
/common/assets/locale/language_en1.props
/common/assets/locale/language_en2.props
/main
Can anyone help me?
Update: I only need the URL paths highlighted in yellow (background-color: yellow).
Answer 0: (score: 1)
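The following should help you.
cat script.ksh
awk '/span/ && match($0,/<span style=\047background-color:yellow\047>+[^<]*/){s=substr($0,RSTART,RLENGTH); sub(/^<span[^>]*>+/,"",s); print s}' "$1"
Here is the same solution in non-one-liner form as well.
cat script.ksh
awk '
# The sample contains both ">" and ">>" after the opening tag, so accept one or more.
/span/ && match($0,/<span style=\047background-color:yellow\047>+[^<]*/){
  s=substr($0,RSTART,RLENGTH)   # the opening tag plus the path
  sub(/^<span[^>]*>+/,"",s)     # strip the tag, leaving only the path
  print s
}' "$1"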
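The awk approach above prints at most one path per input line and repeats a path if it is highlighted on several lines. A minimal sketch of a variant that walks through every yellow span on a line and prints each distinct path once is shown below. The function name extract_yellow_paths is hypothetical, and the sketch assumes the span tag is always written exactly as in the sample (single quotes, style='background-color:yellow'), possibly followed by a stray ">".

```shell
#!/bin/sh
# Hypothetical helper: print each distinct path highlighted in yellow,
# in order of first appearance.
extract_yellow_paths() {
  awk '
  {
    line = $0
    # Loop so that every highlighted span on the line is seen, not just the first.
    while (match(line, /<span style=\047background-color:yellow\047>+[^<]*/)) {
      path = substr(line, RSTART, RLENGTH)
      sub(/^<span[^>]*>+/, "", path)        # drop the opening tag and any stray ">"
      if (path != "" && !(path in seen)) {  # print each path only once
        seen[path] = 1
        print path
      }
      line = substr(line, RSTART + RLENGTH) # continue after this match
    }
  }' "$@"
}

# Run against the file names given on the command line, if any.
if [ "$#" -gt 0 ]; then
  extract_yellow_paths "$@"
fi
```

Saved under the name used in the question, it could be run as ./extra-path.sh urls-list.html. Like the answer above, it relies on text matching and not on real HTML parsing, so it will miss spans whose attributes are written differently.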
Answer 1: (score: 0)
Try the following code:
var href = window.location.href;
var dir = href.substring(0, href.lastIndexOf('/')) + "/";
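Answer 2: (score: 0)
You can use awk. In bash, run:
awk -F'[ =]' '/href/ {print $3}' urls-list.html
Explanation:
-F tells awk to use a space and '=' as field separators
/href/ makes the print command run on every line that contains "href"
print $3 prints the third field
However, this only works if the input lines are formatted exactly like those in the example. A more robust version is: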
awk -F'href=' '/href/ {print $2}' urls-list.html | awk -F'[ <>]' '{print $1}'
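Both commands in this answer print the href value together with its surrounding quotes, and they select the link targets, not the yellow spans. If the href values themselves are wanted, a rough sketch that strips the quotes could look like the following. The function name extract_hrefs is hypothetical, and the sketch assumes every href value is quoted and contains no quote characters of its own.

```shell
# Hypothetical helper: print every href value without its surrounding quotes.
extract_hrefs() {
  awk '
  {
    line = $0
    # Match href= followed by a single- or double-quoted value.
    while (match(line, /href=["\047][^"\047]*["\047]/)) {
      # Skip the 6 characters of href= plus the opening quote, and drop the closing quote.
      print substr(line, RSTART + 6, RLENGTH - 7)
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

It could then be called as extract_hrefs urls-list.html.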