我使用bash
脚本从网址获取值,并以 html标记形式返回值。输入:
<tr><td title='The name of the health check service.'>hc.name</td><td data-type='java.lang.String'>Replication Queue</td></tr>
<tr><td title='The persistence identifier of the service.'>service.pid</td><td data-type='java.lang.String'>com.adobe.granite.replication.hc.impl.ReplicationQueueHealthCheck</td></tr>
<tr><td title='The health check result'>ok</td><td data-type='java.lang.Boolean'>true</td></tr>
<tr><td title='The health check status'>status</td><td data-type='java.lang.String'>OK</td></tr>
<tr><td title='The elapsed time in miliseconds'>elapsedTime</td><td data-type='java.lang.Long'>18</td></tr>
<tr><td title='The date when the execution finished'>finishedAt</td><td data-type='java.util.Date'>2017-03-24T00:23:36+0530</td></tr>
<tr><td title='Indicates of the execution timed out'>timedOut</td><td data-type='java.lang.Boolean'>false</td></tr>
所需的输出应存储在一个变量中,其中的值来自上述代码中的<td>
标记:
values=( ["hc.name"]="Replication Queue" ["status"]="OK")
我尝试使用此sed
代码,但仅当多个<td></td>
代码位于不同的行时才有效。在这种情况下,多个<td></td>
位于同一行。
sed -n 's:.*<td>(.*)</td>.*:\1:p' sample.txt
该命令仅显示如下输入的结果:
<tr>
<td>ok</td>
<td>status</td>
</tr>
答案 0 :(得分:0)
我认为使用Perl正则表达式会有更好的运气,因为它们支持非贪婪的匹配。这是一个Perl单行程序,用于打印文件中的信息:
perl -ne 'm:.*?<td [^>]*>(.*?)</td>.*?<td [^>]*>(.*?)</td>:; print "[\"$1\"] = \"$2\"\n";' sample.txt
输出:
["hc.name"] = "Replication Queue"
["service.pid"] = "com.adobe.granite.replication.hc.impl.ReplicationQueueHealthCheck"
["ok"] = "true"
["status"] = "OK"
["elapsedTime"] = "18"
["finishedAt"] = "2017-03-24T00:23:36+0530"
["timedOut"] = "false"
这是一个也有效的sed调用,但这不太精确,因为它匹配除>
和<
之外的所有字符以接近非贪婪匹配,这在sed中不受支持。
sed -n 's:.*<td [^>]*>\([^<]*\)</td><td [^>]*>\([^<]*\)</td>.*:[\"\1\"] = \"\2\":p' sample.txt
输出:
["hc.name"] = "Replication Queue"
["service.pid"] = "com.adobe.granite.replication.hc.impl.ReplicationQueueHealthCheck"
["ok"] = "true"
["status"] = "OK"
["elapsedTime"] = "18"
["finishedAt"] = "2017-03-24T00:23:36+0530"
["timedOut"] = "false"
答案 1 :(得分:0)
sgrep
和sed
方法(比纯sed
更可靠):
sgrep -o'%r"' '">" __ "<"' sample.txt | sed 's/^/["/;s/""/"/;s/"/"]="/2'
输出:
["hc.name"]="Replication Queue"
["service.pid"]="com.adobe.granite.replication.hc.impl.ReplicationQueueHealthCheck"
["ok"]="true"
["status"]="OK"
["elapsedTime"]="18"
["finishedAt"]="2017-03-24T00:23:36+0530"
["timedOut"]="false"