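I am trying to learn how to use XMLStarlet to access the contents of HTML tags in Bash. As an example, I am trying to extract some text from the page www.wisdomofchopra.com/iframe.php. I am having trouble specifying the "address" of the content within the HTML for XMLStarlet and would appreciate some help. My attempt is as follows: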
URL="http://www.wisdomofchopra.com/iframe.php"
webPage="$(curl -s "${URL}")"
echo "${webPage}" | xmlstarlet sel -T -t -c "//html/body//table/tr/td[@id='quote']/header/h2/"
This produces the following output:
-:29.12: Opening and ending tag mismatch: meta line 5 and head
</head>
^
-:35.100: Entity 'nbsp' not defined
te"><header><h2>"Emotional intelligence is beyond total reality"
^
-:35.106: Entity 'nbsp' not defined
eader><h2>"Emotional intelligence is beyond total reality"
^
-:41.119: EntityRef: expecting ';'
witter.com/intent/tweet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via
^
-:41.139: EntityRef: expecting ';'
eet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via=WisdomOfChopra&text
^
-:41.196: EntityRef: expecting ';'
via=WisdomOfChopra&text=%27Emotional+intelligence+is+beyond+total+reality%27&url
^
-:52.169: EntityRef: expecting ';'
));document.write(' src="http://ads.adbrite.com/mb/text_group.php?sid=2171164&zs
^
-:52.186: EntityRef: expecting ';'
(' src="http://ads.adbrite.com/mb/text_group.php?sid=2171164&zs=3436385f3630&ifr
^
-:52.209: EntityRef: expecting ';'
ite.com/mb/text_group.php?sid=2171164&zs=3436385f3630&ifr='+AdBrite_Iframe+'&ref
^
-:53.99: EntityRef: expecting ';'
p" href="http://www.adbrite.com/mb/commerce/purchase_form.php?opid=2171164&afsid
^
-:57.9: Opening and ending tag mismatch: head line 3 and html
</html>
^
-:58.1: Premature end of data in tag html line 2
Edit: For convenience, here is some roughly equivalent HTML for the web page:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h3>Your random fictional Deepak Chopra quote:</h3>
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="128" align="left" valign="top"><img src="img/imageSmall2.png" width="80" height="80" /></td>
<td id="quote"><header><h2>"Perceptual reality serves total truth" </h2></header></td>
</tr>
</table>
</body>
</html>
Answer 0 (score: 0)
I could not get XMLStarlet to process the HTML, so I just used grep and AWK instead:
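printDeepakChopraAdvice(){
    URL="http://www.wisdomofchopra.com/iframe.php"
    webPage="$(curl -s "${URL}")"
    # Take the line containing id="quote" and split it on the &quot; entity.
    # Assumption: the page encodes the quotation marks around the quote as
    # &quot;, so the second field is the quote text itself.
    text="$(echo "${webPage}" | grep "id=\"quote\"" | awk -F'&quot;' '{print $2}')"
    echo "${text}"
}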