Question

I am trying to filter this XML for the text between <cookie> and </cookie>, and also for the value between account-id=" and the closing quote.

<?xml version="1.0" encoding="utf-8"?>
<results>
 <status code="ok"/>
 <common locale="en" time-zone-id="85">
  <cookie>na3breezfxm5hk6co2kfzuxq</cookie>
  <date>2012-11-11T16:26:52.713+00:00</date>
  <host>http://meet97263421.adobeconnect.com</host>
  <local-host>pacna3app09</local-host>
  <admin-host>na3cps.adobeconnect.com</admin-host>
  <url>/api/xml?action=common-info</url>
  <version>8.2.2.0</version>
  <tos-version>7.5</tos-version>
  <product-notification>true</product-notification>
  <account account-id="1013353222"/>
  <user-agent>curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5</user-agent>
 </common>
</results>

Any help would be greatly appreciated.

Edit

This is the curl command I run to return the XML above.

curl -s http://meet97263421.adobeconnect.com/api/xml?action=common-info

Answer 1

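In general, regular expressions (and therefore grep) aren't well-suited to parsing XML, but if you can guarantee that the input is well-formed and consistent, you can do this easily with grep's Perl-style regular expressions (on systems whose grep supports them):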

grep -oP '(?<=<cookie>).*?(?=</cookie>)'
grep -oP '(?<=account-id=").*?(?=")'

If you want both in a single command, you can separate the patterns with |, but then you will have to work out which match is which.

grep -oP '(?<=<cookie>).*?(?=</cookie>)|(?<=account-id=").*?(?=")'
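
One way to sidestep the which-match-is-which problem is to run each pattern separately and store the result in its own variable. The sketch below does that with sed instead of grep -P, as a swapped-in alternative for systems whose grep lacks Perl-style regular expressions. The variable names and the trimmed sample XML are illustrative assumptions, and like the grep versions it relies on each value sitting on a single line.

```shell
#!/bin/sh
# Trimmed copy of the response from the question, standing in for the curl output.
xml='<results>
 <common locale="en" time-zone-id="85">
  <cookie>na3breezfxm5hk6co2kfzuxq</cookie>
  <account account-id="1013353222"/>
 </common>
</results>'

# sed -n suppresses normal output; the trailing p prints only the lines
# where the substitution matched, leaving just the captured group.
cookie=$(printf '%s\n' "$xml" | sed -n 's:.*<cookie>\(.*\)</cookie>.*:\1:p')
account_id=$(printf '%s\n' "$xml" | sed -n 's:.*account-id="\([^"]*\)".*:\1:p')

printf 'cookie=%s\naccount-id=%s\n' "$cookie" "$account_id"
```

With live data, the same two sed commands can read from the saved output of the curl command instead of the sample variable.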

Answer 2

As @Kevin noted, regular expressions are not well suited to parsing XML.

A better approach is the xmllint program, which evaluates XPath expressions, like this:

$ xmllint --xpath "string(/results/common/cookie)" data.xml
na3breezfxm5hk6co2kfzuxq

$ xmllint --xpath "string(/results/common/account/@account-id)" data.xml
1013353222
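
If you would rather not save the response to data.xml first, xmllint can read the document from standard input when given - as the file name. A minimal sketch, assuming an xmllint build that supports --xpath; the helper name xpath_value is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read XML on stdin and print the string value of the
# XPath expression passed as $1. The trailing "-" makes xmllint read stdin.
xpath_value() {
  xmllint --xpath "string($1)" -
}

# Possible usage with the curl command from the question (URL quoted so the
# shell does not treat "?" as a glob character):
#   curl -s 'http://meet97263421.adobeconnect.com/api/xml?action=common-info' |
#     xpath_value /results/common/cookie
```

Because each call consumes stdin, extracting both values this way means either running curl twice or capturing the response in a variable and piping it in once per expression.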

Answer 3

Use these XPath expressions:

/results/common/cookie

/results/common/account/@account-id

Evaluate them with a command-line XPath interpreter.

Parsing XML for specific values fetched with curl
