Extracting XML tags from an XML file with a Unix script/command

Time: 2012-07-25 08:22:42

Tags: shell unix

Here is my sample file:

<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>

I need to extract the XML tags from this file.

E.g., the output for the file above should be:

testjar
testable
trigger
message
sales-info
....

2 answers:

Answer 0 (score: 3)

$> cat ./text
<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>

$> grep -P -o "(?<=\<)[^>?/]*(?=\>)" ./text
testjar
testable
trigger
message
sales-info
san-a
san-b
san-c 

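The regular expression (?<=\<)[^>?/]*(?=\>) has three parts:

  • (?<=\<): (?<=...) is the lookbehind operator, so this means "preceded by <";

  • [^>?/]*: zero or more characters that are not >, ? or /. This is what skips the <?xml ... ?> declaration and the closing tags, because they contain ? or /;

  • (?=\>): (?=...) is the lookahead operator, so this means "followed by >".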
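The -P option needs a grep built with PCRE support, which not every Unix has. As a rough sketch of the same idea without lookarounds, the sed command below captures the tag name in a group instead. It is not part of the original answer, and it assumes at most one opening tag per line, as in the sample file; the temporary file and the variable names are only illustrative.

```shell
# Recreate the sample document in a temporary file (illustrative path).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>
EOF

# -n suppresses the default output; the trailing p prints only lines where
# the substitution matched. The group captures the characters after the
# first "<" up to the first ">", "?", "/" or space, so lines whose first
# tag is "<?..." or "</..." do not match and are dropped.
tags=$(sed -n 's|^[^<]*<\([^>?/ ]\{1,\}\).*|\1|p' "$xml_file")
printf '%s\n' "$tags"

rm -f "$xml_file"
```

Unlike the grep pattern, this stops at the first space, so a tag that carries attributes would print only its name.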

Answer 1 (score: 0)

awk -F">" '{print $1}' xmlfile | sed -e '/<\//d' -e '/<?/d' -e 's/<//g'
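The pipeline above leaves each line's leading indentation in front of the tag name, because $1 is everything before the first ">". A possible single-awk variant, offered only as a sketch and not as the answerer's method, splits on both "<" and ">" so that the second field is the tag itself. The file and variable names are illustrative, and one opening tag per line is assumed.

```shell
# Recreate the sample document in a temporary file (illustrative path).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>
EOF

# With "<" and ">" as field separators, $1 is the indentation and $2 is
# the text inside the first tag. Lines whose first tag starts with "?"
# (the XML declaration) or "/" (a closing tag) are skipped.
awk_tags=$(awk -F'[<>]' 'NF > 2 && $2 !~ /^[?\/]/ { print $2 }' "$xml_file")
printf '%s\n' "$awk_tags"

rm -f "$xml_file"
```

If a tag had attributes, $2 would include them; splitting $2 on spaces and printing the first piece would be one way to handle that.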