I have a text file and I need to extract specific text from it. What command should I use?
For example, the file contains:
<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>
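I need only the text inside these tags, i.e. I want "this is first line".
Answer 0 (score: 7)
Assuming it is actually a complete XML document, you might (and probably should) prefer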
xmllint -xpath '//name/text()' test.xml
or, if you want a newline after each result, you can use
xsltproc.exe trafo.xslt test.xml
with trafo.xslt as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- plain-text output: no XML declaration or markup is emitted -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <!-- one output line per <name> element that has text content -->
    <xsl:for-each select="//name[text()]">
      <xsl:value-of select="text()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Answer 1 (score: 4)
Sehe's answer does not put newlines between the results. I suggest using the following instead:
xmlstarlet sel -t -m '//name/text()' -v '.' -n test.xml
# -m '//name/text()' : for each XPath match,
# -v '.'             : print the result,
# -n                 : followed by a newline
or
xmlstarlet sel -t -m '//name' -v 'text()' -n test.xml
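# -m '//name'   : for each name element,
# -v 'text()'   : print the text inside it,
# -n            : followed by a newline
(The two behave slightly differently in where they print newlines: for example, the second also prints an empty line for a <name> element that has no text.)
Answer 2 (score: 1)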
I assume you want all the text inside each <name> tag, and that each tag sits on a single line.
grep -Po "(?<=<name>)[^<]*(?=</name>)" yourfile
The result is:
this is first line
this is second line
this is third line
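`grep -P` (Perl-compatible regexes with lookarounds) is a GNU grep feature and is missing from some other greps. A minimal sketch of a variant without `-P`, assuming each tag pair is on one line and the text contains no `<`: match the whole tag pair with `-o` (widely supported, though not POSIX), then strip the tags with sed. The function name `name_text` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the text of every <name>...</name> pair.
# Assumes each pair sits on one line and its text contains no '<'.
name_text() {
  # -o prints every match on its own line, so two tags on one line give two lines
  grep -o '<name>[^<]*</name>' "$@" |
    # strip the opening and closing tags, leaving only the text
    sed 's/<[^>]*>//g'
}

# Demo on the sample from the question, read from stdin
name_text <<'EOF'
<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>
EOF
```

Like the `-P` version, lines without a `<name>` tag produce no output.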
Answer 4 (score: 0)
Ruby (1.9+)
$ ruby -ne 'puts $_.scan(/<name>(.*?)<\/name>/)' file
this is first line
this is second line
this is third line
AWK
$ awk 'BEGIN{ RS="</name>" }/<name>/{ gsub(/.*<name>/,"");print }' file
this is first line
this is second line
this is third line
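POSIX leaves the behavior of a multi-character `RS` unspecified, so the awk one-liner above relies on an awk such as gawk or mawk that treats it as a regex. A minimal sketch for a plain POSIX awk, assuming one tag pair per line: use the tags as the field separator instead, so `<name>text</name>` splits into an empty field, the text, and another empty field. The function name `name_text` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the text between <name> and </name>.
# Assumes at most one tag pair per line.
name_text() {
  # FS is an ERE matching either the opening or the closing tag;
  # "<name>text</name>" splits into "", "text", "" so the text is $2.
  # Lines without a tag have a single field and are skipped.
  awk -F '</?name>' 'NF > 1 { print $2 }' "$@"
}

# Demo on the sample from the question, read from stdin
name_text <<'EOF'
<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>
EOF
```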
SED
$ sed -E 's|<name>([^<]*)</name>|\1|' file
this is first line
this is second line
this is third line
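The sed command above prints every line of the file, replacing the tag pair where one is found, so lines without a `<name>` tag (for example a surrounding root element) pass through unchanged. A minimal sketch that prints only the extracted text, assuming one tag pair per line and no `<` inside the text; the function name `name_text` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the text of <name> tags, skipping other lines.
# Assumes one <name>...</name> pair per line and no '<' inside the text.
name_text() {
  # -n turns off automatic printing; the p flag prints a line only when the
  # substitution succeeded; .* on each side discards anything around the tag
  sed -n -E 's|.*<name>([^<]*)</name>.*|\1|p' "$@"
}

# Demo on the sample from the question, read from stdin
name_text <<'EOF'
<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>
EOF
```

Because the leading `.*` is greedy, a line holding two tag pairs yields only the last one; the `grep -Po` answer above handles that case.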
Answer 5 (score: -1)
Does this work for you? (Not sure I understand what you need):
grep "this is first line" yourfile