Question

I have a text file and need to extract specific text from it. What command should I use?

For example, the file contains the following text:

<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>

I only need the text inside these tags, i.e. I need "this is first line".

Answer 1

Assuming it is actually a complete XML document, you might (should) prefer

xmllint -xpath '//name/text()' test.xml

Or, if you want line breaks, you can use

xsltproc trafo.xslt test.xml

with trafo.xslt as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:for-each select="//name[text()]">
            <xsl:value-of select="text()"/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
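Both commands need a well-formed document. The sample in the question has three top-level <name> elements and no single root element, so xmllint would reject it as it stands. A minimal sketch, assuming the fragment is held in a shell variable and using a made-up <root> wrapper element (neither comes from the original answer), wraps the fragment before handing it to xmllint on standard input (the `-` file name):

```shell
#!/bin/sh
# The fragment from the question: three <name> elements and no single root,
# so it is not a well-formed XML document on its own.
fragment='<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>'

# Wrap it in a hypothetical <root> element, then let xmllint read the
# resulting document from standard input ("-") and evaluate the XPath.
printf '<root>%s</root>\n' "$fragment" | xmllint --xpath '//name/text()' -
```

For a real file, `{ echo '<root>'; cat test.xml; echo '</root>'; }` can take the place of the printf. Depending on the libxml2 version, the text nodes may be printed run together without newlines.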

Answer 2

Sehe's answer does not add line breaks between the lines. I suggest the following instead:

xmlstarlet sel -t -m '//name/text()' -v '.' -n  test.xml
#              ^^^^^^^^^^^^^^^^^^^^^ ^^^^^^ ^^^
#              for each xpath match    |     |
#                          print the result  |
#                         followed by a newline

或

xmlstarlet sel -t -m '//name' -v 'text()' -n  test.xml
#              ^^^^^^^^^^^^^^ ^^^^^^^^^^^ ^^^
#          for each name tag       |       |
#    print the text that's inside it       |
#                         followed by a newline

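(The two differ slightly in where the newlines are printed: the first prints one after each matching text node, the second one after each <name> element, so an empty <name/> yields an empty line.)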

Answer 3

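I believe you want all the text inside each <name> tag, one line per tag. With GNU grep (-P enables Perl-compatible regular expressions, which this needs for the lookbehind and lookahead):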

grep -Po "(?<=<name>)[^<]*(?=</name>)" yourfile

The result will be:

this is first line
this is second line
this is third line

Answer 4

grep will help you find the right lines. If the file is regularly formatted, maybe you could use cut to strip the <name> tags? If not, sed might be the right tool.
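A minimal sketch of that idea, assuming one <name>...</name> pair per line, no attributes on the tag, and no '<' or '>' inside the text (the sample variable is only there to make the snippet self-contained):

```shell
#!/bin/sh
# Sample input from the question.
sample='<name>this is first line</name>
<name>this is second line</name>
<name>this is third line</name>'

# grep keeps only the lines containing a <name> tag; cut with '>' as the
# delimiter keeps field 2 ("this is first line</name"), and a second cut
# with '<' keeps everything before "</name".
printf '%s\n' "$sample" | grep '<name>' | cut -d'>' -f2 | cut -d'<' -f1

# If the lines are less regular (other text around the tag), sed can
# capture just the text between the tags and print only matching lines.
printf '%s\n' "$sample" | sed -n 's|.*<name>\([^<]*\)</name>.*|\1|p'
```

Both pipelines print the three lines of text; the cut version breaks as soon as a line holds anything other than a single simple tag, which is where the sed form helps.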

Answer 5

Ruby (1.9+)

$ ruby -ne 'puts $_.scan(/<name>(.*?)<\/name>/)' file
this is first line
this is second line
this is third line

AWK (needs gawk or mawk: POSIX leaves a multi-character RS unspecified)

$ awk 'BEGIN{ RS="</name>" }/<name>/{ gsub(/.*<name>/,"");print }' file
this is first line
this is second line
this is third line

SED

$ sed -nE 's|.*<name>([^<]*)</name>.*|\1|p' file
this is first line
this is second line
this is third line

Answer 6

Does this work for you? (I'm not sure I understand what you need.)

grep "this is first line" yourfile

What command should I use to get text from the lines of an XML file on Linux?
