Question

I have some lines from a playlist file and want to extract just the file name from each line and print it:

<location>file:///mnt/c3/jtvtes/ww/adw.avi</location>
<location>file:///mnt/c2/clown.mp4</location>
<location>file:///mnt/c2/jtv/video/ww/god.mp3</location>

From these lines I only need:

adw.avi
clown.mp4
god.mp3

So I tried to extract the text between "/" and "<", starting from this template:

sed -r 's/^(.*)pat1(.*)pat2(.*)$/\2/g'

which I modified to:

sed -r 's/^(.*)/(.*)<(.*)$/\2/g'

But this doesn't work. Does anyone have an idea or a solution?

Answer 1

One way:

sed -r 's|.*/(.*)</.*|\1|' file
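The attempt in the question most likely fails because `/` is used both as the delimiter of the `s` command and as a literal character inside the pattern. Using `|` as the delimiter, as above, avoids the need to escape it. Here is a minimal sketch run against the sample lines from the question. The function name `extract_names` is made up for illustration, and `-E` is used as the equivalent of `-r` that both GNU and BSD sed accept:

```shell
# Hypothetical helper: print the file name from each <location> line.
# With | as the s-command delimiter, the literal / needs no escaping.
# "$@" lets it read from named files or, with no arguments, from stdin.
extract_names() {
  sed -E 's|.*/(.*)</.*|\1|' "$@"
}

# Sample input copied from the question.
extract_names <<'EOF'
<location>file:///mnt/c3/jtvtes/ww/adw.avi</location>
<location>file:///mnt/c2/clown.mp4</location>
<location>file:///mnt/c2/jtv/video/ww/god.mp3</location>
EOF
```

The leading `.*` is greedy, but it has to backtrack to the last `/` that is still followed by `</`, so the captured group ends up holding only the file name.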

Answer 2

When I want to extract something, grep is the first tool that comes to mind.

Try this one-liner:

grep -Po "(?<=/)[^/]*(?=<)" file
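The `-P` option (Perl-compatible regular expressions, needed for the lookbehind and lookahead) is a GNU grep feature and may be missing elsewhere, for example in BSD grep. Where it is unavailable, roughly the same extraction can be sketched with POSIX parameter expansion instead of grep. This is a swapped-in technique rather than the answer's method, and the function name `names_from_stdin` is hypothetical:

```shell
# Hypothetical helper: read <location> lines on stdin and print the text
# between the last "/" of the path and the "</" of the closing tag.
names_from_stdin() {
  # The "|| [ -n ... ]" part also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    name=${line%%"</"*}   # drop everything from the first "</" onward
    name=${name##*/}      # drop everything up to the last remaining "/"
    printf '%s\n' "$name"
  done
}

# Sample input copied from the question.
names_from_stdin <<'EOF'
<location>file:///mnt/c3/jtvtes/ww/adw.avi</location>
<location>file:///mnt/c2/clown.mp4</location>
<location>file:///mnt/c2/jtv/video/ww/god.mp3</location>
EOF
```

Unlike the grep version, this assumes each line contains exactly one closing tag; a line with no `</` at all is reduced to whatever follows its last `/`.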

Answer 3

When the input is XML, process it as XML first to avoid errors:

lxprintf -e location "%s\n" . yourfilename | awk -F/ '{print $NF}'

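This guarantees that you get the text content of each element, one per line. lxprintf is part of the LTxml2 toolkit, available at http://www.ltg.ed.ac.uk/software/ltxml2. awk then gives you the last slash-separated token.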
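To illustrate only the awk stage, here is a small sketch that uses `/` as the field separator and prints the last field. It assumes lxprintf has already emitted the text content of each location element on its own line. The input below is hand-written rather than real lxprintf output, and the function name `last_path_component` is hypothetical:

```shell
# Hypothetical helper: print the last "/"-separated field of each line.
last_path_component() {
  awk -F/ '{print $NF}'
}

# Hand-written stand-in for the text content lxprintf would emit.
printf '%s\n' \
  'file:///mnt/c3/jtvtes/ww/adw.avi' \
  'file:///mnt/c2/clown.mp4' \
  'file:///mnt/c2/jtv/video/ww/god.mp3' |
  last_path_component
```

Because the tags are already gone by this stage, no pattern has to account for `<` at all. That is the point of parsing the XML first.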

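If you need to embed this in a production workflow where general-purpose command-line utilities are less accessible or acceptable, use XSLT 2.0: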

<xsl:template match="location">
  <xsl:value-of select="tokenize(.,'/')[position()=last()]"/>
  <xsl:text>&#xa;</xsl:text>
</xsl:template>

sed: print the text between two specific characters
