Question

In a CSV file, I have lines similar to this:

<iframe src="https://player.vimeo.com/video/30342373" width="640" height="364" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

I want to extract 30342373 from these lines, that is, the part between vimeo.com/video/ and the closing ". I tried the following regular expression in mawk:

vimeo\.com\/video\/[^"]*

It captures: vimeo.com/video/30342373

As far as I know, mawk supports only POSIX ERE syntax, similar to egrep.

How can I capture only the video ID part of the line?

Answer 1

This is easier with sed:

str='<iframe src="https://player.vimeo.com/video/30342373" width="640" height="364" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>'

sed 's~.*\.vimeo\.com/video/~~; s~" .*~~' <<< "$str"

30342373

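This sed command first deletes everything from the start of the line through vimeo.com/video/, then deletes everything from the " (followed by a space) to the end of the line, leaving only the ID.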
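The same idea can probably be written as a single substitution with a capture group. The sketch below is an assumption-laden variant, not the answerer's command: the function name extract_id_sed is hypothetical, and it assumes the ID is always terminated by a double quote. With -n and the p flag, lines that contain no Vimeo URL are skipped instead of being printed unchanged.

```shell
# Hypothetical helper (name is illustrative): reads lines on stdin and
# prints one video ID per line that contains a Vimeo player URL.
extract_id_sed() {
  # \( ... \) is a basic-regex capture group; \1 puts it back.
  # -n suppresses default output, p prints only lines where s matched.
  sed -n 's~.*vimeo\.com/video/\([^"]*\)".*~\1~p'
}

# Usage with the sample line from the question:
printf '%s\n' '<iframe src="https://player.vimeo.com/video/30342373" width="640" height="364" frameborder="0"></iframe>' | extract_id_sed
```

For a whole file, something like extract_id_sed < file should behave the same way.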

Answer 2

$ awk '{gsub(/.*vimeo\.com\/video\/|".*/,"")}1' file
30342373
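POSIX ERE has neither lookbehind nor non-capturing groups, so a common workaround in awk is to match the whole pattern and then cut off the known prefix with substr(). The following is a minimal sketch using the regular expression from the question; the function name extract_vimeo_id is hypothetical. It uses only match(), RSTART, RLENGTH, length() and substr(), which are part of POSIX awk, so it should also run under mawk.

```shell
# Hypothetical helper (name is illustrative): reads lines on stdin and
# prints the video ID for each line that contains a Vimeo player URL.
extract_vimeo_id() {
  awk '
    # match() sets RSTART and RLENGTH to the position and length
    # of the leftmost-longest match of the regex in $0.
    match($0, /vimeo\.com\/video\/[^"]*/) {
      prefix_len = length("vimeo.com/video/")
      # Skip the fixed prefix and keep only what follows it.
      print substr($0, RSTART + prefix_len, RLENGTH - prefix_len)
    }
  '
}

# Usage with the sample line from the question:
printf '%s\n' '<iframe src="https://player.vimeo.com/video/30342373" width="640" height="364" frameborder="0"></iframe>' | extract_vimeo_id
```

Unlike the gsub() version, this prints nothing for lines without a match, which may be preferable when only some CSV rows contain an iframe.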

Positive lookbehind or non-capturing group in POSIX ERE (extended regular expressions)
