Question

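I am trying to write a small script that lets me "org-capture" articles from my RSS reader (newsboat). My scenario is this: I will pipe the article to the script; however, the article is passed on as a single line, like so: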

Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster's chair quits amid allegations the government leaned on him to dismiss two journalists.

What I want to do is store the link and the title in one variable each, and then call a command with those variables (emacsclient org-protocol:/...).

So basically I need this:

TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"

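I thought about using awk or sed, but they work best on separate lines. So I figured I could split the line at "Title:", "Author:", "Date:" and "Link:", and then extract the values with awk/sed.

I found similar use cases and questions here, but none that is quite the same. I would like a very small script, without having to use Python.

Am I on the right track?

Thanks for your help.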

Answer 1

With GNU awk, for the third argument to match():

$ cat tst.awk
match($0,/^Title:\s*(.*)\s+Author:\s*(.*)\s+Date:\s*(.*)\s+Link:\s*(\S+)\s+(.*)/,a) {
    printf "TITLE=\"%s\"\n", a[1]
    printf "URL=\"%s\"\n", a[4]
}

$ awk -f tst.awk file
TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"

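The regexp also captures all of the other fields (author in a[2], date in a[3], description in a[5]), so you can do whatever else you like with the input.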
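If GNU awk is not available, the three-argument match() will not work. A rough fallback sketch for any POSIX awk, assuming the keys appear in the line exactly as in the sample, is to locate them with index() and cut out the pieces with substr(). The variable names line and out are only for illustration. Note that index() finds the first occurrence of each key, whereas the greedy regexp above effectively finds the last.

```shell
# Sample input: the whole article arrives as one line.
line='Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster'\''s chair quits amid allegations the government leaned on him to dismiss two journalists.'

out=$(printf '%s\n' "$line" | awk '
{
    t = index($0, "Title: ")               # must be at the start of the line
    a = index($0, " Author:")              # marks the end of the title
    l = index($0, " Link: ")               # the URL follows this key
    if (t != 1 || a == 0 || l == 0) next   # skip lines lacking the keys
    title = substr($0, 8, a - 8)           # text between "Title: " and " Author:"
    rest  = substr($0, l + 7)              # everything after " Link: "
    split(rest, parts, " ")                # the URL is the first word of the rest
    printf "TITLE=\"%s\"\n", title
    printf "URL=\"%s\"\n", parts[1]
}')

printf '%s\n' "$out"
```

On the sample line this should print the same two TITLE= and URL= lines as the GNU awk version.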

Answer 2

This might work for you (GNU sed):

sed -r 's/^Title: (.*) Author:.* Link: (\S+).*/TITLE="\1"\nURL="\2"/' file

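Pattern matching extracts the required fields. The first may contain spaces, so it is matched up to the key Author:. The second is the run of non-space characters following the key Link:.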
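To get the two values into actual shell variables rather than printed assignments, one possible sketch is to have sed emit the title and the URL on separate lines and pick them apart afterwards, which avoids running eval on feed content. This still assumes GNU sed (for \S and \n in the replacement); the variable names line and fields are only for illustration.

```shell
# Sample input: the whole article arrives as one line.
line='Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster'\''s chair quits amid allegations the government leaned on him to dismiss two journalists.'

# Print the title on line 1 and the URL on line 2.
# With -n and the p flag, nothing is printed when the line does not match.
fields=$(printf '%s\n' "$line" |
    sed -rn 's/^Title: (.*) Author:.* Link: (\S+).*/\1\n\2/p')

TITLE=$(printf '%s\n' "$fields" | sed -n 1p)   # first output line
URL=$(printf '%s\n' "$fields" | sed -n 2p)     # second output line

printf 'TITLE=%s\nURL=%s\n' "$TITLE" "$URL"
```

If the input does not match, both variables end up empty, so the script can test for that before passing "$TITLE" and "$URL" on to the emacsclient call mentioned in the question.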
