Question

I have a file:

@Book{gjn2011ske, 
  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
}

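I want to improve my regular expression so that it deletes only the first line of each record. Note: the record separator RS="}\n" must not be changed.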

I tried:

awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file

I would like the output to be:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

Thanks for your help.

Edit

My proposed solution:

awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file

Answer 1

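With the specified RS setting it is hard to do what you want, because address = {Krak\'ow} ends in "}" followed by a newline and so introduces an extra record end. I would rather go with: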

awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE

See it in action here.
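
The extra record end can be made visible by counting records on a small entry. This is only a sketch: it assumes gawk or mawk (POSIX leaves a multi-character RS unspecified), and the entry below is made up for illustration.

```shell
# Hypothetical entry whose last field ends in "}" right before the newline.
entry='@Book{key,
  year = 2011,
  address = {Krakow}
}'

# Count how many records awk sees with the fixed RS.
count=$(printf '%s\n' "$entry" | awk 'BEGIN { RS = "}\n" } END { print NR }')
printf 'records seen: %s\n' "$count"
```

With gawk the single entry is split in two: the entry without the brace that closes the address field, followed by an empty record for the final "}" line.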

Edit: I don't know why it has to be a regex solution. Could you explain?

Anyway, here is another (working, see here) solution that uses regexes, although not the one you expected:

awk 'BEGIN{ RS="}\n" }
{
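  n = split($0, a, "\n")              # split the record into lines
  for (e = 1; e <= n; e++) {
      # re-append the closing brace that RS stripped from a field line
      if (a[e] ~ "{" && a[e] !~ "}") {
          sub("$","}",a[e])
      }
      if (a[e] ~ "=") { print a[e] }  # keep only "key = value" lines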
  }
  printf("\n")
}' INPUTFILE

There is also a simpler regex, but it falls short: the "}" that closes the last address field is consumed by RS, and the entry's final } will be printed:

awk 'BEGIN{ RS="}\n" }
{
  sub(/@[^,]+,/, "")   # remove everything from "@" up to the first comma
  print $0
}' INPUTFILE
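
The negated class [^,] in the last snippet is what keeps the match on the first line. A small comparison, assuming gawk or mawk (multi-character RS is not portable) and a made-up record, might look like this:

```shell
# Hypothetical record; with RS="}\n" the whole entry is one awk record.
record='@Book{key,
  author = {A},
  year = 2011
}'

# Greedy: "." also matches newlines, so ".*," runs from "@" to the LAST comma.
greedy=$(printf '%s\n' "$record" | awk 'BEGIN { RS = "}\n" } { sub(/@.*,/, ""); print }')

# Negated class: "[^,]*" cannot cross a comma, so only the "@..." header is removed.
narrow=$(printf '%s\n' "$record" | awk 'BEGIN { RS = "}\n" } { sub(/@[^,]*,/, ""); print }')

printf 'greedy:%s\n\nnarrow:%s\n' "$greedy" "$narrow"
```

The greedy version also swallows the author line, because nothing stops `.*` at the end of the first line of the record.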

Answer 2

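One way that does not use a regular expression: set the field separator to a newline, so that each key of the record becomes a field. Then loop over the fields and print those that do not start with @: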

awk '
    BEGIN { 
        RS="}\n"; 
        FS=OFS="\n"; 
    } 
    { 
        for (i=1; i<=NF; i++) { 
            if ( substr($i, 1, 1) != "@" ) { 
                printf "%s%s", $i, (i == NF) ? RS : OFS; 
            } 
        } 
    }
' file

Output:

author =   {Grzegorz J. Nalepa},
title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher =    {Wydawnictwa AGH},
year =     2011,
address =  {Krak\'ow}

Author =   {Grzegorz J. Nalepa},
Journal =  {Journal of Universal Computer Science},
Number =   7,
Pages =    {1006-1023},
Title =    {Collective Knowledge Engineering with Semantic Wikis},
Volume =   16,
Year =     2010

Answer 3

I would use GNU sed for this:

sed '/^@/,/^}$/ { //d }' file.txt

Result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

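Note that you can use the -i flag to make the changes in place (i.e. overwrite the file contents), and the -s flag to treat multiple input files as separate files rather than as one continuous stream. For example: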

sed -s -i '/^@/,/^}$/ { //d }' *.txt
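
The // in the command above is an empty regex, which sed resolves to the last regex it used (here, one of the two range addresses). If the goal is only to drop the @ line and the lone } line, a sketch that spells both patterns out, tried on a made-up entry, could be:

```shell
# Hypothetical entry used only for this illustration.
sample='@Book{key,
  author = {A},
  year = 2011
}'

# Delete the "@..." header line and the line consisting solely of "}".
stripped=$(printf '%s\n' "$sample" | sed -e '/^@/d' -e '/^}$/d')
printf '%s\n' "$stripped"
```

Unlike the range form, this variant would also delete a lone } line that sits outside any entry, so the two are equivalent only for well-formed input.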

Answer 4

awk '{if($0!~/@/&&$0!~/^}/)print}' temp

Tested as follows:

> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
  author =       {Grzegorz J. Nalepa},
  title =        {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =         2011,
  address =      {Krak\'ow}

  Author =       {Grzegorz J. Nalepa},
  Journal =      {Journal of Universal Computer Science},
  Number =       7,
  Pages =        {1006-1023},
  Title =        {Collective Knowledge Engineering with Semantic Wikis},
  Volume =       16,
  Year =         2010
>
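
One caveat with the unanchored /@/ test: it drops every line that contains an @ anywhere, not just the entry header. A sketch with a made-up entry holding an e-mail address shows the difference from an anchored /^@/ test:

```shell
# Hypothetical entry with an "@" inside a field value.
sample='@Misc{key,
  note = {mail: user@example.org},
  year = 2011
}'

# Unanchored: any line containing "@" is dropped, including the note field.
loose=$(printf '%s\n' "$sample" | awk '$0 !~ /@/ && $0 !~ /^}/')

# Anchored: only lines that START with "@" (and lines starting with "}") are dropped.
anchored=$(printf '%s\n' "$sample" | awk '!/^@/ && !/^}/')

printf 'loose:\n%s\n\nanchored:\n%s\n' "$loose" "$anchored"
```

For the file in the question both versions give the same result, since none of its fields contains an @.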

awk - How to improve a regular expression?