awk - How can I improve this regular expression?

Time: 2012-09-30 10:25:24

Tags: awk

I have a file:

@Book{gjn2011ske, 
  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
}

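I want to improve the regular expression so that it removes only the first line of each record. Note: the record separator RS="}\n" must not be changed.

I tried: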

awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file

I want the output to be:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

Thanks for your help.

Edit

My proposed solution:

awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file 

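4 answers:

Answer 0 (score: 2)

It is hard to do what you want with the specified RS setting, because the line address = {Krak\'ow} also ends in "}" followed by a newline and therefore produces an extra end of record. I would rather go with: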

awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE 
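
To make the extra end of record visible, here is a minimal sketch that prints each record awk produces from a shortened copy of the question's sample. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves this unspecified), and the temporary file and the variable names are illustrative only.

```shell
# Build a shortened copy of the question's sample in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@Book{gjn2011ske,
  year =     2011,
  address =  {Krak\'ow}
}
EOF

# Print every record between angle brackets so the boundaries are visible.
# Assumption: this awk treats a multi-character RS as a separator (gawk, mawk).
records=$(awk 'BEGIN { RS = "}\n" } { printf "%d:<%s>\n", NR, $0 }' "$sample")
printf '%s\n' "$records"
rm -f "$sample"
```

With gawk this should show two records: the first ends at Krak\'ow with its closing brace consumed by RS, and the second is empty because the entry's own closing "}" is matched as a separator too.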


Edit: I don't see why this has to be a regex solution; could you explain?

Anyway, here is another working solution that uses regexes, though not in the way you were expecting:

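awk 'BEGIN{ RS="}\n" }
{
  # split() returns the element count; length(array) is not available in every awk
  n = split($0, a, "\n")
  for (e = 1; e <= n; e++) {
      # RS consumed the closing brace of a line such as address = {...}: restore it
      if (a[e] ~ /\{/ && a[e] !~ /\}/) {
          sub(/$/, "}", a[e])
      }
      # only "key = value" lines are printed, which skips the @... header line
      if (a[e] ~ /=/) { print a[e] }
  }
  printf("\n")
}' INPUTFILE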

There is also a simpler regex, but it fails: the "}" that ends the last address line is removed by RS, and the final } would be printed ...

awk 'BEGIN{ RS="}\n" }
{
  # regex literal: inside a string, "\+" is not a valid escape sequence
  sub(/@[^,]+,/, "")
  print $0
}' INPUTFILE
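
For comparison, the attempt in the question, /@.*,/, removes too much because ".*" is greedy and, within an awk record, "." also matches newlines, so the match runs from @ to the last comma in the record. A bounded class such as [^,]+ stops at the first comma. The sketch below contrasts the two on a shortened record; the variable names are illustrative, and it again assumes an awk that accepts a multi-character RS.

```shell
# A shortened record from the question's sample (no closing brace needed here).
record='@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  year =     2011'

# Greedy: ".*" also crosses newlines, so the match extends to the LAST comma.
greedy=$(printf '%s\n' "$record" |
    awk 'BEGIN { RS = "}\n" } { sub(/@.*,/, ""); print }')

# Bounded: "[^,]+" cannot cross a comma, so the match stops at the FIRST one.
bounded=$(printf '%s\n' "$record" |
    awk 'BEGIN { RS = "}\n" } { sub(/@[^,]+,/, ""); print }')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

The greedy version loses the author line along with the header, while the bounded version keeps every field line.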

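Answer 1 (score: 2)

One way that does not use a regular expression. Set the field separator to a newline, so that each key of a record becomes a field. Then loop over the fields and print those that do not start with @: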

awk '
    BEGIN { 
        RS="}\n"; 
        FS=OFS="\n"; 
    } 
    { 
        for (i=1; i<=NF; i++) { 
            if ( substr($i, 1, 1) != "@" ) { 
                printf "%s%s", $i, (i == NF) ? RS : OFS; 
            } 
        } 
    }
' file

Output:

author =   {Grzegorz J. Nalepa},
title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher =    {Wydawnictwa AGH},
year =     2011,
address =  {Krak\'ow}

Author =   {Grzegorz J. Nalepa},
Journal =  {Journal of Universal Computer Science},
Number =   7,
Pages =    {1006-1023},
Title =    {Collective Knowledge Engineering with Semantic Wikis},
Volume =   16,
Year =     2010
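
The field-splitting step of this answer can be looked at in isolation. The sketch below feeds a made-up three-line entry (not taken from the question) through the same RS and FS settings and prints each field with its index; it assumes an awk that accepts a multi-character RS, such as gawk or mawk.

```shell
# With FS set to a newline, every line of a record becomes one field.
fields=$(printf '@k{x,\n  a = 1,\n  b = 2}\n' |
    awk 'BEGIN { RS = "}\n"; FS = "\n" }
         { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

Field 1 is the @ header line, which is exactly the field the loop in the answer skips.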

Answer 2 (score: 2)

I would use GNU sed for this:

sed '/^@/,/^}$/ { //d }' file.txt

Result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

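Note that you can use the -i flag to make the changes in place (i.e. overwrite the file contents), and the -s flag to apply them to multiple files, treating each file separately. For example: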

sed -s -i '/^@/,/^}$/ { //d }' *.txt
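
In the sed command, the empty pattern // reuses the most recently applied regular expression, so inside the range it deletes only the line that opens an entry and the line that closes it. For input shaped like the sample, where @ lines only open entries and a lone } only closes them, two explicit deletions should give the same result. The sketch below tries this on a shortened sample in a temporary directory; the file name refs.txt is made up for illustration.

```shell
# Work on a throwaway copy so nothing real is modified.
workdir=$(mktemp -d)
cat > "$workdir/refs.txt" <<'EOF'
@Book{gjn2011ske,
  year =     2011,
  address =  {Krak\'ow}
}
EOF

# Spelled-out form: delete entry headers and lone closing braces.
result=$(sed -e '/^@/d' -e '/^}$/d' "$workdir/refs.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Unlike the range form, this version would also delete a line starting with @ that appears in the middle of an entry, so the two are interchangeable only for well-formed input.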

Answer 3 (score: 1)

awk '{if($0!~/@/&&$0!~/^}/)print}' temp

Tested as follows:

> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
  author =       {Grzegorz J. Nalepa},
  title =        {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =         2011,
  address =      {Krak\'ow}

  Author =       {Grzegorz J. Nalepa},
  Journal =      {Journal of Universal Computer Science},
  Number =       7,
  Pages =        {1006-1023},
  Title =        {Collective Knowledge Engineering with Semantic Wikis},
  Volume =       16,
  Year =         2010
>
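
One caveat with the command in this answer: /@/ matches an @ anywhere on the line, so a field whose value contains @ would be dropped as well. Anchoring the pattern as /^@/ avoids that. The sketch below shows the difference on a hypothetical entry (not from the question) whose note field holds an e-mail address.

```shell
# Hypothetical entry whose field value contains "@" (an e-mail address).
entry='@misc{example,
  author = {A. Author},
  note =   {contact: someone@example.org}
}'

# Unanchored /@/ drops the note line as well as the header line.
loose=$(printf '%s\n' "$entry" | awk '!/@/ && !/^}/')

# Anchored /^@/ drops only the header line.
strict=$(printf '%s\n' "$entry" | awk '!/^@/ && !/^}/')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

On the sample file from the question both forms give the same output, since no field value there contains @.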