Perl pattern matching question

Time: 2010-12-01 23:26:57

Tags: regex perl pattern-matching

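I am trying to match a pattern in Perl and need some help.

I need to delete from a string anything that matches [xxxx], i.e. an opening bracket, whatever is inside it, and the first closing bracket that follows.

So I am trying to replace the opening bracket, the things inside it, and the first closing bracket with a space, using the following code: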

  if($_ =~ /[/)
  {
    print "In here!\n";
    $_ =~ s/[(.*?)]/ /ig;
  }

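Similarly, I need to match an opening angle bracket, whatever is inside it, and the first closing angle bracket.

I am using the following code to do that: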

  if($_ =~ /</)
  {
    print "In here!\n";
    $_ =~ s/<(.*?)>/ /ig;
  }

Neither of these seems to work. My sample data is as follows:

 'Joanne' <!--Her name does NOT contain "Kathleen"; see the section "Name"--> "'Jo'" 'Rowling', OBE [http://news bbc co uk/1/hi/uk/793844 stm Caine heads birthday honours list]  BBC News  17 June 2000  Retrieved 25 October 2000  , [http://content scholastic com/browse/contributor jsp?id=3578 JK Rowling Biography]  Scholastic com  Retrieved 20 October 2007  better known as 'J  K  Rowling' ,<ref name=telegraph>[http://www telegraph co uk/news/uknews/1531779/BBCs-secret-guide-to-avoid-tripping-over-your-tongue html Daily Telegraph, BBC's secret guide to avoid tripping over your tongue, 19 October 2006] is a British <!--do not change to "English" or "Scottish" until issue is resolved --> author best known as the creator of the [[Harry Potter]] fantasy series, the idea for which was conceived whilst on a train trip from Manchester to London in 1990  The Potter books have gained worldwide attention, won multiple awards, sold more than 400 million copies and been the basis for a popular series of films, in which Rowling had creative control serving as a producer in two of the seven installments  [http://www businesswire com/news/home/20100920005538/en/Warner-Bros -Pictures-Worldwide-Satellite-Trailer-Debut%C2%A0Harry Business Wire - Warner Bros  Pictures mentions J  K  Rowling as producer ] 

Any help would be greatly appreciated. Thanks!

3 answers:

Answer 0 (score: 2)

You need to use this:

1 while s/\[[^\[\]]*\]//g;

Demo:

% echo "i have [some [square] brackets] in [here] and [here] today."| perl -pe '1 while s/\[[^\[\]]*\]/NADA/g'
i have NADA in NADA and NADA today.
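To see why the loop matters, here is a small sketch that records the string after each pass of the same substitution. The variable names are only illustrative. Each pass can only remove bracket pairs that contain no other brackets, so the innermost pairs go first and the outer ones follow on the next pass.

```perl
use strict;
use warnings;

my $s = "i have [some [square] brackets] in [here] and [here] today.";
my @passes;

# s///g returns true as long as it replaced something, so the loop
# stops on the first pass that finds no bracket pair left.
while ($s =~ s/\[[^\[\]]*\]/NADA/g) {
    push @passes, $s;
}

print "pass $_: $passes[$_ - 1]\n" for 1 .. @passes;
```

On this input the first pass leaves "[some NADA brackets]" behind, and the second pass removes it.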

Compare that with the version that fails:

% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[.*?\]/NADA/g'
i have NADA brackets] in NADA and NADA today.

I leave the recursive regex as an exercise for the reader. :)


Edit: Eric Strom provided a recursive solution that does not need the 1 while loop:

% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[(?:[^\[\]]*|(?R))*\]/NADA/g'
i have NADA in NADA and NADA today.
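For readers who find the one-liner dense, the same recursive pattern can be spread out with the /x modifier and commented. This is only a sketch of how the pieces fit together; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "i have [some [square] brackets] in [here] and [here] today.";

(my $clean = $text) =~ s{
    \[                # an opening square bracket
    (?:
        [^\[\]]*      # a run of characters that are not brackets
      |
        (?R)          # or a nested group: recurse into the whole pattern
    )*
    \]                # the closing bracket that balances the opening one
}{NADA}gx;

print "$clean\n";
```

Because (?R) recurses into the entire pattern, this form is easiest to use when the bracket expression is the whole regex, as it is here.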

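Answer 1 (score: 1)

  • Square brackets have a special meaning in regex syntax, so escape them: /\[.*?\]/. (You also don't need the parentheses here, and doing a case-insensitive match is pointless.)

  • It has been a long time since I had to wrestle with Perl, but I'm pretty sure that testing $_ against a regex also modifies $_ (even if you didn't use s///). You don't need the test anyway; just run the substitution, and if the pattern doesn't match anywhere, it simply does nothing.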
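The effect of the missing backslashes can be checked with a short sketch. The input string is invented for the example, and the lone bracket is kept in a string variable so that the compile error shows up at run time inside eval instead of stopping the script.

```perl
use strict;
use warnings;

# A lone unescaped [ starts a character class that never ends,
# so the pattern does not compile.
my $lone = '[';
my $re   = eval { qr/$lone/ };
print "lone [ compiles: ", (defined $re ? "yes" : "no"), "\n";

my $s = "x [note] (y)";

# Unescaped: [(.*?)] is a character class for one of ( . * ? )
(my $wrong = $s) =~ s/[(.*?)]/ /g;

# Escaped: matches from [ to the first ] that follows
(my $right = $s) =~ s/\[.*?\]/ /g;

print "unescaped: '$wrong'\n";
print "escaped:   '$right'\n";
```

With the unescaped pattern the parentheses are replaced and the bracketed text is left alone, which is the opposite of what the question wants.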

Answer 2 (score: 1)

$_ =~ /someregex/ does not modify $_.

Just to note, $_ =~ /someregex/ and /someregex/ do the same thing.
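Both points are easy to check with a few lines. This is a minimal sketch with a made-up input string: a plain match leaves $_ untouched, the explicit and implicit forms agree, and only s/// changes the string.

```perl
use strict;
use warnings;

local $_ = "see [this] and <that>";
my $before = $_;

# Explicit binding and the implicit form both test $_.
my $explicit = ($_ =~ /\[/) ? 1 : 0;
my $implicit = (/\[/)       ? 1 : 0;

# Matching alone has not changed the string.
my $unchanged = ($_ eq $before) ? 1 : 0;

# Only the substitution modifies $_.
s/\[.*?\]/ /g;
my $after = $_;

print "explicit=$explicit implicit=$implicit unchanged=$unchanged\n";
print "after s///: '$after'\n";
```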

Also, you don't need to check for the presence of [ or <, nor do you need the grouping parentheses:

s/\[.*?\]/ /g;

s/<.*?>/ /g;

will do the job you want.
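As a rough check, here is a sketch that applies both substitutions to a few abridged pieces of the question's sample line, joined together for illustration. One caveat it shows: the doubled brackets in [[Harry Potter]] leave a stray ] behind with the non-greedy pattern, so the second variant borrows the loop from answer 0 for the square brackets.

```perl
use strict;
use warnings;

# Abridged pieces of the question's sample data, joined for illustration.
my $line = q{OBE [http://news bbc co uk/1/hi/uk/793844 stm Caine heads birthday honours list] }
         . q{is a British <!--do not change to "English" or "Scottish" until issue is resolved --> }
         . q{author best known as the creator of the [[Harry Potter]] fantasy series};

# The two substitutions from this answer.
(my $simple = $line) =~ s/\[.*?\]/ /g;
$simple =~ s/<.*?>/ /g;

# Variant using the innermost-first loop from answer 0 for the brackets.
my $nested = $line;
1 while $nested =~ s/\[[^\[\]]*\]/ /g;
$nested =~ s/<.*?>/ /g;

print "simple: $simple\n";
print "nested: $nested\n";
```

If the data never nests brackets, the two plain substitutions are enough; the loop only matters for cases like the doubled wiki-link brackets.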

Edit: changed the code to match the fact that you are modifying $_.