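egrep: find lines in which the same word appears at least twice

Date: 2016-02-12 22:40:55

Tags: regex shell grep

How can I use a regular expression to find lines that contain the same word at least twice?

I tried: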

egrep '\w{2,}\1' file  

But the terminal gave me an error:

egrep: Invalid back reference

2 answers:

Answer 0: (score: 3)

The current regex has several problems.

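  1. Use a capturing group to capture the word, and refer to it with a backreference.
  2. Add \b word boundaries so that only whole words are matched on the left and right.
  3. Add .* to match any amount of any characters in between.

    echo "ABC foo ABC bar" | egrep '\b(\w{2,})\b.*\b\1\b'

    ABC foo ABC bar

    echo "ABC foo ABCD bar" | egrep '\b(\w{2,})\b.*\b\1\b'

    false (no match: ABCD is a different word from ABC, so nothing is printed)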

    See demo at regex101. If needed, use egrep -o (--only-matching) to extract just the matching part. With grep -P (--perl-regexp) you can additionally use .*?, the lazy dot.
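The difference between the greedy and the lazy form can be sketched as follows. This is only an illustration: the sample line is made up, grep -E is used in place of egrep (the two are equivalent, and recent GNU grep releases warn that egrep is obsolescent), and -P assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample line in which the word ABC occurs three times.
line='ABC foo ABC bar ABC'

# Greedy .* extends the match up to the last repetition of the word.
greedy=$(printf '%s\n' "$line" | grep -oE '\b(\w{2,})\b.*\b\1\b')

# Lazy .*? (requires -P, i.e. PCRE support) stops at the first repetition.
lazy=$(printf '%s\n' "$line" | grep -oP '\b(\w{2,})\b.*?\b\1\b')

printf 'greedy: %s\n' "$greedy"
printf 'lazy:   %s\n' "$lazy"
```

On this sample the greedy pattern should extract the whole line, while the lazy one should extract only "ABC foo ABC".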

Answer 1: (score: 1)

Try this instead:

egrep '(\w{2,}).*\1' file

If you don't have a capturing group ((...)), there is nothing for a backreference to refer to.
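A quick way to see this is to compare grep's exit status for the two patterns. This is a minimal sketch assuming GNU grep; the sample line and variable names are made up for illustration, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample line.
line='foo bar foo'

# No capturing group: \1 has nothing to refer to, so grep rejects the
# pattern and exits with an error status (greater than 1).
printf '%s\n' "$line" | grep -E '\w{2,}\1' >/dev/null 2>&1
status_without_group=$?

# With a capturing group, \1 refers back to whatever (\w{2,}) matched.
printf '%s\n' "$line" | grep -E '(\w{2,}).*\1' >/dev/null
status_with_group=$?

printf 'without group: exit %s\n' "$status_without_group"
printf 'with group:    exit %s\n' "$status_with_group"
```

An exit status of 0 means a line matched, 1 means no line matched, and anything above 1 means grep hit an error such as an invalid pattern.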

Here is an example:

$ cat file
this line has the same word twice word
this line does not
this is this and that is that

$ egrep '(\w{2,}).*\1' file
this line has the same word twice word
this is this and that is that
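Note that without word boundaries the pattern above also matches repeated fragments inside words, not only repeated whole words. The sketch below illustrates this on a made-up line, again assuming GNU grep and using grep -E as the equivalent of egrep; the \b version is the one from the other answer.

```shell
# Hypothetical sample: no whole word repeats, but "th" occurs in both words.
sample='the other'

# Without \b, any repeated run of two or more word characters counts.
loose=$(printf '%s\n' "$sample" | grep -cE '(\w{2,}).*\1')

# With \b on both sides, only a repeated whole word counts.
strict=$(printf '%s\n' "$sample" | grep -cE '\b(\w{2,})\b.*\b\1\b')

printf 'without word boundaries: %s matching line(s)\n' "$loose"
printf 'with word boundaries:    %s matching line(s)\n' "$strict"
```

Which behaviour is wanted depends on whether "same word" should mean a whole word or any repeated substring.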