Grep for two words near each other

Time: 2011-03-06 12:49:24

Tags: regex grep

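Say I have a line in a file: "This is perhaps the easiest place to add new functionality." I want to grep for two words that are near each other. I ran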

grep -ERHn "\beasiest\W+(?:\w+\W+){1,6}?place\b" *

which worked and gave me the line. But when I ran

grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" *

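it failed, which defeats the whole point of the {1,10}? part. This construct is listed on the regular-expression.info site and in a couple of regex books. They do not describe it with grep, but that should not matter.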

Update

I put the regex into a Python script. It works, but without the nice grep -C context output...

#!/usr/bin/python
import re
import sys
import os

word1 = sys.argv[1]
word2 = sys.argv[2]
dist = sys.argv[3]
regex_string = (r'\b(?:'
    + word1
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word2
    + r'|'
    + word2
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word1
    + r')\b')

regex = re.compile(regex_string)


def findmatches(PATH):
    # Walk the tree under PATH and search the full contents of every file.
    for root, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(root, filename)

            with open(fullpath, 'r') as f:
                # The file is read whole, so a match may span line breaks.
                matches = regex.findall(f.read())
                for m in matches:
                    print("File: %s\n\t%s" % (fullpath, m))

if __name__ == "__main__":
    findmatches(sys.argv[4])

Calling it as

python near.py charlie winning 6 path/to/charlie/sheen

works for me.
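The missing grep -C behaviour could be approximated along the following lines. This is only a sketch: the helper names near_regex and matches_with_context are made up for illustration, the words are passed through re.escape (which the script above does not do), and matching is done line by line, so unlike the script above it will not find pairs that straddle a line break.

```python
import re


def near_regex(word1, word2, dist):
    """Match word1 and word2 in either order with at most `dist` words between."""
    a, b = re.escape(word1), re.escape(word2)
    gap = r"\W+(?:\w+\W+){0,%d}?" % dist
    return re.compile(r"\b(?:%s%s%s|%s%s%s)\b" % (a, gap, b, b, gap, a))


def matches_with_context(lines, regex, context=2):
    """Yield (line_number, block) for every matching line.

    `block` is the matching line plus up to `context` lines on either side,
    roughly what grep -C prints. Line numbers are 1-based.
    """
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            yield i + 1, lines[start:end]


if __name__ == "__main__":
    sample = [
        "Some earlier line.",
        "This is perhaps the easiest place to add new functionality.",
        "Some later line.",
    ]
    for lineno, block in matches_with_context(sample, near_regex("easiest", "new", 6), 1):
        print("match at line %d:" % lineno)
        for text in block:
            print("\t" + text)
```

Overlapping context blocks are not merged here, which grep -C does; that would need a little extra bookkeeping.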

2 Answers:

Answer 0: (score: 1)

Do you really need the non-capturing (?:...) construct? Maybe this is enough:

grep -ERHn "\beasiest\W+(\w+\W+){1,10}new\b" * 

Here is what I get:

echo "This is perhaps the easiest place to add new functionality." | grep -EHn "\beasiest\W+(\w+\W+){1,10}new\b"
  

(standard input):1:This is perhaps the easiest place to add new functionality.

Edit

As Camille Goudeseune suggested:

To make this easy to use, you can add the following to .bashrc:

grepNear() {
 grep -EHn "\b$1\W+(\w+\W+){1,10}$2\b"
}

Then, at the bash prompt: echo "..." | grepNear easiest new

Answer 1: (score: 0)

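grep does not support the non-capturing groups of Python regular expressions. When you write something like (?:\w+\W+), you are asking grep to match a question mark ?, followed by a colon :, followed by one or more word characters \w+, and then one or more non-word characters \W+. ? is normally a special character in grep regular expressions, but because it directly follows the start of a group it has nothing to repeat, so it is treated as a literal (much as the regex [?] matches a question mark). This is how GNU grep behaves; POSIX leaves the case undefined.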

Let's test it. I have the following file:

$ cat file
This is perhaps the easiest place to add new functionality.

grep does not match it with the expression you used:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file

Then I created the following file:

$ cat file2
This is perhaps the easiest ?:place ?:to ?:add new functionality.

Note that ?: precedes each word between easiest and new. In this case the expression matches the file:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file2
file2:1:This is perhaps the easiest ?:place ?:to ?:add new functionality.
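The two tests above can be reproduced in Python by spelling out what grep -E appears to do with the pattern. The translation below is a hand-written assumption based on the reading given in this answer, not something grep produces: the ?: is escaped so that it is literal text, and the ? after the {1,10} is taken to make the whole repetition optional rather than lazy. The function name as_grep_reads_it is made up for illustration.

```python
import re


def as_grep_reads_it(last_word, max_words=10):
    """Python spelling of how grep -E is assumed to read
    \\beasiest\\W+(?:\\w+\\W+){1,N}?WORD\\b
    """
    return re.compile(
        r"\beasiest\W+"
        r"(?:(?:\?:\w+\W+){1,%d})?"   # literal "?:" before each word; whole run optional
        r"%s\b" % (max_words, re.escape(last_word))
    )


file1 = "This is perhaps the easiest place to add new functionality."
file2 = "This is perhaps the easiest ?:place ?:to ?:add new functionality."

if __name__ == "__main__":
    print(bool(as_grep_reads_it("new").search(file1)))       # the plain file
    print(bool(as_grep_reads_it("new").search(file2)))       # the file with ?: prefixes
    print(bool(as_grep_reads_it("place", 6).search(file1)))  # the query from the question
```

Under this reading, the first command in the question succeeded only because place directly follows easiest, so the optional group could match zero times. Any target word with other words in between, such as new, needs the group, and the group demands a literal ?: that the text does not contain.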

The solution is to remove the ?: from the expression:

$ grep -ERHn "\beasiest\W+(\w+\W+){1,10}?new\b" file
file:1:This is perhaps the easiest place to add new functionality.

Since you do not even need a non-capturing group (at least as far as I can see), removing it causes no problems.

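Bonus: you can simplify the expression by changing {1,10} to {0,10} and dropping the ? that follows it. In GNU grep -E that ? is not a lazy modifier; it makes the repeated group optional, so {1,10}? already means the same as {0,10}: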

$ grep -ERHn "\beasiest\W+(\w+\W+){0,10}new\b" file
file:1:This is perhaps the easiest place to add new functionality.
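The practical difference between a lower bound of 1 and a lower bound of 0 shows up when the two words are adjacent. The sketch below checks this with Python's re module rather than grep; gap_pattern is a hypothetical helper, and it uses a non-capturing group because Python supports one.

```python
import re


def gap_pattern(first, second, min_words, max_words):
    """first, then between min_words and max_words other words, then second."""
    return re.compile(
        r"\b%s\W+(?:\w+\W+){%d,%d}%s\b"
        % (re.escape(first), min_words, max_words, re.escape(second))
    )


line = "This is perhaps the easiest place to add new functionality."

if __name__ == "__main__":
    for lo in (1, 0):
        for target in ("new", "place"):
            hit = gap_pattern("easiest", target, lo, 10).search(line)
            print("{%d,10} easiest ... %s: %s" % (lo, target, bool(hit)))
```

With {1,10} at least one word must sit between the pair, so easiest place is not found; with {0,10} both the adjacent pair and the pair separated by three words are found.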