Question

I have a set of search strings stored in a dictionary, which I use to parse information.

my_func_dict = {
    'index_one': r'pattern1',
    'index_two': r'pattern2',
    # etc.
}

I then use the following to get the path in which the search strings should be evaluated. This part works fine.

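import sys

# Take the path from the command line if given, otherwise prompt for it.
if len(sys.argv) >= 2:
    location = sys.argv[1]
else:
    location = raw_input("Enter the path to evaluate...>: ")  # Python 2; input() on Python 3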

Then I iterate over the dictionary items to apply the search command:

search_cmd = 'grep -h -r'.split()
for name, pattern in my_func_dict.items():
    with open('{}.txt'.format(name), 'a') as output:
        cmd = search_cmd + [pattern, location]
        subprocess.call(cmd, stdout=output)

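This works for a small number of search patterns and only a few files. In my case, though, I have many search patterns, and I apply them to a folder containing many files of several extension types (*.txt, *.log, etc.), which takes a long time. I would like to use find to select specific file types under the folder path first, and then apply grep only to those files so that I get the results faster.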

But after trying this:

search_cmd = 'find $location -name "*test.txt" -print0 | xargs -0 grep -h -r'.split()
for name, pattern in my_func_dict.items():
    with open('{}.txt'.format(name), 'a') as output:
        cmd = search_cmd + [pattern, location]
        subprocess.call(cmd, stdout=output)

I get an error:

find: |: unknown primary or operator
find: |: unknown primary or operator
find: |: unknown primary or operator
find: |: unknown primary or operator

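How should I implement search_cmd to avoid this problem? I need to use -print0 with find and -0 with xargs because folder names in the path contain spaces, for example: /This is the path/for/This Folder. Thanks.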

Answer 1

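You can pass the complete command string to subprocess by using Popen with shell=True. When a list is passed without a shell, the | is handed to find as a literal argument, which is what produces the error above; with shell=True the shell interprets the pipe. We can also use Python to split the output on newlines.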

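import shlex
import subprocess

# Maps a search path to the pattern to look for under it.
mydict = {'.': 'patte', './': '".atty"'}

results = []
for path, pattern in mydict.items():
    # -print0 / -0 keep file names with spaces intact; shlex.quote (Python 3.3+) protects the path.
    # The pattern is pasted into the command as-is, so it must carry its own shell quoting.
    cmd = ('find ' + shlex.quote(path) + ' -type f -name "*.txt" -print0'
           ' | xargs -0 grep -h ' + pattern)
    sp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, _ = sp.communicate()
    if stdout:
        # One list of matched lines per path/pattern pair.
        results.append(stdout.decode('utf-8').strip().split('\n'))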

Result

[['pattern1', 'pattern2'], ['pattycakes', 'patty']]
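
If you would rather not build a shell string from user input, the same pipeline can be wired up without a shell by connecting two Popen objects. This is only a sketch, not a drop-in replacement: the helper names build_find_cmd, build_grep_cmd and search are made up for illustration, and it assumes find, xargs and grep are on the PATH.

```python
import subprocess


def build_find_cmd(location, name_glob):
    # No shell is involved, so the glob needs no quoting and spaces in
    # `location` are passed through untouched. -print0 separates the
    # file names with NUL bytes.
    return ['find', location, '-type', 'f', '-name', name_glob, '-print0']


def build_grep_cmd(pattern):
    # xargs -0 reads the NUL-separated names; -e marks the next
    # argument as the pattern even if it starts with a dash.
    return ['xargs', '-0', 'grep', '-h', '-e', pattern]


def search(location, name_glob, pattern, output):
    """Equivalent of: find ... -print0 | xargs -0 grep -h -e pattern > output"""
    find = subprocess.Popen(build_find_cmd(location, name_glob),
                            stdout=subprocess.PIPE)
    grep = subprocess.Popen(build_grep_cmd(pattern),
                            stdin=find.stdout, stdout=output)
    find.stdout.close()  # lets find receive SIGPIPE if xargs exits first
    grep.wait()
    find.wait()
    return grep.returncode
```

In the loop from the question, subprocess.call(cmd, stdout=output) would then become search(location, '*test.txt', pattern, output). Note that the return code is non-zero when grep finds no match, so it should not be treated as an error by itself.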
