Delete text files from a directory that do not contain specific words

Date: 2018-05-12 16:30:33

Tags: python os.walk

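I have a directory containing ~2200 text files. I need to delete every text file that does not contain any of the specific words I define. Could someone look at this code and suggest how to make it work? Right now, when I run it, it says it cannot find the directory "C".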

Also, I want to make sure it runs on every file in that directory. Do I need to call next()?
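No explicit next() call should be needed: os.walk returns a generator of (root, dirs, files) tuples, and a for loop keeps pulling from it until every directory under the starting path, including subdirectories, has been visited. Note that files holds bare file names, so each one has to be joined with root before it can be opened or deleted. A minimal sketch, assuming you want every file in the tree (the helper name iter_file_paths is hypothetical):

```python
import os


def iter_file_paths(top):
    """Yield the full path of every file under top, recursing into subdirectories."""
    # os.walk is a generator; this for loop consumes it completely,
    # so there is no need to call next() by hand.
    for root, dirs, files in os.walk(top):
        for name in files:
            # files contains bare names; join with root to get a usable path
            yield os.path.join(root, name)
```

For example, for file_path in iter_file_paths(path): print(file_path) lists every file in the tree, which is a cheap way to confirm the loop reaches all ~2200 files before deleting anything.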

import os

path = r'C:\Users\user\Desktop\AFL codes to test'
words = ['buy', 'sell']

for root, dirs, files in os.walk(path):
    for file in path:
        if not any(words in file for words in words):
            os.remove(file)

Also, here is the full traceback:

runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-23-dbc80e182b2b>", line 1, in <module>
    runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py", line 9, in <module>
    os.remove(file)

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C'
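The 'C' in this error comes from the loop for file in path:. Iterating over a string yields its characters one at a time, so the first "file" is the single letter 'C'; it contains neither 'buy' nor 'sell', and os.remove('C') then fails because no file of that name exists. (Separately, word in file tests the file name, not the file's contents.) A small illustration using the path and words from the question:

```python
path = r'C:\Users\user\Desktop\AFL codes to test'
words = ['buy', 'sell']

# Looping over a string gives single characters, not file names
chars = [c for c in path]
print(chars[:3])  # ['C', ':', '\\']

# The first "file" the original loop sees is 'C'; it contains neither word,
# so the original code goes on to call os.remove('C')
first = chars[0]
print(not any(word in first for word in words))  # True
```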

This is the error I get after trying shutil.rmtree instead:

runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-16-dbc80e182b2b>", line 1, in <module>
    runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py", line 12, in <module>
    shutil.rmtree(full_path)

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 494, in rmtree
    return _rmtree_unsafe(path, onerror)

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 376, in _rmtree_unsafe
    onerror(os.listdir, path, sys.exc_info())

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 374, in _rmtree_unsafe
    names = os.listdir(path)

NotADirectoryError: [WinError 267] The directory name is invalid: 'C:/Users/user/Desktop/AFL codes to test/newfile1.txt'
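This second error is expected: shutil.rmtree deletes a whole directory tree and fails (here with NotADirectoryError) when given a regular file such as newfile1.txt. To delete a single file, use os.remove or its alias os.unlink. If one call should handle both cases, a minimal sketch follows; remove_path is a hypothetical helper, not part of the standard library:

```python
import os
import shutil


def remove_path(p):
    """Delete p whether it is a regular file or a directory tree."""
    if os.path.isdir(p) and not os.path.islink(p):
        # shutil.rmtree only accepts directories
        shutil.rmtree(p)
    else:
        # os.remove (same as os.unlink) deletes a single file or symlink
        os.remove(p)
```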

1 Answer

Answer 0 (score: 2)

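You can use forward slashes instead of backslashes, since Windows accepts both:

path = r'C:\Users\user\Desktop\AFL codes to test'

can equally be written as

path = 'C:/Users/user/Desktop/AFL codes to test'

The raw string with backslashes is already valid, so this change is optional. The actual cause of the 'C' error is the loop for file in path:, which iterates over the characters of the path string rather than over the files in the directory.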
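To see that both spellings name the same location: in a raw string r'...' the backslashes are kept literally, and pathlib.PureWindowsPath (which parses Windows paths on any operating system) treats / and \ as equivalent separators. A quick check:

```python
from pathlib import PureWindowsPath

raw = r'C:\Users\user\Desktop\AFL codes to test'
forward = 'C:/Users/user/Desktop/AFL codes to test'

# A raw string keeps the backslashes as-is; it equals the escaped form
print(raw == 'C:\\Users\\user\\Desktop\\AFL codes to test')  # True

# PureWindowsPath accepts both separators, so both parse to the same path
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
print(str(PureWindowsPath(forward)))  # C:\Users\user\Desktop\AFL codes to test
```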

Edit: here is the complete code, which should help:

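import os

path = 'C:/Users/user/Desktop/AFL codes to test'
words = ['buy', 'sell']

for each_file in os.listdir(path):
    full_path = os.path.join(path, each_file)
    # Skip subdirectories; only regular files are checked
    if not os.path.isfile(full_path):
        continue
    # Read the whole file; the with-block closes it before it is deleted
    with open(full_path, 'r', encoding='utf-8') as f:
        each_file_content = f.read()
    # Delete the file if none of the words appear in its content
    if not any(word in each_file_content for word in words):
        os.unlink(full_path)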
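Since this deletes files permanently, it can be worth doing a dry run on the ~2200 files first. A minimal sketch of the same logic using pathlib, assuming UTF-8 text files; delete_files_without_words and its dry_run parameter are hypothetical names, and the function only reports what it would delete unless dry_run=False:

```python
from pathlib import Path


def delete_files_without_words(directory, words, dry_run=True):
    """Return the files in directory whose content contains none of words.

    The files are actually deleted only when dry_run is False.
    """
    doomed = []
    for file_path in sorted(Path(directory).iterdir()):
        # Only look at regular files; subdirectories are skipped
        if not file_path.is_file():
            continue
        # read_text opens, reads and closes the file in one call
        content = file_path.read_text(encoding="utf-8")
        if not any(word in content for word in words):
            doomed.append(file_path)
            if not dry_run:
                file_path.unlink()
    return doomed
```

Call it once with the default dry_run=True, check the returned list, then call it again with dry_run=False. Note that word in content is a substring test, so 'buy' also matches 'buyer'.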