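This is my first post and I'm a bit of a newbie, so please let me know if there are any problems with etiquette or formatting.
I'm trying to run grep on a file (code below) to check whether a word exists in it. When I look at the file, the word is definitely there. It is surrounded by spaces and is the last word on its line.
For some reason, grep can't find the word and the program prints 0. Why?
Thanks!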
import os
import re
word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'
number = int(os.popen("grep -w -i -l " + word + " " + folder + email + " | wc -l").read())
print number
Answer 0 (score: 0)
You can use grep's exit status to find out whether there was a match:
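import os
from subprocess import STDOUT, call
# word, folder and email are the variables defined in the question.
path = os.path.join(folder, email)
with open(os.devnull, 'wb', 0) as devnull:
    # Discard grep's output; only its exit status matters here.
    rc = call(['grep', '-w', '-l', '-i', '-F', word, path],
              stdout=devnull, stderr=STDOUT)
if rc == 0:
    print('found')
elif rc == 1:
    print('not found')
else:
    # Any other status means grep itself failed (e.g. unreadable file).
    print('error')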
Or, as @stevieb mentioned, you can check in pure Python whether the word is in the given file:
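import re
from contextlib import closing
from mmap import ACCESS_READ, mmap
# Open in binary mode: mmap exposes bytes, so the pattern must be bytes too.
with open(path, 'rb') as f, closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as m:
    # (?i) ignores case; \b anchors the match at word boundaries.
    if re.search(br"(?i)\b" + re.escape(word.encode()) + br"\b", m):
        print('found')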
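Returning to the exit-status approach at the top of this answer: in the question's pipeline, `wc -l` prints 0 both when grep finds nothing and when grep itself fails (for example, on a wrong path), so the number alone cannot tell the two cases apart. The following is only a sketch of one way to separate them, assuming Python 3.5+ (for `subprocess.run`) and a `grep` on the `PATH`. The function name `grep_has_word` is made up for illustration.

```python
import subprocess


def grep_has_word(word, path):
    """Return True if grep finds `word` as a whole word in `path`.

    Hypothetical helper: exit status 0 means a match, 1 means no match,
    and anything else is treated as a grep failure.
    """
    # An argument list avoids the shell, so spaces in `path` are harmless.
    # -q suppresses output, -F treats the word as a fixed string,
    # -e marks the pattern, and -- ends option parsing before the path.
    proc = subprocess.run(
        ["grep", "-w", "-i", "-q", "-F", "-e", word, "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode == 0:
        return True
    if proc.returncode == 1:
        return False
    # Surface grep's own error message instead of silently reporting 0.
    raise RuntimeError("grep failed: " + proc.stderr.strip())
```

Calling `grep_has_word(word, os.path.join(folder, email))` with the question's variables would then either return a boolean or raise with grep's message, which should make a bad path or an unreadable file visible right away.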
Answer 1 (score: -1)
You need to post a snippet of the file so that we can test the grep statement. Also, there's no reason to shell out to grep:
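import re
word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'
path = folder + email  # named `path` to avoid shadowing the Python 2 builtin `file`
# Read the whole file and count every occurrence of the word.
# Note: this matches substrings and is case-sensitive, unlike grep -w -i.
with open(path, 'r') as fh:
    contents = re.findall(word, fh.read())
print(len(contents))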
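The snippet above counts plain substring matches. A variant that behaves more like `grep -w -i` (whole words, ignoring case) might look like the sketch below. It assumes Python 3, and the name `count_whole_word` is made up for illustration. Undecodable bytes are replaced rather than raising, on the assumption that spam emails may not be clean text in the default encoding.

```python
import re


def count_whole_word(path, word):
    """Count case-insensitive whole-word occurrences of `word` in `path`."""
    # re.escape keeps any regex metacharacters in the word literal;
    # \b on both sides restricts matches to word boundaries.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    # errors="replace" is an assumption: it tolerates bytes that do not
    # decode cleanly instead of raising UnicodeDecodeError.
    with open(path, "r", errors="replace") as fh:
        return len(pattern.findall(fh.read()))
```

With the variables from the snippet above, `count_whole_word(folder + email, word)` returns the number of matches, so any value greater than 0 means the word is present.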