I am trying to open and process thousands of text files that I downloaded from Wikipedia using SPARQL queries. I use the following code:
import os
import string

list_words = []
for roots, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".txt"):
            with open(file, 'r') as f:
                content = f.read()
                # remove the punctuation
                table = string.maketrans(string.punctuation, ' ' * len(string.punctuation))
                s = content.translate(table)
                # remove the stopwords
                text = ' '.join([word for word in s.split() if word not in stopwords])
                alfa = " ".join(text.split())
                # remove the verbs
                for word, pos in tag(alfa):  # find all the verbs
                    if pos != "VB":
                        lower = word.lower()
                        lower_2 = unicode(lower, 'utf-8', errors='ignore')
                        list_words.append(lower_2)
                # remove numbers
                testo_2 = [item for item in list_words if not item.isdigit()]
print set(list_words)
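The problem is that the script opens some of the text files, but for others it gives me the error: "No such file or directory: blablabla.txt"
Does anyone know why this happens and how I can deal with it?
Thanks!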
Answer 0 (score: 2)
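The names in files are relative: os.walk yields bare file names without the directory they were found in, so open(file) only succeeds for files that happen to sit in the current working directory. You have to join the root and the file name to get the full file name, like this:
absolute_filename = os.path.join(roots, file)
with open(absolute_filename, 'r') as f:
    .... rest of code
(The loop variable should be named root rather than roots.)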
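For reference, here is a minimal, self-contained sketch of the same fix written as two small helpers. It is written for Python 3, whereas the question's code is Python 2, and the names find_text_files and read_all are hypothetical, made up for this illustration. The encoding and errors arguments are an assumption about how the downloaded files should be decoded.

```python
import os


def find_text_files(path):
    """Return the full path of every .txt file under path, sorted.

    Hypothetical helper for illustration; not part of the original code.
    """
    found = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".txt"):
                # name is a bare file name, so join it with the
                # directory that os.walk is currently visiting
                found.append(os.path.join(root, name))
    return sorted(found)


def read_all(path):
    """Map the full path of each .txt file under path to its contents."""
    contents = {}
    for full_path in find_text_files(path):
        # Assumption: the files are UTF-8; undecodable bytes are skipped
        with open(full_path, "r", encoding="utf-8", errors="ignore") as f:
            contents[full_path] = f.read()
    return contents
```

Calling read_all(path) with the same path passed to os.walk in the question should then open files in subdirectories as well, because every name is joined with the directory it came from before open is called.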