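I am trying to read 100 training files and vectorize them using sklearn. The contents of these files are words that represent system calls. Once they are vectorized, I want to print the vectors. My first attempt was as follows: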
import re
import os
import sys
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import numpy as np
import numpy.linalg as LA
trainingdataDir = 'C:\data\Training data'
def readfile():
    for file in os.listdir(trainingdataDir):
        trainingfiles = os.path.join(trainingdataDir, file)
        if os.path.isfile(trainingfiles):
            data = open(trainingfiles, "rb").read()
    return data
train_set = [readfile()]
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
However, this only returns the vector for the last file. I concluded that the print call should go inside the for loop. So my second attempt:
def readfile():
    for file in os.listdir(trainingdataDir):
        trainingfiles = os.path.join(trainingdataDir, file)
        if os.path.isfile(trainingfiles):
            data = open(trainingfiles, "rb").read()
            trainVectorizerArray = vectorizer.fit_transform(data).toarray()
            print 'Fit Vectorizer to train set', trainVectorizerArray
However, this does not return anything. Can you help me solve this? Why am I unable to see the vectors being printed?
Answer 0 (score: 0)
The problem was that the dataset list used for vectorization was empty. I managed to vectorize a set of 100 files. I first open the files, then read each one, and finally append its contents to a list. The 'tfidf_vectorizer' then uses that dataset list.
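The approach described in the answer above (read every file, append its contents to a list, then vectorize the whole list at once) can be sketched roughly as follows. This is only a sketch under assumptions, not the answerer's original code: it assumes 'tfidf_vectorizer' refers to sklearn's TfidfVectorizer, it is written for Python 3, and the helper names read_documents and vectorize are made up for illustration. It also sidesteps two likely issues in the question's attempts: the first attempt keeps overwriting data, so only one file reaches the vectorizer, and the second passes a single string to fit_transform, which expects an iterable of documents (and, as shown, readfile() is never called).

```python
import os

from sklearn.feature_extraction.text import TfidfVectorizer


def read_documents(directory):
    """Return the text of every regular file in `directory`, one string per file."""
    documents = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            # Assumption: the files are plain text; undecodable bytes are replaced.
            with open(path, encoding="utf-8", errors="replace") as handle:
                documents.append(handle.read())
    return documents


def vectorize(documents):
    """Fit one TF-IDF model over all documents and return (vectorizer, dense array)."""
    tfidf_vectorizer = TfidfVectorizer()
    # fit_transform takes the whole list, so each file becomes one row.
    matrix = tfidf_vectorizer.fit_transform(documents)
    return tfidf_vectorizer, matrix.toarray()


if __name__ == "__main__":
    training_dir = r"C:\data\Training data"  # path taken from the question
    train_set = read_documents(training_dir)
    vectorizer, vectors = vectorize(train_set)
    print("Fit vectorizer to train set:", vectors.shape)
    print(vectors)
```

With 100 files in the directory, the resulting array should have 100 rows, one per file, and one column per distinct token found across all files.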