Loading a text dataset into the python weka wrapper

Asked: 2015-03-15 17:13:31

Tags: python weka

I installed the weka python wrapper on Windows 7 and tried to run this example code:

import weka.core.jvm as jvm
jvm.start()

data_dir = "E:/Files/Fourth/"

from weka.core.converters import Loader
loader = Loader("weka.core.converters.TextDirectoryLoader")
datasets = [
  data_dir + "File 1",
  data_dir + "File 2",
  data_dir + "File 3",
  data_dir + "File 4",
  data_dir + "File 5"

 ]
data = loader.load_file(datasets)
data.delete_last_attribute()
print(data)

I get the following error:

Traceback (most recent call last):
  File "C:/Python27/weekaa.py", line 16, in <module>
    data = loader.load_file(datasets)
  File "C:\Python27\lib\site-packages\weka\core\converters.py", line 67, in load_file
    self.enforce_type(self.jobject, "weka.core.converters.FileSourcedConverter")
  File "C:\Python27\lib\site-packages\weka\core\classes.py", line 155, in enforce_type
    raise TypeError("Object does not implement or subclass " + intf_or_class + "!")
TypeError: Object does not implement or subclass weka.core.converters.FileSourcedConverter!

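I tried the solution from a previous question, adding weka.jar or python-weka-wrapper to the classpath, but it did not help. Loading .arff files does not produce this error.

Is there a way to load text files?

Note: each "File" entry in the dataset holds a set of text document files (to be used later for clustering).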

1 Answer:

Answer 0: (score: 0)

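TextDirectoryLoader does not work with the currently released version of python-weka-wrapper, because it operates differently from the other loaders: it reads a directory rather than a single file, so it is not a FileSourcedConverter, which is what the TypeError in the question reports. Following a recent update (https://groups.google.com/forum/#!topic/python-weka-wrapper/hgfFMnEIKZg), a TextDirectoryLoader class has been added to the python weka wrapper and can be used as follows: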

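import weka.core.jvm as jvm
from weka.core.converters import TextDirectoryLoader

jvm.start()  # the JVM must be running before any Weka class is used
text_dir = "/the/directory/you/want/to/load"
# -dir: directory to read, -F: also store the file name, -charset: file encoding
loader = TextDirectoryLoader(options=["-dir", text_dir, "-F", "-charset", "UTF-8"])
data = loader.load()
print(unicode(data))  # Python 2, as in the question
jvm.stop()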
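
To make the expected input clearer: Weka's TextDirectoryLoader reads the text files found in the subdirectories of the given directory and uses each subdirectory name as the class label. With -F the file name is stored as well. For the layout in the question, that presumably means pointing -dir at the parent folder so that "File 1" to "File 5" become the labels. The sketch below is a plain-Python illustration of that layout only, not the wrapper's implementation. The function names and the sample file names are made up for the example.

```python
import io
import os
import shutil
import tempfile


def read_text_directory(root, charset="utf-8"):
    """Collect (filename, text, label) rows from root/<label>/<file>.

    Plain-Python illustration only; this is not the wrapper's code.
    """
    rows = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # only subdirectories are treated as classes
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            if os.path.isfile(path):
                with io.open(path, encoding=charset) as handle:
                    rows.append((name, handle.read(), label))
    return rows


def build_sample_tree():
    """Create a throwaway tree with two class folders (hypothetical names)."""
    root = tempfile.mkdtemp()
    samples = {
        "File 1": {"a.txt": u"first document"},
        "File 2": {"b.txt": u"second document", "c.txt": u"third document"},
    }
    for label, files in samples.items():
        os.mkdir(os.path.join(root, label))
        for name, text in files.items():
            target = os.path.join(root, label, name)
            with io.open(target, "w", encoding="utf-8") as handle:
                handle.write(text)
    return root


if __name__ == "__main__":
    sample_root = build_sample_tree()
    try:
        for row in read_text_directory(sample_root):
            print(row)
    finally:
        shutil.rmtree(sample_root)  # clean up the temporary files
```

Each row corresponds to one instance in the loaded dataset: a text value, a class label taken from the folder name and, optionally, the file name.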

Make sure you have the updated python weka wrapper package, which can be downloaded from http://github.com/fracpete/python-weka-wrapper

To install from source: python setup.py install