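Loading text files into the Python Weka wrapper

Time: 2015-03-16 21:34:44

Tags: python weka

I installed the Weka Python wrapper (python-weka-wrapper) on Windows 7 and tried to run the following example code: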

import weka.core.jvm as jvm
jvm.start()

data_dir = "E:/Files/Fourth/"

from weka.core.converters import Loader
loader = Loader("weka.core.converters.TextDirectoryLoader")
datasets = [
    data_dir + "File 1",
    data_dir + "File 2",
    data_dir + "File 3",
    data_dir + "File 4",
    data_dir + "File 5"

    ]
data = loader.load_file(datasets)
data.delete_last_attribute()
print(data)

I get the following error:

Traceback (most recent call last):
  File "C:/Python27/weekaa.py", line 16, in <module>
    data = loader.load_file(datasets)
  File "C:\Python27\lib\site-packages\weka\core\converters.py", line 67, in load_file
    self.enforce_type(self.jobject, "weka.core.converters.FileSourcedConverter")
  File "C:\Python27\lib\site-packages\weka\core\classes.py", line 155, in enforce_type
    raise TypeError("Object does not implement or subclass " + intf_or_class + "!")
TypeError: Object does not implement or subclass weka.core.converters.FileSourcedConverter!

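I tried the solutions from earlier questions on Stack Overflow, which suggest adding weka.jar or python-weka-wrapper to the classpath, but they did not help. Loading .arff files does not produce this error.

Is there a way to load text files?

Note: each "File" in datasets contains a set of text document files (to be used later for clustering).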

1 Answer:

Answer 0: (score: 0)

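Weka's TextDirectoryLoader class cannot be used with python-weka-wrapper up to and including version 0.2.2. As the traceback shows, Loader.load_file only accepts loaders that implement weka.core.converters.FileSourcedConverter, and TextDirectoryLoader does not. The upcoming 0.2.3 release (or the current code in the github repository) contains a new Python wrapper called TextDirectoryLoader, available from the weka.core.converters module, which lets you use this class right away. This was also answered on the python-weka-wrapper mailing list.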

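import weka.core.jvm as jvm
from weka.core.converters import TextDirectoryLoader

jvm.start()  # the JVM must be running before any Weka class is used
text_dir = "/the/directory/you/want/to/load"
# -dir: directory to load; -F: store the file name in an additional attribute
loader = TextDirectoryLoader(options=["-dir", text_dir, "-F", "-charset", "UTF-8"])
data = loader.load()
print(unicode(data))  # Python 2, as in the question
jvm.stop()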
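
For reference, Weka's TextDirectoryLoader treats every subdirectory of the directory passed via -dir as a class label and turns every text file inside it into one row. The pure-Python sketch below only illustrates that directory layout; it is not the Weka implementation and does not need a JVM. The function name read_text_directory and the demo file names are made up for the illustration, and it assumes the five "File N" directories from the question sit directly under one parent directory.

```python
import os
import tempfile


def read_text_directory(root, charset="utf-8"):
    """Return (text, label) rows: one row per file, labelled by its subdirectory."""
    rows = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # in this sketch, files directly under root have no label and are skipped
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                rows.append((handle.read().decode(charset), label))
    return rows


# Throwaway layout that mirrors the question: root/File 1/*.txt, root/File 2/*.txt
demo_root = tempfile.mkdtemp()
for label, docs in [("File 1", ["alpha", "beta"]), ("File 2", ["gamma"])]:
    os.mkdir(os.path.join(demo_root, label))
    for index, text in enumerate(docs):
        with open(os.path.join(demo_root, label, "doc%d.txt" % index), "wb") as handle:
            handle.write(text.encode("utf-8"))

rows = read_text_directory(demo_root)
for text, label in rows:
    print("%s -> %s" % (label, text))
```

Under that assumption, pointing -dir at the parent directory (data_dir in the question) rather than at each "File N" directory should load all five sets in one call, with the directory name kept as the class attribute.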