Question

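I'm rebuilding my search application, moving from Solr to Whoosh, and I'm working through the quickstart right now. But every time I have to deal with strings, I keep running into problems: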

>>>writer.add_document(iden=fil, content=F2T.file_to_text(fil_path))
ValueError: 'File Name.doc' is not unicode or sequence

And then:

>>>query = QueryParser("content", ix.schema).parse("first")
AssertionError: 'first' is not unicode

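That line is straight from the quickstart tutorial! Does Whoosh require every field to be unicode? Making my application unicode-aware is really hard (and not even worth it for my case). As for "not unicode or sequence", my understanding is that a string is a sequence data type too.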

Answer 1

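Yes, it requires strings to be unicode. In Python 2 a plain "first" literal is a byte string (str), not unicode, which is why the assertion fails. Take this line:

query = QueryParser("content", ix.schema).parse("first")

and change it to: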

query = QueryParser("content", ix.schema).parse(u"first")
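
The add_document error looks like the same problem: the value passed as iden ('File Name.doc') is a byte string rather than unicode. Here is a minimal sketch of one way to handle it, assuming your file names and extracted text are UTF-8 encoded. Note that to_text is a hypothetical helper I made up for illustration, not part of Whoosh, and the encoding is a guess you may need to change:

```python
# Hypothetical helper (not part of Whoosh): make sure a value is a
# Unicode text string before handing it to the index or query parser.
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings if needed.

    The default encoding (UTF-8) is an assumption; use whatever
    encoding your file names and documents are actually stored in.
    """
    if isinstance(value, TEXT_TYPE):
        return value  # already Unicode, nothing to do
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


# Possible usage with the calls from the question (not run here):
# writer.add_document(iden=to_text(fil),
#                     content=to_text(F2T.file_to_text(fil_path)))
# query = QueryParser("content", ix.schema).parse(to_text("first"))
```

Decoding once, at the point where data enters your program, usually means the rest of the application never has to think about it again.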