Solr won't add German characters to a field?

Asked: 2017-11-22 23:16:58

Tags: python-3.x solr solrj solr4 solrcloud

Here is my Python code:

cmd = "curl localhost:8983/solr/" + core + "/update?commit=true -H 'Content-type:application/json' --data-binary " + "\"[{'id':'" + getLastAddedDocumentID(
    'id') + "','title':{'set':'" + title + "'},'author':{'set':'" + authorNames + "'},'abstract':{'set':'" + abstract + "'}}]\""
print cmd
pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
text, err = pp.communicate()
print text

My variable cmd contains the curl command that adds the data to the fields, as shown here:

curl localhost:8983/solr/test/update?commit=true -H 'Content-type:application/json' --data-binary 
"[{'id':'15973569-229c-4ce1-83e2-4f5ba543386f',
    'title':{'set':'Bi\-criteria\ Algorithm\ for\ Scheduling\ Jobs\ on\ Cluster\ Platforms\ \*'},
    'author':{'set':'Pierre\-François\ Dutot\;\ Lionel\ Eyraud\;\ Grégory\ Gr´\;\ Grégory\ Mouní\;\ Denis\ Trystram\;\ '},
    'abstract':{'set':'We\ describe\ in\ this\ paper\ a\ new\ method\ for\ building\ an\ efficient\ algorithm\ for\ scheduling\ jobs\ in\ a\ cluster.\ Jobs\ are\ considered\ as\ parallel\ tasks\ \(PT\)\ which\ can\ be\ scheduled\ on\ any\ number\ of\ processors.\ The\ main\ feature\ is\ to\ consider\ two\ criteria\ that\ are\ optimized\ together.\ These\ criteria\ are\ the\ makespan\ and\ the\ weighted\ minimal\ average\ completion\ time\ \(minsum\).\ They\ are\ chosen\ for\ their\ complementarity,\ to\ be\ able\ to\ represent\ both\ user\-oriented\ objectives\ and\ system\ administrator\ objectives.\ We\ propose\ an\ algorithm\ based\ on\ a\ batch\ policy\ with\ increasing\ batch\ sizes,\ with\ a\ smart\ selection\ of\ jobs\ in\ each\ batch.\ This\ algorithm\ is\ assessed\ by\ intensive\ simulation\ results,\ compared\ to\ a\ new\ lower\ bound\ \(obtained\ by\ a\ relaxation\ of\ ILP\)\ of\ the\ optimal\ schedules\ for\ both\ criteria\ separately.\ It\ is\ currently\ implemented\ in\ an\ actual\ real\-size\ cluster\ platform.'}}]"

The abstract field is defined as follows:

<field name="abstract" type="string" docValues="true" indexed="true" stored="true"/>

When I run this command, I get the following error:

Traceback (most recent call last):
  File "F:/pyCalculation/uploadResearchPaper.py", line 196, in <module>
    addDocument(pathToResearchPapersFolder + department + '/', query, department)
  File "F:/pyCalculation/uploadResearchPaper.py", line 188, in addDocument
    pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 610, in _execute_child
    args = '{} /c "{}"'.format(comspec, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 267: ordinal not in range(128)

The text at position 267 is:

  

'set':'Pierre-François\ Dutot \;

The problem is the character ç.

I'm confused: why doesn't Solr allow this data to be added to the field?

1 Answer:

Answer 0: (score: 0)

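The error comes from Python, not from Solr: Python is complaining that your character cannot be encoded with the ascii codec while it builds the command line for the subprocess call, so the request never reaches Solr. More generally, use a Python Solr client when connecting to Solr from Python, rather than calling curl through a subprocess as you would on the command line.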
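As a rough illustration of talking to Solr over HTTP directly from Python instead of shelling out to curl, here is a minimal Python 3 sketch. It uses only the standard library in place of a dedicated client (a client library would wrap the same kind of HTTP call). The function name `build_update_request` is made up for this example, and the URL, core name, and document id are taken from the question; the actual send is left commented out because it needs a running Solr instance.

```python
import json
import urllib.request


def build_update_request(core, doc, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a JSON update request for one document.

    Hypothetical helper for illustration; not part of any Solr client.
    """
    # json.dumps takes care of quoting, so no manual escaping is needed.
    # Encoding the body as UTF-8 keeps characters such as "ç" intact.
    body = json.dumps([doc], ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url="{}/{}/update?commit=true".format(base_url, core),
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_update_request(
    "test",
    {
        "id": "15973569-229c-4ce1-83e2-4f5ba543386f",
        "author": {"set": "Pierre-François Dutot; Lionel Eyraud"},
    },
)

# To actually send it (requires a running Solr instance):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because no shell is involved, there is no command line to encode, and the backslash escaping seen in the curl command is not needed either.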

That should also solve your problem, as long as your data is unicode (since you have tagged this python3, that is the case as long as it is a str).
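The encoding failure can be reproduced without Solr or curl. The following small Python 3 sketch uses part of the author name from the question; the variable names are made up for illustration. It shows that the ascii codec rejects "ç" while UTF-8 encodes it without trouble.

```python
author = "Pierre-François Dutot"

failing_char = None
position = None
try:
    # The ascii codec only covers code points below 128, so this raises.
    author.encode("ascii")
except UnicodeEncodeError as exc:
    position = exc.start
    failing_char = author[position]
    print("ascii cannot encode {!r} at position {}".format(failing_char, position))

# UTF-8 can represent every character in a Python 3 str.
utf8_bytes = author.encode("utf-8")
print(utf8_bytes)
```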

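Also, the set command is meant for modifying existing documents, so if this is the first time you are indexing the document, you don't need the set part of the document structure.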
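To make the difference concrete, here is a small sketch of the two payload shapes, using the id from the question and a shortened title. The variable names are illustrative only; the payloads are printed rather than sent.

```python
import json

doc_id = "15973569-229c-4ce1-83e2-4f5ba543386f"
title = "Bi-criteria Algorithm for Scheduling Jobs on Cluster Platforms"

# First-time indexing: fields carry their values directly.
new_document = {"id": doc_id, "title": title}

# Updating an existing document: each changed field is wrapped in {"set": ...}.
atomic_update = {"id": doc_id, "title": {"set": title}}

print(json.dumps([new_document], ensure_ascii=False))
print(json.dumps([atomic_update], ensure_ascii=False))
```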