Question

Here is my Python code:

    cmd = "curl localhost:8983/solr/" + core + "/update?commit=true -H 'Content-type:application/json' --data-binary " + "\"[{'id':'" + getLastAddedDocumentID(
        'id') + "','title':{'set':'" + title + "'},'author':{'set':'" + authorNames + "'},'abstract':{'set':'" + abstract + "'}}]\""
    print cmd
    pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    text, err = pp.communicate()
    print text

My cmd variable holds the curl command that adds data to the fields. It looks like this:

curl localhost:8983/solr/test/update?commit=true -H 'Content-type:application/json' --data-binary 
"[{'id':'15973569-229c-4ce1-83e2-4f5ba543386f',
    'title':{'set':'Bi\-criteria\ Algorithm\ for\ Scheduling\ Jobs\ on\ Cluster\ Platforms\ \*'},
    'author':{'set':'Pierre\-François\ Dutot\;\ Lionel\ Eyraud\;\ Grégory\ Gr´\;\ Grégory\ Mouní\;\ Denis\ Trystram\;\ '},
    'abstract':{'set':'We\ describe\ in\ this\ paper\ a\ new\ method\ for\ building\ an\ efficient\ algorithm\ for\ scheduling\ jobs\ in\ a\ cluster.\ Jobs\ are\ considered\ as\ parallel\ tasks\ \(PT\)\ which\ can\ be\ scheduled\ on\ any\ number\ of\ processors.\ The\ main\ feature\ is\ to\ consider\ two\ criteria\ that\ are\ optimized\ together.\ These\ criteria\ are\ the\ makespan\ and\ the\ weighted\ minimal\ average\ completion\ time\ \(minsum\).\ They\ are\ chosen\ for\ their\ complementarity,\ to\ be\ able\ to\ represent\ both\ user\-oriented\ objectives\ and\ system\ administrator\ objectives.\ We\ propose\ an\ algorithm\ based\ on\ a\ batch\ policy\ with\ increasing\ batch\ sizes,\ with\ a\ smart\ selection\ of\ jobs\ in\ each\ batch.\ This\ algorithm\ is\ assessed\ by\ intensive\ simulation\ results,\ compared\ to\ a\ new\ lower\ bound\ \(obtained\ by\ a\ relaxation\ of\ ILP\)\ of\ the\ optimal\ schedules\ for\ both\ criteria\ separately.\ It\ is\ currently\ implemented\ in\ an\ actual\ real\-size\ cluster\ platform.'}}]"

The abstract field is defined as follows:

<field name="abstract" type="string" docValues="true" indexed="true" stored="true"/>

When I run this command I get the following error:

Traceback (most recent call last):
  File "F:/pyCalculation/uploadResearchPaper.py", line 196, in <module>
    addDocument(pathToResearchPapersFolder + department + '/', query, department)
  File "F:/pyCalculation/uploadResearchPaper.py", line 188, in addDocument
    pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 610, in _execute_child
    args = '{} /c "{}"'.format(comspec, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 267: ordinal not in range(128)

The text at position 267 is:

'set':'Pierre\-François\ Dutot\;

The problem is the character ç.

I am confused: why doesn't Solr allow this data to be added to the field?

Answer 1

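This error comes from Python, not Solr: while trying to launch the subprocess, Python complains that your character cannot be encoded with the ascii codec, so the request never reaches Solr. That said, please use a Python Solr client when connecting to Solr from Python, rather than invoking curl through a subprocess as you would on the command line.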
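As a rough illustration of talking to Solr over HTTP directly instead of shelling out to curl, here is a minimal Python 3 sketch. It uses only the standard library rather than a dedicated Solr client, and the helper names `build_update_request` and `send_update` are hypothetical. The URL, core name, and field names are taken from the question; everything else is an assumption.

```python
import json
import urllib.request


def build_update_request(core, doc_id, title, authors, abstract,
                         base_url="http://localhost:8983/solr"):
    """Build (but do not send) an atomic-update request for one document."""
    payload = [{
        "id": doc_id,
        "title": {"set": title},
        "author": {"set": authors},
        "abstract": {"set": abstract},
    }]
    # json.dumps takes care of quoting and escaping, so no manual backslashes
    # are needed; encoding to UTF-8 bytes keeps characters such as "ç" intact.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    url = "{}/{}/update?commit=true".format(base_url, core)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send_update(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request needs no running Solr instance.
request = build_update_request(
    "test",
    "15973569-229c-4ce1-83e2-4f5ba543386f",
    "Bi-criteria Algorithm for Scheduling Jobs on Cluster Platforms",
    "Pierre-François Dutot; Lionel Eyraud",
    "We describe in this paper a new method for scheduling jobs.",
)
```

Calling `send_update(request)` would then post the document to a running Solr instance. Because the body is sent as bytes over HTTP, no shell or console code page is involved in encoding the text.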

That should also solve your problem, as long as your data is unicode (and since you tagged this python3, it is, as long as it is a str).
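The difference between the two codecs can be seen in a small Python 3 sketch. The sample string is taken from the question; the variable names are only illustrative.

```python
author = "Pierre-François Dutot"

# Encoding with the ascii codec fails on "ç", which is the same kind of
# error that appears in the traceback from the question.
try:
    author.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding with UTF-8 works and represents "ç" as two bytes.
utf8_bytes = author.encode("utf-8")

print(ascii_error)
print(utf8_bytes)
```

A str in Python 3 is already unicode, so the only decision left is which codec to use when turning it into bytes for the request.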

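Also, the set command is meant for modifying existing documents, so if this is the first time you are indexing the document, you do not need the set part of the document structure.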
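The two document shapes can be compared in a short sketch. The helper names `new_document` and `atomic_update` are hypothetical, and the field names follow the question.

```python
def new_document(doc_id, fields):
    """Plain document, suitable when indexing the document for the first time."""
    doc = {"id": doc_id}
    doc.update(fields)
    return doc


def atomic_update(doc_id, fields):
    """Wrap each field in {"set": ...} to modify an existing document."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc


fields = {"title": "Bi-criteria Algorithm", "author": "Pierre-François Dutot"}
first_time = new_document("doc-1", fields)
later_change = atomic_update("doc-1", fields)
```

Here `first_time` carries the values directly, while `later_change` wraps each one in a `set` modifier, matching the structure used in the question's curl command.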
