Here is my Python code:
import subprocess

cmd = ("curl localhost:8983/solr/" + core + "/update?commit=true "
       "-H 'Content-type:application/json' --data-binary "
       "\"[{'id':'" + getLastAddedDocumentID('id') + "',"
       "'title':{'set':'" + title + "'},"
       "'author':{'set':'" + authorNames + "'},"
       "'abstract':{'set':'" + abstract + "'}}]\"")
print cmd
pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
text, err = pp.communicate()
print text
My variable cmd contains the curl command that adds the data to the fields. It looks like this:
curl localhost:8983/solr/test/update?commit=true -H 'Content-type:application/json' --data-binary
"[{'id':'15973569-229c-4ce1-83e2-4f5ba543386f',
'title':{'set':'Bi\-criteria\ Algorithm\ for\ Scheduling\ Jobs\ on\ Cluster\ Platforms\ \*'},
'author':{'set':'Pierre\-François\ Dutot\;\ Lionel\ Eyraud\;\ Grégory\ Gr´\;\ Grégory\ Mouní\;\ Denis\ Trystram\;\ '},
'abstract':{'set':'We\ describe\ in\ this\ paper\ a\ new\ method\ for\ building\ an\ efficient\ algorithm\ for\ scheduling\ jobs\ in\ a\ cluster.\ Jobs\ are\ considered\ as\ parallel\ tasks\ \(PT\)\ which\ can\ be\ scheduled\ on\ any\ number\ of\ processors.\ The\ main\ feature\ is\ to\ consider\ two\ criteria\ that\ are\ optimized\ together.\ These\ criteria\ are\ the\ makespan\ and\ the\ weighted\ minimal\ average\ completion\ time\ \(minsum\).\ They\ are\ chosen\ for\ their\ complementarity,\ to\ be\ able\ to\ represent\ both\ user\-oriented\ objectives\ and\ system\ administrator\ objectives.\ We\ propose\ an\ algorithm\ based\ on\ a\ batch\ policy\ with\ increasing\ batch\ sizes,\ with\ a\ smart\ selection\ of\ jobs\ in\ each\ batch.\ This\ algorithm\ is\ assessed\ by\ intensive\ simulation\ results,\ compared\ to\ a\ new\ lower\ bound\ \(obtained\ by\ a\ relaxation\ of\ ILP\)\ of\ the\ optimal\ schedules\ for\ both\ criteria\ separately.\ It\ is\ currently\ implemented\ in\ an\ actual\ real\-size\ cluster\ platform.'}}]"
My abstract field is defined as follows:
<field name="abstract" type="string" docValues="true" indexed="true" stored="true"/>
When I run this command, I get the following error:
Traceback (most recent call last):
  File "F:/pyCalculation/uploadResearchPaper.py", line 196, in <module>
    addDocument(pathToResearchPapersFolder + department + '/', query, department)
  File "F:/pyCalculation/uploadResearchPaper.py", line 188, in addDocument
    pp = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 610, in _execute_child
    args = '{} /c "{}"'.format(comspec, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 267: ordinal not in range(128)
The text at position 276 is:
'set':'Pierre-François\ Dutot \;
The problem is the character ç.
I'm confused: why won't Solr let me add this data to the field?
Answer:
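This is Python complaining that your character can't be encoded with the ascii codec while it prepares the subprocess call. The exception is raised inside subprocess.Popen, before curl is ever started, so the request never reaches Solr. But please use a python Solr client when connecting to Solr from Python, rather than invoking curl through a subprocess the way you would on the command line.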
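A minimal Python 3 sketch of what the error means: ç is U+00E7 (the u'\xe7' in the traceback), which lies outside the 0x00 to 0x7F range the ascii codec can represent, whereas UTF-8 encodes it as two bytes. The helper name first_non_ascii is hypothetical and only used for illustration.

```python
# Minimal sketch (Python 3): why "ç" triggers UnicodeEncodeError with ascii.
def first_non_ascii(text):
    """Hypothetical helper: index of the first character ascii can't encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start  # position of the offending character in text
    return None

author = "Pierre-François Dutot"

print(hex(ord("ç")))            # 0xe7, above the ASCII maximum of 0x7f
print(first_non_ascii(author))  # 11
print(author.encode("utf-8"))   # b'Pierre-Fran\xc3\xa7ois Dutot'
```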
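Using a client should also solve your problem, as long as your data is Unicode (since you've tagged the question as Python 3, that is the case as long as it's a str).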
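A Solr client library such as pysolr is the simplest route. If you'd rather not add a dependency, here is a minimal standard-library sketch of the same idea (Python 3): the body is built with json.dumps and POSTed with urllib.request, so no shell and no manual quoting are involved. SOLR_URL, the core name and the helpers build_update_request and send are assumptions based on the question's curl command, not part of any Solr or client API.

```python
import json
import urllib.request

# Assumption: host and port taken from the question's curl command.
SOLR_URL = "http://localhost:8983/solr"

def build_update_request(core, docs, commit=True):
    """Hypothetical helper: build (but don't send) a JSON update request."""
    url = "{}/{}/update".format(SOLR_URL, core)
    if commit:
        url += "?commit=true"
    # json.dumps emits valid JSON (double quotes, proper escaping); with the
    # default ensure_ascii=True, "ç" is written as the escape \u00e7.
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Hypothetical helper: send the request and return Solr's reply as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    req = build_update_request("test", [{
        "id": "15973569-229c-4ce1-83e2-4f5ba543386f",
        "author": {"set": "Pierre-François Dutot; Lionel Eyraud"},
    }])
    print(req.full_url)
    print(req.data)
    # print(send(req))  # uncomment when a Solr instance is running
```

Building the request separately from sending it keeps the payload easy to inspect or test without a running Solr instance.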
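The "set" command is used to modify an existing document (an atomic update), so if this is the first time you're indexing the document, you don't need the "set" part of the document structure.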
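To make the difference concrete, here is a small sketch with two hypothetical helpers: plain_document builds the shape you would send the first time a document is indexed, and atomic_set_update wraps each non-id field in {"set": value} for changing a document that already exists. The ID and field values are illustrative only.

```python
def plain_document(doc_id, **fields):
    """Hypothetical helper: first-time add, field values given directly."""
    doc = {"id": doc_id}
    doc.update(fields)
    return doc

def atomic_set_update(doc_id, **fields):
    """Hypothetical helper: atomic update, each value wrapped in {"set": ...}."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc

new_doc = plain_document("42", title="Bi-criteria Algorithm",
                         author="Pierre-François Dutot")
update = atomic_set_update("42", title="Bi-criteria Algorithm (revised)")

print(new_doc)  # {'id': '42', 'title': 'Bi-criteria Algorithm', 'author': 'Pierre-François Dutot'}
print(update)   # {'id': '42', 'title': {'set': 'Bi-criteria Algorithm (revised)'}}
```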