Why can't I insert a CSV document into Solr in a fail-safe way?

Date: 2014-09-28 06:39:35

Tags: csv curl solr

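When a single row in the CSV does not have the correct number of fields, Solr rejects the entire document. Is there a way to tell Solr to skip that row, keep the rows before it, and continue with the rows after the invalid one?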

Example

C:\dev\tools\solr-4.7.2\apache-tomcat-6.0.37\bin>curl "http://localhost:8080/solr-4.7.2/update/csv?commit=true&rowid=id&fieldnames=interfaceSeq_s,extractId_s,country_s,invoiceNumber_s,originalLineId_s,keyValue_s,levelNumber_s,description_s,chargeGroup_s,chargeSubGroup_s,charge_s,startDateTime_s,endDateTime_s,totalValue_s,billedValue_s,discountValue_s,inclusiveValue_s,unitOfMeasure_s,attribute1_s,attribute2_s,attribute3_s,attribute4_s,attribute5_s,attribute6_s,attribute7_s,attribute8_s,totalUnits_s,inclusiveUnits_s,billedUnits_s,attribute11_s&skipLines=0&separator=%09&stream.file=C:\opt\invoices\input\5924usage_data1.dat&stream.contentType=text/csv&header=false&trim=true&rowidOffset=123758&literal.recordtype_s=usagedata&literal.filename_s=5924usage_data1.dat"

Response

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">24</int></lst><lst name="error"><str name="msg">CSVLoader: input=file:/C:/opt/invoices/input/5924usage_data1.dat,
line=2,expected 30 values but got 1
        values={'10000000003',}</str><int name="code">400</int></lst>
</response>

File contents

10000000001     593     FIVE                                639367  5       547674      4   0682791                     Subscription Charges            Communications                   fixe  gsm              2006281745  204623  0.1870          0.1870          0.0000          0.0000          Seconds                         ixed Line -          Mobile                                                                                   Telecom                                                                                       Carges                      31              0               31                                                                                                                                                                                                                                                                                  
10000000002     593     FIVE                                63367   5       547674      4   065050                      Subscription Charges            Communications                   fixe  gsm              2007010929  22952   0.1650          0.1650          0.0000          0.0000          Seconds                         Fixed Line -             Mobile                                                                                  TELECOM                                                                                            Cages                   7               0               7                                                                                                                                                                                                                                                                                   
10000000003

1 answer:

Answer 0 (score: 1)

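I found the answer. As the code below from org.apache.solr.handler.loader.CSVLoaderBase shows, this is not something that can be configured in the default CSV loader: a row with the wrong number of values always raises an input error, which fails the whole request. I had to write my own CSV request handler.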

    if (vals.length != fieldnames.length) {
      input_err("expected "+fieldnames.length+" values but got "+vals.length, vals, line);
    }
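
The answer does not show its custom handler, so the following is only a minimal sketch of the lenient behaviour it describes, written as standalone Java with no Solr dependencies. The class name `LenientRowFilter`, its `Result` type and the `filter` method are hypothetical names invented for this illustration; they are not part of Solr. The sketch also assumes a plain separator split with no quoting or escaping, which is simpler than what Solr's CSV parser supports. Instead of raising an error on a field-count mismatch, it records the line number and moves on to the next row:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: keeps well-formed rows and skips the rest.
public class LenientRowFilter {

    /** Rows that were kept, plus the 1-based numbers of the lines that were skipped. */
    public static final class Result {
        public final List<String[]> accepted = new ArrayList<>();
        public final List<Integer> skippedLines = new ArrayList<>();
    }

    /**
     * Splits each line on the separator and keeps only rows that have exactly
     * expectedFields values. Rows with any other count are recorded and skipped
     * rather than aborting the whole load.
     * Assumption: plain split, no support for quoted or escaped separators.
     */
    public static Result filter(List<String> lines, char separator, int expectedFields) {
        Result result = new Result();
        String regex = Pattern.quote(String.valueOf(separator));
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            // A limit of -1 keeps trailing empty fields, so they still count as values.
            String[] vals = line.split(regex, -1);
            if (vals.length != expectedFields) {
                result.skippedLines.add(lineNo); // skip this row, continue with the next
                continue;
            }
            result.accepted.add(vals);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three tab-separated rows; the second one has too few fields.
        List<String> lines = Arrays.asList("1\ta\tb", "2", "3\tc\td");
        Result r = filter(lines, '\t', 3);
        System.out.println("accepted=" + r.accepted.size() + " skipped=" + r.skippedLines);
    }
}
```

The same check could live inside a custom loader, as the answer suggests, or run as a pre-processing step that writes a cleaned file before the curl upload. Either way, logging the skipped line numbers keeps the rejected rows traceable.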