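When one line in a CSV file does not have the expected number of fields, Solr rejects the whole document. Is there a way to tell Solr to skip that line, keep the lines before it, and carry on with the next line after the invalid one?
Example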
C:\dev\tools\solr-4.7.2\apache-tomcat-6.0.37\bin>curl "http://localhost:8080/solr-4.7.2/update/csv?commit=true&rowid=id&fieldnames=interfaceSeq_s,extractId_s,country_s,invoiceNumber_s,originalLineId_s,keyValue_s,levelNumber_s,description_s,chargeGroup_s,chargeSubGroup_s,charge_s,startDateTime_s,endDateTime_s,totalValue_s,billedValue_s,discountValue_s,inclusiveValue_s,unitOfMeasure_s,attribute1_s,attribute2_s,attribute3_s,attribute4_s,attribute5_s,attribute6_s,attribute7_s,attribute8_s,totalUnits_s,inclusiveUnits_s,billedUnits_s,attribute11_s&skipLines=0&separator=%09&stream.file=C:\opt\invoices\input\5924usage_data1.dat&stream.contentType=text/csv&header=false&trim=true&rowidOffset=123758&literal.recordtype_s=usagedata&literal.filename_s=5924usage_data1.dat"
Response
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">24</int></lst><lst name="error"><str name="msg">CSVLoader: input=file:/C:/opt/invoices/input/5924usage_data1.dat,
line=2,expected 30 values but got 1
values={'10000000003',}</str><int name="code">400</int></lst>
</response>
File contents
10000000001 593 FIVE 639367 5 547674 4 0682791 Subscription Charges Communications fixe gsm 2006281745 204623 0.1870 0.1870 0.0000 0.0000 Seconds ixed Line - Mobile Telecom Carges 31 0 31
10000000002 593 FIVE 63367 5 547674 4 065050 Subscription Charges Communications fixe gsm 2007010929 22952 0.1650 0.1650 0.0000 0.0000 Seconds Fixed Line - Mobile TELECOM Cages 7 0 7
10000000003
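Answer 0 (score: 1)
I found the answer. As the code below from org.apache.solr.handler.loader.CSVLoaderBase shows, this check is not configurable in the default CSV loader, so a row with the wrong number of values always raises an error. I had to write my own CSV request handler.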
if (vals.length != fieldnames.length) {
input_err("expected "+fieldnames.length+" values but got "+vals.length, vals, line);
}
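For comparison, the same length check can skip a bad row instead of raising an error. The following is only a minimal sketch in plain Java, not Solr code: the class LenientCsvFilter and the method keepValidRows are hypothetical names, and it assumes a simple separator-delimited file with no quoting or escaping. It could be used to pre-filter the file before posting it, or as a starting point for the check inside a custom loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: keeps only rows with the expected field count.
class LenientCsvFilter {

    // Assumes plain separator-delimited rows (no quoted fields or escapes).
    static List<String> keepValidRows(List<String> rows, String separator, int expectedFields) {
        List<String> valid = new ArrayList<>();
        String splitRegex = Pattern.quote(separator);
        for (String row : rows) {
            // A limit of -1 keeps trailing empty fields so they still count as values.
            String[] vals = row.split(splitRegex, -1);
            if (vals.length != expectedFields) {
                // This is where the stock loader calls input_err(...); here the row is skipped.
                continue;
            }
            valid.add(row);
        }
        return valid;
    }
}
```

With the sample file above and expectedFields set to 30, the third line (a single value) would be dropped and the first two passed on, provided they really contain 30 tab-separated values.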