I'm trying to post a CSV file to Solr for indexing. This is the file format:
product/productId,product/title,product/price,review/userId,review/profileName,review/helpfulness,review/score,review/time,review/summary,review/text
B00002066I,ah,15.99,unknown,unknown,3/4,5.0,939772800,Inspiring,"I hope a lot of people hear this cd. We need more strong and positive vibes like this. Great vocals, fresh tunes, cross-cultural happiness. Her blues is from the gut. The pop sounds are catchy and mature."
B00002066I,ah,15.99,A2KLYVAS0MIBMQ,Stephen McClaning,0/0,5.0,1332288000,Great CD,"My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing ""Who was that singing ?"""
B000058A81,Chrono Cross,unknown,A18C9SNLZWVBIE,A reader,1/1,5.0,1096934400,First album I've bought since Napster,"We've come a long way since the days of Ninetendo synthesized music! I say without exaggeration that the Chrono Cross Original Soundtrack is probably some of the best instrumental music I've ever heard. Yasunori Mitsuda incorporates so many instruments and musical styles to this collection, it's a real credit to his talent. Guitars, violins, cellos and the piano are just a few of the instruments at play here. Although they differ greatly in musical style, I have to draw an analogy between Mitsuda's music here to the songs of the Grateful Dead"
It indexes the first two rows, but then fails with this error:
C:\muj\Downloads\solr-7.1.0\example\exampledocs>java -Dc=newamz -Dtype=application/csv -jar post.jar amazon.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/newamz/update using content-type application/csv...
POSTing file amazon.csv to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/newamz/update
SimplePostTool: WARNING: Response: {
"responseHeader":{
"status":400,
"QTime":297},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=10e1a7ce-f308-471f-980d-202a6454d9ab] Error adding field 'product_price'='unknown' msg=For input string: \"unknown\"",
"code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/newamz/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/newamz/update...
Time spent: 0:00:00.766
Solr is running in schemaless mode. Note: the actual CSV file is very large.
These are the fields I can find in the managed-schema file:
<copyField source="review_userId" dest="review_userId_str" maxChars="256"/>
<copyField source="review_profileName" dest="review_profileName_str" maxChars="256"/>
<copyField source="product_productId" dest="product_productId_str" maxChars="256"/>
<copyField source="review_text" dest="review_text_str" maxChars="256"/>
<copyField source="review_helpfulness" dest="review_helpfulness_str" maxChars="256"/>
<copyField source="review_summary" dest="review_summary_str" maxChars="256"/>
<copyField source="prod
Answer 0 (score: 1)
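You have a field that is evidently numeric, and then you suddenly try to index a string value into it. Solr is complaining that it cannot convert that string to a number (i.e. "unknown" is not a valid number: 'product_price'='unknown').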
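Since the real file is very large, it may help to locate every offending value before posting again. The following is only a rough sketch: `find_non_numeric` is a hypothetical helper, the sample rows are abbreviated from the question, and Python's `float()` is used as an approximation of Solr's numeric parsing (the two do not accept exactly the same strings).

```python
import csv
import io


def find_non_numeric(rows, column):
    """Yield (data_row_number, value) for values in `column` that float() rejects."""
    for row_number, row in enumerate(rows, start=1):
        value = row[column]
        try:
            float(value)
        except ValueError:
            yield row_number, value


# Abbreviated rows modelled on the sample data in the question.
sample = io.StringIO(
    "product/productId,product/title,product/price\n"
    "B00002066I,ah,15.99\n"
    "B000058A81,Chrono Cross,unknown\n"
)

bad = list(find_non_numeric(csv.DictReader(sample), "product/price"))
print(bad)  # → [(2, 'unknown')]
```

For the real file, the `io.StringIO` sample would be replaced by something like `open("amazon.csv", newline="", encoding="utf-8")`; the encoding here is an assumption.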
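Because you are running in schemaless mode, the first value indexed into a field determines that field's type. Here the first rows had a price of 15.99, so product_price was guessed to be numeric. If you want to avoid this, define an explicit schema with the data type you want each field to allow.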
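One way to define the field explicitly is through Solr's Schema API, which accepts JSON commands such as `add-field`. The sketch below only builds the request body; `add_field_command` is a hypothetical helper, and the type name `string` is an assumption about what the collection's configset defines (check the `fieldType` entries in your managed-schema).

```python
import json


def add_field_command(name, field_type, stored=True, indexed=True):
    """Build a Schema API 'add-field' command for a single field."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,  # assumed to exist as a fieldType in the configset
            "stored": stored,
            "indexed": indexed,
        }
    }


# Declare product_price as a string so that 'unknown' is accepted as-is.
command = add_field_command("product_price", "string")
payload = json.dumps(command, indent=2)
print(payload)
```

The resulting JSON would be sent as a POST with `Content-type: application/json` to the collection's schema endpoint (for the setup in the question, presumably http://localhost:8983/solr/newamz/schema). Because schemaless mode has already created `product_price`, the command would likely need to be `replace-field` instead, or the schema defined on a fresh collection, followed by a reindex. Keeping the field numeric is also an option, but then the "unknown" values have to be blanked out in the CSV before posting.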