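I want to extract infobox data from a Wikipedia dump and index it with Solr.
From the Wikipedia dump, I extracted 5000 XML files to process. Should I extract the infobox data from these files into separate XML files, or keep it all in a single XML file?
How do I map the data to a Solr schema, given that the infobox.xml I extracted contains no tags?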
Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians -->
| name = Russ Conway
| image =
| caption = Russ Conway, pictured on the front of his 1959 [[Extended play|EP]] ''More Party Pops''.
| image_size =
| background = non_vocal_instrumentalist
| birth_name = Trevor Herbert Stanford
| alias =
| birth_date = birth date|1925|09|2|df=y
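
One possible way to give the untagged infobox text some structure is to treat each `| key = value` line as a field and build one document per infobox. The sketch below is only an illustration under assumptions: the function name `parse_infobox` and the `template` field are hypothetical, empty values are skipped, multi-line values are not handled, and it assumes the Solr schema either declares these field names or uses dynamic fields. The resulting JSON array is in the shape that Solr's JSON `/update` handler accepts.

```python
import json


def parse_infobox(text):
    """Parse '| key = value' lines into a dict (hypothetical helper).

    The first line is assumed to hold the template name, optionally
    followed by an HTML comment, as in the sample above.
    """
    lines = text.strip().splitlines()
    # Keep only the part of the header before any HTML comment.
    header = lines[0].split("<!--")[0].strip()
    fields = {"template": header}
    for line in lines[1:]:
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # not a 'key = value' line
        # Split on the first '=' only, so values such as 'df=y' survive.
        key, _, value = line[1:].partition("=")
        key, value = key.strip(), value.strip()
        if value:  # assumption: empty fields are not worth indexing
            fields[key] = value
    return fields


sample = """Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians -->
| name = Russ Conway
| image =
| background = non_vocal_instrumentalist
| birth_name = Trevor Herbert Stanford
| birth_date = birth date|1925|09|2|df=y
"""

doc = parse_infobox(sample)
# A JSON array of documents, one per infobox.
solr_payload = json.dumps([doc], indent=2)
print(solr_payload)
```

With this approach it probably matters little whether the infoboxes live in one file or in 5000, since each infobox becomes its own document; values such as `birth date|1925|09|2|df=y` would still need further cleanup before they could go into a Solr date field.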