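Full import and deletedPkQuery work. I traced the database server, and both the deltaQuery and the deletedPkQuery are executed.
I have run these queries manually many times and they do return rows, but the delta-import
does not fetch any rows. The last thing I tried was outputting FILE_ID as id in all queries. It still does not work.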
<dataConfig>
<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT"/>
<dataSource name="bin" type="BinFileDataSource" basePath="D:\OPG_FILESTORE"/>
<document>
<entity name="file" dataSource="db" pk="id" query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > '${dataimporter.last_index_time}'"
deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'"
deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'" >
<field column="id" name="id" />
<field column="CATEGORY_ID" name="categoryId" />
<field column="CATEGORY_NAME" name="category" />
<field column="FILENAME" name="filename" />
<field column="FILE_MIME_TYPE" name="content_type" />
<field column="last_modified" name="last_modified" />
<entity name="tika" processor="TikaEntityProcessor" url="${file.PATH}" parser="org.apache.tika.parser.AutoDetectParser" format="text" dataSource="bin" onError="continue">
<field column="text" name="content" />
<field column="title" name="title"/>
<field column="subject" name="subject"/>
<field column="description" name="description"/>
<field column="comments" name="comments"/>
<field column="author" name="author"/>
<field column="keywords" name="keywords"/>
<field column="url" name="url"/>
<field column="content_type" name="content_type" />
<field column="links" name="links" />
</entity>
</entity>
</document>
</dataConfig>
Trace:
declare @p1 int
set @p1=180150003
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > ''2014-02-06 15:02:40''',16,8193,@p5 output
select @p1, @p5
When I run it manually, it returns 1 row.
Response:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">31</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
<int name="rows">0</int>
<int name="start">0</int>
</lst>
</lst>
<str name="command">delta-import</str>
<str name="mode">debug</str>
<arr name="documents" />
<lst name="verbose-output" />
<str name="status">idle</str>
<str name="importResponse" />
<lst name="statusMessages">
<str name="Total Requests made to DataSource">2</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2014-02-06 15:32:20</str>
<str name="Identifying Delta">2014-02-06 15:32:20</str>
<str name="Deltas Obtained">2014-02-06 15:32:20</str>
<str name="Building documents">2014-02-06 15:32:20</str>
<str name="Total Changed Documents">0</str>
<str name="Total Documents Processed">0</str>
<str name="Time taken">0:0:0.16</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
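Answer 0 (score: 4)
Things that may be worth checking:
1. The timestamp saved in the dataimport.properties file
This has happened to me before.
A successful delta-import updates the ${dataimporter.last_index_time} value stored in conf/dataimport.properties. The next run then queries against the new timestamp, and unless the database has been updated since, it may return zero rows.
2. dataimporter.delta.id and dataimporter.last_index_time
dataimporter.delta.id should be dih.delta.id.
last_index_time stays in the dataimporter namespace. dataimporter.last_index_time works at least in Solr 4.2.0. dih.last_index_time might work too, since it is mentioned in the Solr wiki, but I haven't tested it.
3. The timestamp may need to be converted to the correct datetime data type, depending on the database.
For SQL Server: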
LAST_MODIFIED_DATETIME > convert(datetime,'${dataimporter.last_index_time}')
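To show where that conversion would sit, here is a minimal sketch of the question's deltaQuery with the convert applied. The table and column names (DOCUMENTS, FILE_ID, LAST_MODIFIED) are taken from the question. The style argument 120 (ODBC canonical, yyyy-mm-dd hh:mi:ss) is an addition that is not part of the answer above; it assumes the timestamp is written in that layout.

```xml
<!-- Sketch only: the query and deltaImportQuery attributes and the field mappings are omitted. -->
<!-- Style 120 is an assumption added here so the string is not parsed by session language. -->
<entity name="file" dataSource="db" pk="id"
        deltaQuery="select FILE_ID as id from DOCUMENTS
                    where LAST_MODIFIED > convert(datetime, '${dataimporter.last_index_time}', 120)">
</entity>
```

Passing an explicit style is one way to keep the comparison from depending on the connection's language or dateformat setting.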
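Answer 1 (score: 1)
Some versions have bugs with last_index_time. You did not say which Solr version you are using, but most people are on 4.x these days.
In addition, there are bugs where the old dataimporter property namespace does not work. For 4.x you should use the dih property namespace, which means dih.last_index_time and dih.delta.id instead of dataimporter.* as property names.
Answer 2 (score: 0)
I am running Solr in Tomcat 7 on Windows. Tracing the ODBC connection, I found that the language was set to Norwegian (Norsk = Norwegian) ;)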
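As a sketch of what that renaming could look like for the question's entity (field mappings and the nested Tika entity omitted), assuming a Solr 4.x version in which the dih namespace resolves both properties. The answers in this thread disagree on which versions that holds for, so treat it as something to try rather than a confirmed fix.

```xml
<!-- Same queries as in the question; only the property namespace in deltaQuery changes. -->
<entity name="file" dataSource="db" pk="id"
        query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
        deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > '${dih.last_index_time}'"
        deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'">
</entity>
```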
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language Norsk
set dateformat dmy
set datefirst 1
set transaction isolation level read committed
The JVM is started with these arguments:
-Duser.region=US
-Duser.language=en
-Duser.timezone=Europe/Oslo
Whether the language is set to Norwegian or English makes no difference.
Adding a propertyWriter tag to the config file solved the problem.
<dataConfig>
<propertyWriter dateFormat="yyyy-dd-MM HH:mm:ss" type="SimplePropertiesWriter" directory="D:/tmp" filename="knowledgebase.dih.properties" locale="English (United States)" />
<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" />
<document>
<entity type="a" name="knowledge" dataSource="db" pk="BASE_ID" query="select * from vKNOWLEDGE_BASE"
deltaQuery="select BASE_ID from vKNOWLEDGE_BASE where '${dataimporter.last_index_time}' &lt; TIMESTAMP"
deltaImportQuery="select * from vKNOWLEDGE_BASE where BASE_ID = '${dataimporter.delta.BASE_ID}'"
deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS BASE_ID where PK_NAME = 'BASE_ID'" >
<field column="BASE_ID" name="id" />
<field column="CATEGORY_ID" name="categoryId" />
<field column="CATEGORY_NAME" name="category" />
<field column="DESCRIPTION" name="description" />
<field column="SOLUTION" name="solution" />
<field column="USER_FULL_NAME" name="author" />
<field column="SOFTWARE_VERSION" name="software_version" />
<field column="TIMESTAMP" name="last_modified" />
<entity name="keywords" dataSource="db" pk="KEYWORD_ID" query="select KNOWLEDGE_KEYWORDS.* from KNOWLEDGE_KEYWORDS_TO_BASE left join KNOWLEDGE_KEYWORDS on (KNOWLEDGE_KEYWORDS_TO_BASE.KEYWORD_ID = KNOWLEDGE_KEYWORDS.KEYWORD_ID) where KNOWLEDGE_KEYWORDS_TO_BASE.BASE_ID = '${knowledge.BASE_ID}'">
<field column="KEYWORD_NAME" name="keywords" />
</entity>
</entity>
</document>
</dataConfig>
You can also add a language option to the JdbcDataSource URL.
jdbc:sqlserver://localhost:1433;databaseName=XXX;responseBuffering=adaptive;selectMethod=cursor;language=XXX
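I have not tested this, but I think it would also solve the problem if the language were set to English. In the SQL Server session the language is set to Norwegian, yet the date used in the where clause for comparison against the LAST_MODIFIED column is formatted as yyyy-MM-dd HH:mm:ss, while the default format for Norwegian is yyyy-dd-MM HH:mm:ss.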
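Another way to sidestep the day/month ambiguity might be to write the timestamp in the ISO 8601 form with a T separator, which SQL Server interprets the same way for datetime values regardless of SET LANGUAGE or SET DATEFORMAT. The sketch below rests on two assumptions that were not tested in this thread: that the propertyWriter dateFormat attribute accepts a java.text.SimpleDateFormat pattern, and that the same format is applied when ${dataimporter.last_index_time} is substituted into the query. The directory and filename values are copied from the config above.

```xml
<!-- Assumption: dateFormat is a SimpleDateFormat pattern, so 'T' is a quoted literal. -->
<propertyWriter dateFormat="yyyy-MM-dd'T'HH:mm:ss" type="SimplePropertiesWriter" directory="D:/tmp" filename="knowledgebase.dih.properties" />
```

If this works, the where clause no longer depends on the language of the database session, so the yyyy-dd-MM workaround would not be needed.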
Answer 3 (score: 0)
I had the same problem and found that deltaImportQuery is case-sensitive.
I had my id column as "ID":
deltaImportQuery="select id, state, name, place, city from temp where ID = '${dih.delta.ID}'"
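Put together, a consistent set of attributes for that case might look like the sketch below. The table temp and the ${dih.delta.ID} placeholder come from the answer. The entity name, the pk value, and the deltaQuery (including its last_modified column) are hypothetical, added only to show the same upper-case ID spelling used in every place.

```xml
<!-- Hypothetical entity: the column returned by deltaQuery (ID) matches the case used in ${dih.delta.ID}. -->
<entity name="temp" pk="ID"
        query="select * from temp"
        deltaQuery="select ID from temp where last_modified > '${dih.last_index_time}'"
        deltaImportQuery="select * from temp where ID = '${dih.delta.ID}'">
</entity>
```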
Answer 4 (score: 0)
Solr seems to save the timestamp in dataimport.properties in the UTC time zone, so you need to convert the time zone of your database column to UTC before comparing it with the value stored in dataimport.properties.
e.g.
-- for mysql, following would convert `update_date` to utc before compare in where clause
deltaQuery="select id from book where status = 0 and CONVERT_TZ(`update_date`, @@session.time_zone, '+00:00') > '${dih.last_index_time}';"