So I upgraded Vora from 1.3 to 1.4 on a recently upgraded HDP 2.5.6.
All services seem fine except the Catalog. In the log I see many messages like these:
2017-08-16 11:43:34.591183|+1000|ERROR|Was not able to create new dlog via XXXXX:37999, Status was ERROR_OP_TIMED_OUT, Details: |v2catalog_server|Distributed Log|140607339825056|CreateDLog|log_administration.cpp(211)^^
2017-08-16 11:43:34.611044|+1000|ERROR|Operation (CREATE_LOG) timed out, last status was: ERROR_INTERNAL|v2catalog_server|Distributed Log|140607279314688|Retry|callback_base.cpp(222)^^
2017-08-16 11:43:34.611204|+1000|ERROR|Was not able to create new dlog via XXXXX:20439, Status was ERROR_OP_TIMED_OUT, Details: |v2catalog_server|Distributed Log|140607339825056|CreateDLog|log_administration.cpp(211)^^
2017-08-16 11:43:34.611235|+1000|ERROR|Create DLog ended with status ERROR_OP_TIMED_OUT, retrying in 1000ms|v2catalog_server|Distributed Log|140607339825056|CreateDLog|log_administration.cpp(163)^^
2017-08-16 11:43:35.611757|+1000|ERROR|can't create dlog client[ ERROR_OP_TIMED_OUT ]|v2catalog_server|Catalog|140607339825056|Init|dlog_accessor.cpp(174)^^
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
Any ideas what I have left misconfigured?
[UPDATE] The DLog log follows:
[Wed Aug 16 10:31:23 2017] DLOG Server Version: 1.2.330.20859
[Wed Aug 16 10:31:23 2017] Listening on XXXXXX:46026
[Wed Aug 16 10:31:23 2017] Loading data store
2017-08-16 10:31:23.475454|+1000|WARN |Server file descriptor limit too large vs system limit; reducing to 896|v2dlog|Distributed Log|140349419014080|Load|store.cpp(2187)^^
[Wed Aug 16 10:31:23 2017] Server file descriptor limit too large vs system limit; reducing to 896
[Wed Aug 16 10:31:23 2017] Recovering log in store
[Wed Aug 16 10:31:23 2017] Starting server in managed mode
[Wed Aug 16 10:31:23 2017] Initializing management interface
2017-08-16 10:31:39.365780|+1000|WARN |f(1)h(1):Host 1 has timed out, disabling|v2dlog|Distributed Log|140349343360768|newcluster.(*FragmentRef).ProcessRule|dlog.go(607)^^
2017-08-16 10:32:10.333444|+1000|ERROR|Log with ID 1 is not registered on unit.|v2dlog|Distributed Log|140349238322944|Seal|tenant_registry.cpp(63)^^
2017-08-16 10:32:10.333754|+1000|ERROR|f(1)h(1):Sealing local unit failed for log 1: disabling|v2dlog|Distributed Log|140349238322944|newcluster.(*replicaStateRef).disable|dlog.go(991)^^
[Wed Aug 16 11:22:24 2017] Received signal: 15. Shutting down
[Wed Aug 16 11:22:24 2017] Flushing store...
[Wed Aug 16 11:22:24 2017] Store flush complete
[Wed Aug 16 11:30:17 2017] DLOG Server Version: 1.2.330.20859
[Wed Aug 16 11:30:17 2017] Listening on XXXXXX:37999
[Wed Aug 16 11:30:17 2017] Loading data store
2017-08-16 11:30:17.371415|+1000|WARN |Server file descriptor limit too large vs system limit; reducing to 896|v2dlog|Distributed Log|140388824664000|Load|store.cpp(2187)^^
[Wed Aug 16 11:30:17 2017] Server file descriptor limit too large vs system limit; reducing to 896
[Wed Aug 16 11:30:17 2017] Recovering log in store
[Wed Aug 16 11:30:17 2017] Starting server in managed mode
[Wed Aug 16 11:30:17 2017] Initializing management interface
2017-08-16 11:30:19.421458|+1000|WARN |missed heartbeat for log 1, host 2; poking with state 2|v2dlog|Distributed Log|140388740617984|newcluster.(*FragmentRef).ProcessRule|dlog.go(619)^^
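In addition, I have configured Vora DLog to run on all three nodes of the cluster, but I found that it is not running on one of them. The (possibly) relevant part of the Vora Manager log is: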
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : stdout from check: [Thu Aug 17 09:32:36 2017] Checking for store #012[Thu Aug 17 09:32:36 2017] No valid store found
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : stderr from check: 2017-08-17 09:32:36.590974|+1000|INFO |Command Line: /opt/vora/lib/vora-dlog/bin/v2dlog check --trace-level DEBUG --trace-to-stderr /var/local/vora/vora-dlog|v2dlog|Distributed Log|139919669938112|server_main|main.cpp(1323) #0122017-08-17 09:32:36.592784|+1000|INFO |Checking for store|v2dlog|Distributed Log|139919669938112|Run|main.cpp(1146) #0122017-08-17 09:32:36.593074|+1000|ERROR|Exception during recovery: Encountered a generic I/O error|v2dlog|Distributed Log|139919669938112|Load|store.cpp(2201) #0122017-08-17 09:32:36.593157|+1000|FATAL|Error during recovery|v2dlog|Distributed Log|139919669938112|handle_recovery_error|main.cpp(767) #012[Thu Aug 17 09:32:36 2017] Error during recovery #0122017-08-17 09:32:36.593214|+1000|FATAL| Encountered a generic I/O error|v2dlog|Distributed Log|139919669938112|handle_recovery_error|main.cpp(767) #012[Thu Aug 17 09:32:36 2017] Encountered a generic I/O error #0122017-08-17 09:32:36.593277|+1000|FATAL| boost::filesystem::status: Permission den
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : ... ied: "/var/local/vora/vora-dlog"|v2dlog|Distributed Log|139919669938112|handle_recovery_error|main.cpp(767) #012[Thu Aug 17 09:32:36 2017] boost::filesystem::status: Permission denied: "/var/local/vora/vora-dlog" #0122017-08-17 09:32:36.593330|+1000|INFO |No valid store found|v2dlog|Distributed Log|139919669938112|Run|main.cpp(1151)
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : Creating SAP Hana Vora Distributed Log store ...
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : stdout from format: [Thu Aug 17 09:32:36 2017] Formatting store
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : stderr from format: 2017-08-17 09:32:36.615558|+1000|INFO |Command Line: /opt/vora/lib/vora-dlog/bin/v2dlog format --trace-level DEBUG --trace-to-stderr /var/local/vora/vora-dlog|v2dlog|Distributed Log|140176991168448|server_main|main.cpp(1323) #0122017-08-17 09:32:36.617444|+1000|INFO |Formatting store|v2dlog|Distributed Log|140176991168448|Run|main.cpp(1093) #0122017-08-17 09:32:36.617655|+1000|ERROR|boost::filesystem::status: Permission denied: "/var/local/vora/vora-dlog"|v2dlog|Distributed Log|140176991168448|Format|store.cpp(2107) #0122017-08-17 09:32:36.617693|+1000|FATAL|Could not format store.|v2dlog|Distributed Log|140176991168448|Run|main.cpp(1095) #012[Thu Aug 17 09:32:36 2017] Could not format store.
Aug 17 09:32:36 XXXXXX vora.vora-dlog: [c.63f700da] : Error while creating dlog store.
Aug 17 09:32:36 XXXXXX nomad[628]: client: task "vora-dlog-server" for alloc "058fd477-4e80-59ca-7703-e97f2ca1c8c2" failed: Wait returned exit code 1, signal 0, and error <nil>
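The Vora Manager lines above are hard to read because several trace records are packed into one syslog line, with what appear to be newlines escaped as "#012". A minimal Python sketch for splitting them apart and keeping only the ERROR and FATAL records might look like this. The field layout (timestamp, offset, level, message, then five trailing fields ending in function and source file) is inferred from the log excerpts in this post, and the names parse_trace and problems are purely illustrative, not part of Vora.

```python
import re
from typing import List, NamedTuple

# Timestamp at the end of the first pipe-delimited field,
# e.g. "2017-08-17 09:32:36.617655" (any syslog prefix before it is ignored).
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+$")


class TraceRecord(NamedTuple):
    timestamp: str
    level: str
    message: str
    function: str
    source: str


def parse_trace(text: str) -> List[TraceRecord]:
    """Split raw log text into trace records (field layout is an assumption)."""
    records = []
    # Assumption: "#012" is an escaped newline separating packed records.
    for chunk in text.replace("#012", "\n").splitlines():
        fields = chunk.strip().rstrip("^").split("|")
        if len(fields) < 9:
            continue  # plain "[Thu Aug 17 ...]" lines and other non-trace text
        match = _TIMESTAMP.search(fields[0])
        if match is None:
            continue
        records.append(TraceRecord(
            timestamp=match.group(0),
            level=fields[2].strip(),
            # Everything between the level and the last five fields is the message.
            message="|".join(fields[3:-5]).strip(),
            function=fields[-2],
            source=fields[-1].strip(),
        ))
    return records


def problems(text: str) -> List[TraceRecord]:
    """Return only the ERROR and FATAL records."""
    return [r for r in parse_trace(text) if r.level in ("ERROR", "FATAL")]
```

Feeding the "stderr from format" line to problems() should leave just the "Permission denied" error and the "Could not format store." fatal, which makes the failing path easier to spot.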
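[UPDATE2] So I see a lot of lines like this in the Vora Manager log:
Aug 17 14:38:27 XXXXXX vora.vora-dlog: [c.2235f785] : Running ['sudo', '-i', '-u', 'root', 'chown', 'vora:vora', '/var/log/vora/vora-dlog/']
I guess it should have succeeded, because on that node I see that the vora-dlog directory is owned by the vora user: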
-rw-r--r-- 1 vora vora 0 Jun 29 19:04 .keep
drwxrwx--- 2 vora vora 4096 Aug 16 10:31 dbdir
drwxrwx--- 6 root vora 4096 Aug 15 16:24 vora-discovery
drwxrwx--- 2 vora vora 4096 Aug 16 10:31 vora-dlog
drwxr-xr-x 4 root root 4096 Aug 15 16:23 vora-scheduler
The vora-dlog directory is empty.
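Since the failure is "Permission denied" on "/var/local/vora/vora-dlog" even though a vora-dlog directory is owned by vora, one thing worth checking is every parent directory on that path: a stat-style call needs search (x) permission on each ancestor, not only on the final directory. Below is a small Python sketch that lists mode, owner and group for a path and all of its ancestors. The function name ownership_chain is illustrative only, and I am assuming the service runs as the vora user, so the script would have to be run as that user to reproduce what the DLog process sees.

```python
import grp
import os
import pwd
import stat
from pathlib import Path
from typing import List, Tuple


def _user(uid: int) -> str:
    """Resolve a uid to a name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group(gid: int) -> str:
    """Resolve a gid to a name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def ownership_chain(path: str) -> List[Tuple[str, str, str, str]]:
    """Return (path, mode, owner, group) for each ancestor of `path`, root first.

    If a component cannot be stat'ed, its mode column holds the OS error text.
    """
    target = Path(path)
    rows = []
    for p in reversed([target, *target.parents]):
        try:
            st = os.stat(p)
        except OSError as exc:
            rows.append((str(p), exc.strerror or "error", "?", "?"))
            continue
        rows.append((str(p), stat.filemode(st.st_mode),
                     _user(st.st_uid), _group(st.st_gid)))
    return rows


if __name__ == "__main__":
    # Path taken from the error message in the Vora Manager log.
    for row in ownership_chain("/var/local/vora/vora-dlog"):
        print("{:<30} {:<12} {:<10} {}".format(*row))
```

The first row that shows an error instead of a mode, or a directory without x for the vora user or group, would be the place to look.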