LightGBM 2.2.4: build problem with Boost 1.64.0 and GPU support on Power9

Date: 2019-02-19 23:25:35

Tags: lightgbm

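I am trying to build LightGBM version 2.2.4 (git hash 5256cda69300d6b83b18180da2992a1e50a6b392) on an IBM Power9 system ("Witherspoon"; the machine is a Power System AC922, 8335-GTH) running Red Hat Enterprise Linux Server 7.5 (Maipo).

I am using the RHEL-packaged C compiler, gcc 4.8.5, a local build of cmake, 3.13.1, and a local installation of Boost 1.64.0. CUDA 9.2 is installed on the system, and the libOpenCL library and include files are found.

My configure steps are (run from a newly created build directory at the root of the unpacked LightGBM tree):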

    # export BOOST_ROOT=/share/sw/boost/1_64_0/
    # cmake3 -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/lib64/nvidia/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/include/CL/ ..
    # make

The configure step apparently succeeds and produces a working makefile.

The build fails at about 41%, with errors from deep in the guts of Boost:



    [ 41%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
    In file included from /share/sw/boost/1_64_0/include/boost/mpl/aux_/integral_wrapper.hpp:22:0,
                     from /share/sw/boost/1_64_0/include/boost/mpl/int.hpp:20,
                     from /share/sw/boost/1_64_0/include/boost/mpl/lambda_fwd.hpp:23,
                     from /share/sw/boost/1_64_0/include/boost/mpl/aux_/na_spec.hpp:18,
                     from /share/sw/boost/1_64_0/include/boost/mpl/identity.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/iterator/detail/enable_if.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/iterator/transform_iterator.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/iter_find.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/split.hpp:16,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/context.hpp:19,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/buffer.hpp:15,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/core.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:27,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:18: error: pasting ")" and "20" does not give a valid preprocessing token
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
                      ^
    /share/sw/boost/1_64_0/include/boost/preprocessor/cat.hpp:29:34: note: in definition of macro ‘BOOST_PP_CAT_I’
     #    define BOOST_PP_CAT_I(a, b) a ## b
                                      ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:5: note: in expansion of macro ‘BOOST_PP_CAT’
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
         ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:49: note: in expansion of macro ‘AUX778076_VECTOR_HEADER’
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)
                                                     ^
    In file included from /share/sw/boost/1_64_0/include/boost/math/policies/policy.hpp:14:0,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/math_fwd.hpp:28,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/sign.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/inf_nan.hpp:34,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:63,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical.hpp:54,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/try_lexical_convert.hpp:42,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/detail/meta_kernel.hpp:23,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/iterator/buffer_iterator.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/detail/copy_on_device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/copy.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/container/vector.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:28,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:73: fatal error: boost/mpl/__attribute__((altivec(vector__)))/__attribute__((altivec(vector__)))20.hpp: No such file or directory
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)

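From the messages, it looks like some preprocessor string manipulation has gone wrong: it is presumably trying to find a "vector20.hpp" file in the boost/mpl/vector include directory, but the BOOST_PP_CAT operation fails, so the proper file name cannot be constructed? Also, "altivec" is implicated. The Power9 CPU is AltiVec-capable, so maybe an extra header or compiler switch is needed?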
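The suspected mechanism can be reproduced in isolation. The following is a minimal sketch, not LightGBM or Boost code: the `DEMO_*` macros are hypothetical stand-ins modeled on BOOST_PP_CAT and BOOST_PP_STRINGIZE, and the `#define vector ...` line is an assumption that imitates what the error log suggests the AltiVec-enabled compiler does with the word `vector`.

```cpp
#include <cassert>
#include <string>

// Two-level helpers in the style of BOOST_PP_CAT / BOOST_PP_STRINGIZE
// (hypothetical names). The extra level lets the arguments macro-expand
// before ## or # is applied.
#define DEMO_CAT_I(a, b) a ## b
#define DEMO_CAT(a, b) DEMO_CAT_I(a, b)
#define DEMO_STRINGIZE_I(x) #x
#define DEMO_STRINGIZE(x) DEMO_STRINGIZE_I(x)
#define DEMO_LIMIT 20

// Case 1: `vector` is an ordinary identifier here, so pasting it onto 20
// gives the single token `vector20` and the header path comes out intact.
std::string header_without_macro() {
    return DEMO_STRINGIZE(boost/mpl/vector/DEMO_CAT(vector, DEMO_LIMIT).hpp);
}

// Case 2 (assumption): pretend the compiler predefines `vector` as a macro,
// using the expansion that appears in the error log. It is defined only
// after the standard headers and removed again right after use.
#define vector __attribute__((altivec(vector__)))
std::string directory_with_macro() {
    // Every `vector` token in the path is expanded before stringizing.
    return DEMO_STRINGIZE(boost/mpl/vector);
}
#undef vector

// Small self-check tying the two cases to the log output.
void check_demo() {
    assert(header_without_macro() == "boost/mpl/vector/vector20.hpp");
    assert(directory_with_macro() ==
           "boost/mpl/__attribute__((altivec(vector__)))");
}
```

Calling `check_demo()` from a `main` function exercises both cases. Pasting `vector` onto `20` while the macro is active is deliberately left out of the sketch: the expansion ends in `)`, and `)` joined to `20` is not a valid preprocessing token, which matches the first error in the log.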

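I can build successfully (with warnings) on a Debian 9 "stretch" system with x86_64 architecture and CUDA 9.1 (for the libOpenCL bits), using the Debian-packaged Boost 1.62.

I also tried the Power9 build against Boost 1.69 and against Boost 1.62 (the version Debian uses), and got the same error in the same place.

Help?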

1 Answer:

Answer 0: (score: 1)

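This problem has been addressed in an issue on the LightGBM GitHub, which I somehow missed in my initial search.

This build attempt was misguided.

Apparently the compile problem is an AltiVec/Boost interaction. In addition, there is no OpenCL GPU support on the Power architecture, and LightGBM's GPU code is OpenCL under the hood, so the effort was doomed anyway.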