pandas - How do I package pandas in a PySpark application submitted with spark-submit?

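I have a PySpark application that depends on external libraries (pandas, requests, etc.). I submit this application with spark-submit: I install all of these external libraries into one directory (pip install -r $list_of_ext_deps -t $target_location), package that directory into a jar file, and pass it as pyFiles.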
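For reference, the packaging step described above can be sketched roughly as follows. This is a minimal illustration, assuming the dependencies were already installed into a directory with the pip command above; the function name build_deps_archive and the paths used are hypothetical, not part of the original setup. It zips the directory so that top-level packages (e.g. requests/) sit at the archive root, which is where Python's zipimport looks for them once the archive is on sys.path.

```python
import os
import zipfile


def build_deps_archive(target_dir, archive_path):
    """Zip every file under target_dir into archive_path (hypothetical helper).

    Member names are stored relative to target_dir, so top-level packages
    end up at the archive root, as zipimport expects. archive_path should
    live outside target_dir so the archive does not try to include itself.
    Returns the sorted list of member names written.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(target_dir):
            for fname in files:
                full = os.path.join(root, fname)
                # Relative path with forward slashes, e.g. "requests/__init__.py"
                arcname = os.path.relpath(full, target_dir).replace(os.sep, "/")
                zf.write(full, arcname)
                names.append(arcname)
    return sorted(names)
```

Usage, assuming pip installed into deps/: call build_deps_archive("deps", "deps.zip"), then submit with spark-submit --py-files deps.zip app.py (the file names here are illustrative).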

Importing all the other libraries works, but when it comes to pandas I get: ImportError: C extension: No module named conversion not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

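If I install pandas on the Spark node with pip, I don't see this exception. So why does Spark have a problem importing pandas specifically when I install the packages with pip, bundle them, and pass them as pyFiles? Or, what is the correct way to package dependencies that rely on C extensions?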
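One likely explanation, not confirmed in the question itself: Python's zipimport only imports .py and .pyc files from a ZIP archive and does not load compiled extension modules (.so/.pyd) from it. pandas modules such as the one behind "conversion" are compiled extensions, while requests is pure Python, which would match the behavior described: pure-Python packages import from the archive, pandas does not, and a regular pip install on the node (plain files on disk) works. The hypothetical helper below is a sketch for listing which members of a dependency archive are compiled extensions and therefore cannot be imported from it.

```python
import zipfile

# Suffixes of compiled extension modules: .so on Linux/macOS, .pyd on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def find_compiled_extensions(archive_path):
    """Return the archive members that are compiled extension modules.

    zipimport does not load these from a ZIP archive, so any module
    listed here cannot be imported from the archive via sys.path.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(EXTENSION_SUFFIXES))
```

Running it on the archive passed as pyFiles would show whether pandas' compiled modules are among the members that cannot be loaded from it.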

Note: I cannot use a Conda env in production.
