Question

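I am trying to deploy the sample Hadoop app provided by Google at https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop on Google Cloud Platform.

I followed all the setup instructions step by step. I was able to set up the environment and start the cluster successfully, but I cannot run the MapReduce part. I run this command in the terminal: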

./compute_cluster_for_hadoop.py mapreduce <project ID> <bucket name> [--prefix <prefix>]  \
--input gs://<input directory on Google Cloud Storage>  \
--output gs://<output directory on Google Cloud Storage>  \
--mapper sample/shortest-to-longest-mapper.pl  \
--reducer sample/shortest-to-longest-reducer.pl  \
--mapper-count 5  \
--reducer-count 1

I get the following error:

sudo: unknown user: hadoop
sudo: unable to initialize policy plugin
Traceback (most recent call last):
  File "./compute_cluster_for_hadoop.py", line 230, in <module>
    main()
  File "./compute_cluster_for_hadoop.py", line 226, in main
    ComputeClusterForHadoop().ParseArgumentsAndExecute(sys.argv[1:])
  File "./compute_cluster_for_hadoop.py", line 222, in ParseArgumentsAndExecute
    params.handler(params)
  File "./compute_cluster_for_hadoop.py", line 51, in MapReduce
    gce_cluster.GceCluster(flags).StartMapReduce()
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 545, in StartMapReduce
    input_dir, output_dir)
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 462, in _StartScriptAtMaster
    raise RemoteExecutionError('Remote execution error')
gce_cluster.RemoteExecutionError: Remote execution error

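Since I followed every step exactly as written, I cannot understand why this problem occurs.

Was the 'hadoop' user never actually created by the scripts I ran earlier, or is there a problem with user permissions? Or does the problem lie somewhere else?

Please help me resolve this error. I am stuck here and cannot move forward.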

Answer 1

Normally, the setup process is expected to create the user 'hadoop' automatically; this is done in startup-script.sh at lines 75-76:

# Set up user and group
groupadd --gid 5555 hadoop
useradd --uid 1111 --gid hadoop --shell /bin/bash -m hadoop

Some part of the setup may actually have failed.
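One way to confirm whether that step ran is to check for the account directly on the master instance after connecting to it over SSH. The snippet below is only a minimal sketch: the helper name user_status is made up for illustration and is not part of the sample project, and it assumes a standard `id` utility is available on the instance.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sample project):
# prints "present" if the named user exists on this machine, else "missing".
user_status() {
    if id -u "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
    fi
}

# Check the account that startup-script.sh is expected to create.
echo "user hadoop: $(user_status hadoop)"
```

If this reports "missing", the startup script most likely stopped before reaching the groupadd/useradd lines, or did not run at all, which would be consistent with the "sudo: unknown user: hadoop" message.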

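That said, the sample you are referencing, while still a useful starting point if you are writing your own Python application that interacts directly with the GCE API, has been deprecated as a way to deploy Hadoop on Google Compute Engine. If you really want to use Hadoop, you should use the Google-supported deployment tool bdutil and its associated quickstart. The resulting clusters have some similarities, including the setup of the user hadoop. One key difference, however, is that bdutil also includes and configures the GCS connector for Hadoop, so that your MapReduce can operate directly on data in GCS rather than copying it into HDFS first.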

Getting 'sudo: unknown user: hadoop' and 'sudo: unable to initialize policy plugin' errors on Google Cloud Platform while running a Hadoop cluster
