Initialization script

Asked: 2017-12-05 16:53:38

Tags: bash google-cloud-platform google-cloud-dataproc

The initialization script below is meant to install Anaconda, a few additional packages, and the genome analysis toolkit Hail on a Dataproc cluster. However, when launching the cluster I get the error "Dataproc Agent Startup Failed". Oddly, copy/pasting this code into a terminal as root runs the entire script just fine.

#!/bin/bash

ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)

if [[ "${ROLE}" == 'Master' ]]; then

    # Download and install Anaconda in /home/anaconda2/
    wget -P /home/anaconda2/ https://repo.continuum.io/archive/Anaconda2-5.0.1-Linux-x86_64.sh
    bash /home/anaconda2/Anaconda2-5.0.1-Linux-x86_64.sh -b -f -p /home/anaconda2
    chmod -R 0777 /home/anaconda2/

    # Additional packages to install.
    /home/anaconda2/bin/pip install lxml
    /home/anaconda2/bin/pip install jupyter-spark
    /home/anaconda2/bin/pip install jgscm

    # Download a Hail distribution built for Spark 2.1.0 (pinned to a specific build).
    wget -P /home https://storage.googleapis.com/hail-common/distributions/0.1/Hail-0.1-5306854d2213-Spark-2.1.0.zip

    # Unzip hail and create resources.
    unzip /home/Hail-0.1-5306854d2213-Spark-2.1.0.zip -d /home
    zip -r /home/hail/python/pyhail.zip /home/hail/python/hail

    # Create Jupyter kernel spec file.
    mkdir -p /home/anaconda2/share/jupyter/kernels/hail
    cat > /home/anaconda2/share/jupyter/kernels/hail/kernel.json <<'EOF'
{
  "display_name": "Hail",
  "language": "python",
  "argv": ["/home/anaconda2/bin/python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/usr/lib/spark/",
    "PYTHONHASHSEED": "0",
    "SPARK_CONF_DIR": "/home/hail/conf/",
    "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.10.3-src.zip:/home/hail/hail/python/pyhail.zip"
  }
}
EOF

    # Copy the Spark configuration files to a custom directory.
    mkdir /home/hail/conf
    cp /etc/spark/conf/spark-defaults.conf /home/hail/conf/spark-defaults.conf
    cp /etc/spark/conf/spark-env.sh /home/hail/conf/spark-env.sh

    # Modify custom Spark conf file to reference Hail jar and zip
    echo 'spark.files=/home/hail/lib/hail-all-spark.jar' >> /home/hail/conf/spark-defaults.conf
    echo 'spark.submit.pyFiles=/home/hail/lib/hail-all-spark.jar' >> /home/hail/conf/spark-defaults.conf
    echo 'spark.driver.extraClassPath=./hail-all-spark.jar' >> /home/hail/conf/spark-defaults.conf
    echo 'spark.executor.extraClassPath=./hail-all-spark.jar' >> /home/hail/conf/spark-defaults.conf

    # Add Spark variable designating Anaconda Python executable as the default on driver, in both custom and default conf files.
    echo 'PYSPARK_DRIVER_PYTHON=/home/anaconda2/bin/python' >> /home/hail/conf/spark-env.sh
    echo 'PYSPARK_DRIVER_PYTHON=/home/anaconda2/bin/python' >> /etc/spark/conf/spark-env.sh

    # Create Jupyter configuration file.
    mkdir -p /home/anaconda2/etc/jupyter/
    cat > /home/anaconda2/etc/jupyter/jupyter_notebook_config.py <<'EOF'
c.Application.log_level = "DEBUG"
c.NotebookApp.ip = "*"
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8123
c.NotebookApp.token = ""
c.NotebookApp.contents_manager_class = "jgscm.GoogleStorageContentManager"
EOF

    # Setup jupyter-spark extension.
    /home/anaconda2/bin/jupyter serverextension enable --user --py jupyter_spark
    /home/anaconda2/bin/jupyter nbextension install --user --py jupyter_spark
    /home/anaconda2/bin/jupyter nbextension enable --user --py jupyter_spark
    /home/anaconda2/bin/jupyter nbextension enable --user --py widgetsnbextension

    # Create the systemd service file for the Jupyter notebook server process.
    cat > /lib/systemd/system/jupyter.service <<'EOF'
[Unit]
Description=Jupyter Notebook
After=hadoop-yarn-resourcemanager.service

[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/home/hail/
ExecStart=/home/anaconda2/bin/python /home/anaconda2/bin/jupyter notebook --allow-root
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target
EOF


    # Add Jupyter service to autorun and start it.
    systemctl daemon-reload
    systemctl enable jupyter
    systemctl start jupyter

    # Sleep for 30 seconds to allow Jupyter notebook server to start.
    sleep 30

fi
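As a sanity check (not part of the original script), the long single-quoted kernel-spec line is easy to break while editing; validating the JSON locally before staging the init action can catch that. A minimal sketch, assuming `python3` is available locally:

```shell
#!/bin/bash
# Sketch: confirm the kernel spec the script writes is valid JSON.
# The JSON payload below is copied verbatim from the init script.
kernel_json='{"display_name": "Hail", "language": "python", "argv": ["/home/anaconda2/bin/python", "-m", "ipykernel", "-f", "{connection_file}"], "env": {"SPARK_HOME": "/usr/lib/spark/", "PYTHONHASHSEED": "0", "SPARK_CONF_DIR": "/home/hail/conf/", "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.10.3-src.zip:/home/hail/hail/python/pyhail.zip"}}'

# json.tool exits non-zero on malformed JSON.
if echo "$kernel_json" | python3 -m json.tool > /dev/null; then
  status=ok
else
  status=bad
fi
echo "kernel spec: $status"
```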

Has anyone run into this error before? Normally Google Cloud tells you where the error logs are, but there is no additional information for this message.

Exceptions from /var/log/google-dataproc-agent.0.log:

# FIRST:
WARNING: exception thrown while executing request
java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
        at com.google.cloud.hadoop.services.repackaged.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
        at com.google.cloud.hadoop.services.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
        at com.google.cloud.hadoop.services.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
        at com.google.cloud.hadoop.services.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.cloud.hadoop.services.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at com.google.cloud.hadoop.services.repackaged.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1727)
        at com.google.cloud.hadoop.services.repackaged.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1618)
        at com.google.cloud.hadoop.services.repackaged.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(GoogleCloudStorageImpl.java:550)
        at com.google.cloud.hadoop.services.repackaged.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(GoogleCloudStorageImpl.java:531)
        at com.google.cloud.hadoop.services.agent.ClusterPropertiesModule.provideClusterProperties(ClusterPropertiesModule.java:52)
        at com.google.cloud.hadoop.services.agent.ClusterPropertiesModule$$FastClassByGuice$$52db3948.invoke(<generated>)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:402)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
        at com.google.cloud.hadoop.services.agent.ClusterPropertiesModule.provideFallbackProperties(ClusterPropertiesModule.java:89)
        at com.google.cloud.hadoop.services.agent.ClusterPropertiesModule$$FastClassByGuice$$52db3948.invoke(<generated>)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:776)
        at com.google.cloud.hadoop.services.agent.protocol.$Proxy17.getAgentApiClientWithEndpoint(Unknown Source)
        at com.google.cloud.hadoop.services.agent.protocol.AgentServiceClientFactory.get(AgentServiceClientFactory.java:60)
        at com.google.cloud.hadoop.services.agent.protocol.AgentServiceClientFactory.get(AgentServiceClientFactory.java:20)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:81)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.BoundProviderFactory.provision(BoundProviderFactory.java:72)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:61)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.BoundProviderFactory.get(BoundProviderFactory.java:62)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
        at com.google.cloud.hadoop.services.repackaged.com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
        at com.google.cloud.hadoop.services.agent.AgentMain.boot(AgentMain.java:58)
        at com.google.cloud.hadoop.services.agent.AgentMain.main(AgentMain.java:46)



# SECOND
java.io.IOException: Cannot run program "/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at com.google.cloud.hadoop.services.agent.util.NativeAsyncProcessWrapperFactory.startAndWrap(NativeAsyncProcessWrapperFactory.java:33)
        at com.google.cloud.hadoop.services.agent.util.NativeAsyncProcessWrapperFactory.startAndWrap(NativeAsyncProcessWrapperFactory.java:27)
        at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runScriptAndPipeOutputToGcs(BootstrapActionRunner.java:252)
        at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runSingleCustomInitializationScriptWithTimeout(BootstrapActionRunner.java:127)
        at com.google.cloud.hadoop.services.agent.BootstrapActionRunner.runCustomInitializationActions(BootstrapActionRunner.java:114)
        at com.google.cloud.hadoop.services.agent.AbstractAgentRunner.runCustomInitializationActionsIfFirstRun(AbstractAgentRunner.java:150)
        at com.google.cloud.hadoop.services.agent.MasterAgentRunner.initialize(MasterAgentRunner.java:165)
        at com.google.cloud.hadoop.services.agent.AbstractAgentRunner.start(AbstractAgentRunner.java:68)
        at com.google.cloud.hadoop.services.agent.MasterAgentRunner.start(MasterAgentRunner.java:36)
        at com.google.cloud.hadoop.services.agent.AgentMain.lambda$boot$0(AgentMain.java:63)
        at com.google.cloud.hadoop.services.agent.AgentStatusReporter.runWith(AgentStatusReporter.java:52)
        at com.google.cloud.hadoop.services.agent.AgentMain.boot(AgentMain.java:59)
        at com.google.cloud.hadoop.services.agent.AgentMain.main(AgentMain.java:46)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 13 more
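The second trace fails before the script body ever executes: `error=2, No such file or directory` is raised on the script path itself. On Linux this is a classic symptom of a CRLF-mangled shebang (the kernel then looks for the nonexistent interpreter `/bin/bash\r`). A minimal local check, using a hypothetical file name to simulate the problem:

```shell
#!/bin/bash
# Sketch: detect Windows (CRLF) line endings in a shell script, which make
# the kernel search for "/bin/bash\r" and fail with error=2.
script=init-script.sh                              # hypothetical local copy
printf '#!/bin/bash\r\necho hi\r\n' > "$script"    # simulate a CRLF-saved file

if grep -q $'\r' "$script"; then
  crlf=yes    # fix with e.g. dos2unix or: sed -i 's/\r$//' "$script"
else
  crlf=no
fi
echo "CRLF detected: $crlf"
rm -f "$script"
```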

0 Answers:

No answers yet.