Running Selenium on AWS Lambda

Asked: 2019-10-22 18:36:22

Tags: python linux selenium docker aws-lambda

I know this is a common question. I have gone through many answers and tried everything they suggest, but I still cannot find a solution.

I am trying to run Selenium on AWS Lambda with Python 3.6, and I use Docker to build the deployment package. These are the Docker steps I ran:

sudo docker run -v $(pwd):/outputs --name linked_in -d amazonlinux:latest tail -f /dev/null
sudo docker exec -i -t linked_in /bin/bash /outputs/buildPack_py.sh

This is what my buildPack_py.sh file looks like:


python_install (){

  wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tar.xz
  tar xJf Python-3.6.0.tar.xz
  cd Python-3.6.0

  ./configure
  make -j 5
  make install -j 5
  export PATH=/usr/local/bin/:$PATH
  cd ..
  rm Python-3.6.0.tar.xz
  rm -rf Python-3.6.0
}

dev_install () {
  yum -y update
  yum -y upgrade
  yum install -y \
  wget \
  curl \
  apt-get \
  gcc \
  gcc-c++ \
  findutils \
  zlib-devel \
  zip \
  xz \
  tar \
  make \
  openssl-devel \
  unzip \
  atlas atlas-devel lapack-devel blas-devel
#  curl https://intoli.com/install-google-chrome.sh | bash
#  mv /usr/bin/google-chrome /usr/bin/google-chrome-stable
  wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
  unzip chromedriver_linux64.zip
  curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
  unzip headless-chromium.zip -d bin/
}

install_packages () {
  cd /home/
  rm -rf env
  pip3 install virtualenv
  python3 -m virtualenv env --python=python3
  source env/bin/activate
  pip install datetime
  pip install requests
  pip install math
  pip install lxml
  pip install selenium
  pip install beautifulsoup4
  deactivate
}


gather_pack () {
  # packing
  cd /home/
  source env/bin/activate

  rm -rf lambdapack5
  mkdir lambdapack5
  cd lambdapack5

  cp -R /home/env/lib/python3.6/site-packages/* .
  cp -R /home/env/lib64/python3.6/site-packages/* .
  cp /outputs/linkedinScraper.py /home/lambdapack5/
  cp /chromedriver /home/lambdapack5/chromedriver
  cp /bin/headless-chromium /home/lambdapack5/headless-chromium
  echo "original size $(du -sh /home/lambdapack5 | cut -f1)"

  # cleaning libs
  rm -rf external
  #    find . -type d -name "tests" -exec rm -rf {} +

  # cleaning
  find -name "*.so" ! -name "_imaging.cpython-36m-x86_64-linux-gnu.so" | xargs strip
  #    find -name "*.so.*" | xargs strip
  find . -name test -type d -print0 | xargs -0 rm -rf --
  rm -r pip
  rm -r pip-*
  rm -r wheel
  rm -r wheel-*
  rm easy_install.py
  find . -name \*.pyc -delete
  # find . -name \*.txt -delete
  echo "stripped size $(du -sh /home/lambdapack5 | cut -f1)"

  # compressing
  zip -FS -r9 /outputs/linkedinApi.zip * > /dev/null
  echo "compressed size $(du -sh /outputs/linkedinApi.zip | cut -f1)"
}

main () {
  dev_install
  python_install
  install_packages
  gather_pack
}

main
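
One thing worth checking after a build like this is whether chromedriver and headless-chromium are actually in the zip and still carry their executable bit, since Lambda has to be able to execute both. The following is a minimal sketch, not part of the original script: the helper name `non_executable_members` is hypothetical, and it assumes the archive was written on a Unix system so that the permission bits are stored in the high 16 bits of each entry's `external_attr`.

```python
import stat
import zipfile


def non_executable_members(zip_path, names):
    """Return the entries from `names` that are missing or lack the owner-execute bit."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in names:
            try:
                info = archive.getinfo(name)
            except KeyError:
                # The entry is not in the archive at all.
                problems.append(name)
                continue
            # For archives created on Unix, the file mode is kept in the high 16 bits.
            mode = info.external_attr >> 16
            if not mode & stat.S_IXUSR:
                problems.append(name)
    return problems
```

Calling `non_executable_members("linkedinApi.zip", ["chromedriver", "headless-chromium"])` on the generated package should return an empty list if both binaries were packed with execute permission.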

I am using the following versions:

chromedriver: 2.35, serverless-chrome: 1.0.0-37
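
Note that the build script downloads chromedriver from a `2.37` URL while the version listed here is 2.35, so it may help to confirm which versions actually end up in the package. Below is a small sketch for that. The function names are hypothetical, the sample string in the docstring is only illustrative of the `ChromeDriver X.Y.Z (...)` output format, and it assumes both binaries accept a `--version` flag.

```python
import re
import subprocess


def parse_major_minor(version_output):
    """Extract 'X.Y' from text such as 'ChromeDriver 2.37.544315 (...)'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in %r" % version_output)
    return "%s.%s" % (match.group(1), match.group(2))


def binary_version(path):
    """Run `<path> --version` and return the major.minor part of its output."""
    output = subprocess.check_output([path, "--version"], stderr=subprocess.STDOUT)
    return parse_major_minor(output.decode("utf-8", "replace"))
```

Inside the build container, `binary_version("/chromedriver")` and `binary_version("/bin/headless-chromium")` would report what the script really bundled, which can then be compared against a known-compatible pairing.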

After uploading the zip file to Lambda, I get this error:

  

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
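
The message above only says that Chrome exited, not why. One way to get more detail is to launch the browser binary directly from the Lambda handler and log whatever it prints, for example a missing shared library or a permission error. This is a hedged sketch rather than a known fix: `run_and_capture` is a hypothetical helper, and the 20-second timeout is an arbitrary choice.

```python
import subprocess


def run_and_capture(command, timeout=20):
    """Run `command` and return (exit_code, combined stdout/stderr text).

    exit_code is None when the process could not be started or timed out.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        output = exc.output or b""
        return None, output.decode("utf-8", "replace")
    except OSError as exc:
        # Raised for problems such as "file not found" or "permission denied".
        return None, str(exc)
    return completed.returncode, completed.stdout.decode("utf-8", "replace")
```

In a handler this could be called as `print(run_and_capture(["/var/task/headless-chromium", "--version"]))`, where `/var/task` is the directory Lambda unpacks the deployment package into. The flags from the Selenium snippet could be appended to get closer to the failing launch.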

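From other posts, I gathered that the versions above work well together. I have also seen Xvfb mentioned, but I am not sure it is really needed when using a headless browser.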

Here is part of the Selenium code:

    import os
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument("--no-sandbox")
    options.add_argument('--disable-gpu')
    options.add_argument('--window-size=1280x1696')
    options.add_argument('--user-data-dir=/tmp/')
    options.add_argument('--hide-scrollbars')
    options.add_argument('--enable-logging')
    options.add_argument('--log-level=0')
    options.add_argument('--v=99')
    options.add_argument('--single-process')
    options.add_argument('--data-path=/tmp/')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--homedir=/tmp/')
    options.add_argument('--disk-cache-dir=/tmp/')
    options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
    options.binary_location = "headless-chromium"
    driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"),options=options)
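
The snippet above refers to both binaries by relative path and relies on them being executable as unpacked. A possible variation, sketched below, resolves them against the `LAMBDA_TASK_ROOT` environment variable, copies them to the writable `/tmp` directory, and sets the execute bit before starting the driver. Both helper names and the `/tmp/bin` location are assumptions made for illustration, and whether paths or permissions are the cause of the error here is not established. `build_driver` keeps the `executable_path=` call style from the question, which newer Selenium releases no longer accept.

```python
import os
import shutil
import stat


def prepare_binary(source_path, target_dir="/tmp/bin"):
    """Copy a bundled binary to a writable directory and make it executable."""
    os.makedirs(target_dir, exist_ok=True)
    target_path = os.path.join(target_dir, os.path.basename(source_path))
    if not os.path.exists(target_path):
        shutil.copyfile(source_path, target_path)
    # Add execute permission for user, group, and others.
    mode = os.stat(target_path).st_mode
    os.chmod(target_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target_path


def build_driver(options):
    """Hypothetical helper: start Chrome using absolute paths to both binaries."""
    from selenium import webdriver  # imported lazily so prepare_binary works without it

    # Fall back to the current directory when not running on Lambda.
    task_root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    options.binary_location = prepare_binary(os.path.join(task_root, "headless-chromium"))
    driver_path = prepare_binary(os.path.join(task_root, "chromedriver"))
    return webdriver.Chrome(executable_path=driver_path, options=options)
```

With this in place, the last two lines of the snippet would be replaced by `driver = build_driver(options)`.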

Any help is greatly appreciated!

0 answers:

No answers yet