Question

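I'm using Apache Beam for Python (Python 2.7), and whenever I deploy my code to Google App Engine Flexible I get the error: ImportError: No module named main. I can see this error in the Dataflow console when I call the /server endpoint. When I run the code locally, the pipeline runs perfectly on Dataflow, but when I run it from GAE Flex I get the error above.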

Here is my code:

import apache_beam as beam
import logging
logging.basicConfig(level=logging.DEBUG)

from flask import Flask

from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions, SetupOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions

from apache_beam.io import WriteToText
from apache_beam.io import ReadFromText


PROJECT_ID = 'PROJECT_ID'
JOB_NAME = 'test-job-name-l'
BUCKET_URL = 'gs://backup-bucket'
app = Flask(__name__)

@app.route('/')
def start():
    return "Welcome to datamigration"

@app.route('/server')
def start1():
    run()
    return "It works"

class FindWords(beam.DoFn):
    def process(self, element):
        import re as regex
        return regex.findall(r"[A-Za-z\']+", element)

class CountWordsTransform(beam.PTransform):
    def expand(self, p_collection):
        return (p_collection
                | "Split" >> (beam.ParDo(FindWords()).with_input_types(unicode))
                | "PairWithOne" >> beam.Map(lambda word: (word, 1))
                | "GroupBy" >> beam.GroupByKey()
                | "AggregateGroups" >> beam.Map(lambda (word, ones): (word, sum(ones))))

def run():
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(SetupOptions).save_main_session = True
    pipeline_options.view_as(
        SetupOptions).requirements_file = "requirements.txt"
    google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    google_cloud_options.project = PROJECT_ID
    google_cloud_options.job_name = JOB_NAME
    google_cloud_options.staging_location = BUCKET_URL + '/staging'
    google_cloud_options.temp_location = BUCKET_URL + '/temp'
    pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'
    pipeline = beam.Pipeline(options=pipeline_options)

    (pipeline
     | "Load" >> ReadFromText(BUCKET_URL + "/file.txt")
     | "Count Words" >> CountWordsTransform()
     | "Save" >> WriteToText(BUCKET_URL + '/result/test')
     )

    pipeline.run()

if __name__ == '__main__':
    app.run(port=8080, debug=True)
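The `if __name__ == '__main__':` guard at the end of this file points to one likely difference between the two environments. Assuming the local run used `python main.py`, the file executes under the name `__main__`. Under the app.yaml entrypoint `gunicorn -b :$PORT main:app`, it is instead imported as a module named `main`, so classes such as `FindWords` report `__module__ == 'main'`, and the `app.run(...)` line never executes. The stdlib-only Python 3 sketch below (not Beam code) writes a throwaway, hypothetical `main.py` to a temporary directory and loads it both ways:

```python
import importlib.util
import os
import runpy
import tempfile

# A tiny hypothetical main.py: one class plus a record of the module's __name__.
SOURCE = (
    "class FindWords(object):\n"
    "    pass\n"
    "MODULE_NAME = __name__\n"
)

with tempfile.TemporaryDirectory() as project_dir:
    path = os.path.join(project_dir, "main.py")
    with open(path, "w") as f:
        f.write(SOURCE)

    # Like `python main.py`: the file runs under the name "__main__".
    as_script = runpy.run_path(path, run_name="__main__")

    # Like `gunicorn main:app`: the file is imported as a module named "main".
    spec = importlib.util.spec_from_file_location("main", path)
    as_import = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(as_import)

print(as_script["MODULE_NAME"], as_script["FindWords"].__module__)  # __main__ __main__
print(as_import.MODULE_NAME, as_import.FindWords.__module__)  # main main
```

As far as I understand dill, with save_main_session enabled, classes defined in `__main__` are pickled by value, while classes from an importable module are pickled by reference to that module's name. That would explain why the same file works when run as a script but not when imported as `main`.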

Here is the full error I always get:

Error:
Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 156, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 357, in apache_beam.runners.worker.operations.DoOperation.start
    pickler.loads(self.spec.serialized_fn))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 232, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named main
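The last frames show where this fails: on the worker, pickle's find_class calls `__import__(module)`. Pickle stores a class as a reference (module name plus class name), so whichever process unpickles it must be able to import that module. The following stdlib-only Python 3 sketch shows the mechanism. It uses a hypothetical in-memory module called `gae_main` as a stand-in for `main`; the name differs only so it cannot collide with a real `main.py` on the import path.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the "main" module that gunicorn imports on App Engine.
MODULE_NAME = "gae_main"

# Build the module in memory and define a DoFn-like class inside it,
# so the class's __module__ is MODULE_NAME.
module = types.ModuleType(MODULE_NAME)
exec("class FindWords(object):\n    pass\n", module.__dict__)
sys.modules[MODULE_NAME] = module

# Pickling an instance records the class by reference: (module name, class name).
payload = pickle.dumps(module.FindWords())

# Simulate a worker machine on which that module cannot be imported.
del sys.modules[MODULE_NAME]

error_message = None
try:
    pickle.loads(payload)  # find_class() calls __import__(MODULE_NAME) and fails
except ImportError as exc:
    error_message = str(exc)

print(error_message)  # No module named 'gae_main'
```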

My app.yaml:

runtime: python
env: flex
service: ms-somename
threadsafe: true

entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 2

manual_scaling:
  instances: 1

resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10

And my requirements.txt:

google-cloud-datastore==1.3.0
google-cloud-dataflow==2.5.0
google-apitools==0.5.16
googledatastore==7.0.1
apache-beam==2.5.0
apache-beam[gcp]==2.5.0
Flask==0.12.2
gunicorn==19.9.0

Answer 1

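I actually ran into this problem a few days ago, but I was trying to run on GAE Standard with Python 3.7. That said, I fixed it by including gunicorn in my requirements.txt file. I hadn't included it at first because I missed this line in the documentation: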

Do not include gunicorn in your requirements.txt file unless you are specifying the entrypoint.

https://cloud.google.com/appengine/docs/standard/python3/runtime

Again, this applies to GAE Standard.
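The rule quoted above boils down to this: if app.yaml sets an entrypoint that starts gunicorn, gunicorn itself has to be listed in requirements.txt. Below is a minimal sanity-check sketch of that rule. The helpers listed_packages and entrypoint_server_is_listed are hypothetical and use deliberately simplified parsing (no name normalization, no nested -r files):

```python
import re


def listed_packages(requirements_text):
    """Return the lower-cased distribution names listed in a requirements.txt string."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        # The name is the leading run of letters, digits, '.', '_' and '-',
        # which stops before extras ("[gcp]") and version specifiers ("==2.5.0").
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def entrypoint_server_is_listed(entrypoint, requirements_text):
    """True if the entrypoint's first word (e.g. gunicorn) appears in the requirements."""
    server = entrypoint.split()[0].lower()
    return server in listed_packages(requirements_text)


REQUIREMENTS = """\
apache-beam[gcp]==2.5.0
Flask==0.12.2
gunicorn==19.9.0
"""

ok = entrypoint_server_is_listed("gunicorn -b :$PORT main:app", REQUIREMENTS)
missing = entrypoint_server_is_listed("gunicorn -b :$PORT main:app", "Flask==0.12.2\n")
print(ok, missing)  # True False
```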

Answer 2

Is your code in a main.py file?

In your yaml file,

entrypoint: gunicorn -b :$PORT main:app

tells gunicorn to look for a variable named app in the module main (more info here). If you don't have a main.py, it will raise this error.
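The MODULE:VARIABLE convention in that entrypoint can be sketched with importlib. The load_app helper below is a simplified, hypothetical resolver, not gunicorn's own code. The demo file is named demo_main.py instead of main.py only to avoid clashing with anything already on the import path.

```python
import importlib
import os
import sys
import tempfile


def load_app(spec):
    """Resolve a "module:variable" string (simplified sketch, not gunicorn's code)."""
    module_name, _, variable = spec.partition(":")
    # Import the module by name; this raises ImportError if no such module is importable.
    module = importlib.import_module(module_name)
    # Then look up the named variable inside that module.
    return getattr(module, variable)


with tempfile.TemporaryDirectory() as project_dir:
    # Hypothetical stand-in for main.py, defining the variable the spec points at.
    with open(os.path.join(project_dir, "demo_main.py"), "w") as f:
        f.write("app = 'the Flask app object would live here'\n")

    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        found_app = load_app("demo_main:app")
        try:
            load_app("no_such_module:app")
            missing_error = None
        except ImportError as exc:
            missing_error = str(exc)
    finally:
        sys.path.remove(project_dir)

print(found_app)  # the Flask app object would live here
print(missing_error)  # No module named 'no_such_module'
```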

ImportError: No module named main in GAE Flexible
