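I am using Apache Beam for Python (Python 2.7), and whenever I deploy my code to Google App Engine Flexible I get the error ImportError: No module named main. The error appears in the Dataflow console when I call the /server endpoint.
When I run the code locally, the job runs perfectly on Dataflow in my gcloud project, but when I run it from GAE Flex I get the error above.
Here is my code: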
import apache_beam as beam
import logging
logging.basicConfig(level=logging.DEBUG)
from flask import Flask
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions, SetupOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.io import WriteToText
from apache_beam.io import ReadFromText

PROJECT_ID = 'PROJECT_ID'
JOB_NAME = 'test-job-name-l'
BUCKET_URL = 'gs://backup-bucket'

app = Flask(__name__)

@app.route('/')
def start():
    return "Welcome to datamigration"

@app.route('/server')
def start1():
    run()
    return "It works"

class FindWords(beam.DoFn):
    def process(self, element):
        import re as regex
        return regex.findall(r"[A-Za-z\']+", element)

class CountWordsTransform(beam.PTransform):
    def expand(self, p_collection):
        return (p_collection
                | "Split" >> (beam.ParDo(FindWords()).with_input_types(unicode))
                | "PairWithOne" >> beam.Map(lambda word: (word, 1))
                | "GroupBy" >> beam.GroupByKey()
                | "AggregateGroups" >> beam.Map(lambda (word, ones): (word, sum(ones))))

def run():
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(SetupOptions).save_main_session = True
    pipeline_options.view_as(
        SetupOptions).requirements_file = "requirements.txt"
    google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    google_cloud_options.project = PROJECT_ID
    google_cloud_options.job_name = JOB_NAME
    google_cloud_options.staging_location = BUCKET_URL + '/staging'
    google_cloud_options.temp_location = BUCKET_URL + '/temp'
    pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'
    pipeline = beam.Pipeline(options=pipeline_options)
    (pipeline
     | "Load" >> ReadFromText(BUCKET_URL + "/file.txt")
     | "Count Words" >> CountWordsTransform()
     | "Save" >> WriteToText(BUCKET_URL + '/result/test')
     )
    pipeline.run()

if __name__ == '__main__':
    app.run(port=8080, debug=True)
This is the full error I always get:
Error:
Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 156, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 357, in apache_beam.runners.worker.operations.DoOperation.start
    pickler.loads(self.spec.serialized_fn))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 232, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named main
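The last frames of the traceback show the worker unpickling a serialized function and failing inside `__import__(module)`. A likely reading is that the pickled `DoFn` was stored as a reference to the module it was defined in (`main`, the name gunicorn imports the file under), and that module does not exist on the Dataflow worker. The sketch below reproduces only that mechanism with the standard `pickle` module; the module name `demo_web_main` and the empty stand-in class are assumptions made for illustration, and no Beam or Dataflow code is involved.

```python
import pickle
import sys
import types

MODULE_NAME = "demo_web_main"  # hypothetical stand-in for the app's main.py

# "Web app" side: a module that defines the class is importable here.
demo_module = types.ModuleType(MODULE_NAME)
FindWords = type("FindWords", (object,), {"__module__": MODULE_NAME})
demo_module.FindWords = FindWords
sys.modules[MODULE_NAME] = demo_module

# pickle stores the class as the pair (module name, class name), not its code.
payload = pickle.dumps(FindWords())

# "Worker" side: the module that defined the class is not available.
del sys.modules[MODULE_NAME]

try:
    pickle.loads(payload)
    outcome = "loaded"
except ImportError as exc:
    outcome = "ImportError: %s" % exc

print(outcome)
```

When the same file is started directly with `python main.py`, its module name is `__main__` rather than `main`, which may explain why the job behaves differently when launched locally.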
My app.yaml:
runtime: python
env: flex
service: ms-somename
threadsafe: true
entrypoint: gunicorn -b :$PORT main:app
runtime_config:
  python_version: 2
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
And my requirements.txt:
google-cloud-datastore==1.3.0
google-cloud-dataflow==2.5.0
google-apitools==0.5.16
googledatastore==7.0.1
apache-beam==2.5.0
apache-beam[gcp]==2.5.0
Flask==0.12.2
gunicorn==19.9.0
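Answer 0 (score: 1)
I actually ran into this problem a few days ago, although I was trying to run on GAE Standard with Python 3.7. That said, I solved my problem by including gunicorn in my requirements.txt file. I had not included it originally because I missed this line in the documentation: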
Do not include gunicorn in your requirements.txt file unless you are specifying the entrypoint.
https://cloud.google.com/appengine/docs/standard/python3/runtime
Again, this applies to GAE Standard.
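Answer 1 (score: 0)
Is your code in a file named main.py?
In your yaml file, the line
entrypoint: gunicorn -b :$PORT main:app
tells gunicorn to look for the app variable in the main module (more info here). If you don't have a main.py, it will raise an error.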
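As a rough illustration of what a `module:variable` entrypoint such as `main:app` asks for, the sketch below imports a module by name and reads a variable from it. This is not gunicorn's own code; `resolve_entrypoint` and the module name `demo_entry_module` are hypothetical names used only for this example.

```python
import importlib
import sys
import types


def resolve_entrypoint(spec):
    """Hypothetical helper: split 'module:variable' and return the variable."""
    module_name, _, variable_name = spec.partition(":")
    module = importlib.import_module(module_name)  # ImportError if the module is missing
    return getattr(module, variable_name)  # AttributeError if the variable is missing


# Stand-in for a main.py that defines `app`.
demo_module = types.ModuleType("demo_entry_module")
demo_module.app = object()
sys.modules["demo_entry_module"] = demo_module

resolved = resolve_entrypoint("demo_entry_module:app")
print(resolved is demo_module.app)

# With no importable module of that name, the lookup fails at import time.
try:
    resolve_entrypoint("missing_demo_entry_module:app")
    missing_result = "imported"
except ImportError as exc:
    missing_result = "ImportError: %s" % exc
print(missing_result)
```

In the question's setup, `main` would be the file main.py and `app` the Flask object created by `app = Flask(__name__)`.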