I am trying to test Google Cloud Dataflow by running an example from the Data Science on GCP book. The code can be found here: https://github.com/GoogleCloudPlatform/data-science-on-gcp/blob/master/04_streaming/simulate/df06.py
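For reference, my understanding is that the submission step boils down to something like the sketch below (a minimal illustration with placeholder transforms and bucket paths, not the repo's actual code): the pipeline options select the DataflowRunner, a GCP project, GCS staging/temp locations, and a setup.py, which is what produces the sdist/egg_info output further down.

import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions, StandardOptions)

# Placeholder options; df06.py builds its own from the -p/-b/-d flags.
options = PipelineOptions()
gcp_opts = options.view_as(GoogleCloudOptions)
gcp_opts.project = 'gcp-datascience-book'
gcp_opts.job_name = 'example-job'                        # illustrative name
gcp_opts.staging_location = 'gs://ds-book-admix/staging/'
gcp_opts.temp_location = 'gs://ds-book-admix/temp/'
options.view_as(StandardOptions).runner = 'DataflowRunner'
options.view_as(SetupOptions).setup_file = './setup.py'  # triggers the sdist build seen below

p = beam.Pipeline(options=options)
(p
 | 'create' >> beam.Create(['hello', 'world'])           # stand-in for the real transforms
 | 'write' >> beam.io.WriteToText('gs://ds-book-admix/tmp/out'))
result = p.run()                # submits the job to the Dataflow service and returns a handle
result.wait_until_finish()      # blocks until the job completes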
Here is the output from the submission:
(venv) ➜ simulate git:(master) ✗ python df06.py -p gcp-datascience-book -b ds-book-admix -d airports
Correcting timestamps and writing to BigQuery dataset airports
/Users/Yoda/Documents/google_datascience/data-science-on-gcp/venv/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:166: DeprecationWarning: object() takes no parameters
super(GcsIO, cls).__new__(cls, storage_client))
running sdist
running egg_info
writing requirements to flightsdf.egg-info/requires.txt
writing flightsdf.egg-info/PKG-INFO
writing top-level names to flightsdf.egg-info/top_level.txt
writing dependency_links to flightsdf.egg-info/dependency_links.txt
reading manifest file 'flightsdf.egg-info/SOURCES.txt'
writing manifest file 'flightsdf.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
running check
warning: check: missing required meta-data: url
warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
creating flightsdf-0.0.1
creating flightsdf-0.0.1/flightsdf.egg-info
copying files to flightsdf-0.0.1...
copying df06.py -> flightsdf-0.0.1
copying setup.py -> flightsdf-0.0.1
copying flightsdf.egg-info/PKG-INFO -> flightsdf-0.0.1/flightsdf.egg-info
copying flightsdf.egg-info/SOURCES.txt -> flightsdf-0.0.1/flightsdf.egg-info
copying flightsdf.egg-info/dependency_links.txt -> flightsdf-0.0.1/flightsdf.egg-info
copying flightsdf.egg-info/requires.txt -> flightsdf-0.0.1/flightsdf.egg-info
copying flightsdf.egg-info/top_level.txt -> flightsdf-0.0.1/flightsdf.egg-info
Writing flightsdf-0.0.1/setup.cfg
Creating tar archive
removing 'flightsdf-0.0.1' (and everything under it)
Collecting google-cloud-dataflow==2.4.0
Successfully downloaded google-cloud-dataflow
Unfortunately, this has now been running for several hours. The Dataflow console tells me the job does not exist, and its status shows "Not started".
My attempts to cancel the job have been futile:
(venv) ➜ simulate git:(master) ✗ gcloud beta dataflow jobs --project=gcp-datascience-book cancel 2018-06-08_12_31_28-xxxxxxxxxxxxxxxxx
Failed to cancel job [2018-06-08_12_31_28-xxxxxxxxxxxxxxxxx]: (82e778296697bc7f): Workflow modification failed. Causes: (8ad1157dde9a5d43): Operation cancel not allowed for job 2018-06-08_12_31_28-5106434125712000794. Job is not yet ready for canceling. Please retry in a few minutes.
I am not sure what the problem is, since I am running the code from the cloned repo that accompanies the book, unmodified.
Thanks in advance for your help!
Answer 0 (score: 0)
One idea: what about killing your .py from bash in your gcloud console? Press Ctrl+C, and/or pkill -f df06.py if it is running in the background.
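If a job really was created on the Dataflow service, it will still need to be cancelled separately once it leaves the pending state. As a rough sketch (assuming the pipeline is built with DataflowRunner options like those in the question, and that the handle returned by p.run() is kept instead of blocking on wait_until_finish()), the cancel can also be retried from Python:

import time
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# `options` stands for DataflowRunner options like those sketched in the question.
options = PipelineOptions()
p = beam.Pipeline(options=options)
p | beam.Create([0])               # stand-in transform

result = p.run()                   # non-blocking: returns a job handle

for attempt in range(10):
    print('Dataflow job state: %s' % result.state)
    try:
        result.cancel()            # same operation as `gcloud dataflow jobs cancel`
        break
    except Exception as e:
        # Dataflow rejects cancel while the job is still pending
        # ("not yet ready for canceling"), so wait and retry.
        print('Cancel failed (%s); retrying in 60 seconds' % e)
        time.sleep(60)

Otherwise, simply retrying the same gcloud cancel command after a few minutes, as the error message itself suggests, should work once the job leaves the pending state.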