Airflow DAG keeps retrying without showing any errors

Asked: 2018-11-05 21:15:02

Tags: google-cloud-platform airflow google-cloud-composer

I am using Google Cloud Composer. I have a DAG that reads a .csv.gz file with the pandas.read_csv() function. The DAG keeps retrying without showing any errors. Here is the Airflow log:

 *** Reading remote log from gs://us-central1-data-airflo-dxxxxx-bucket/logs/youtubetv_gcpbucket_to_bq_daily_v2_csv/file_transfer_gcp_to_bq/2018-11-04T20:00:00/1.log.
[2018-11-05 21:03:58,123] {cli.py:374} INFO - Running on host airflow-worker-77846bb966-vgrbz
[2018-11-05 21:03:58,239] {models.py:1196} INFO - Dependencies all met for <TaskInstance: youtubetv_gcpbucket_to_bq_daily_v2_csv.file_transfer_gcp_to_bq 2018-11-04 20:00:00 [queued]>
[2018-11-05 21:03:58,297] {models.py:1196} INFO - Dependencies all met for <TaskInstance: youtubetv_gcpbucket_to_bq_daily_v2_csv.file_transfer_gcp_to_bq 2018-11-04 20:00:00 [queued]>
[2018-11-05 21:03:58,298] {models.py:1406} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 
--------------------------------------------------------------------------------

[2018-11-05 21:03:58,337] {models.py:1427} INFO - Executing <Task(BranchPythonOperator): file_transfer_gcp_to_bq> on 2018-11-04 20:00:00
[2018-11-05 21:03:58,338] {base_task_runner.py:115} INFO - Running: ['bash', '-c', u'airflow run youtubetv_gcpbucket_to_bq_daily_v2_csv file_transfer_gcp_to_bq 2018-11-04T20:00:00 --job_id 15096 --raw -sd DAGS_FOLDER/dags/testdags/youtubetv_gcp_to_bq_v2.py']
Python code in the DAG:

from datetime import datetime, timedelta
from airflow import DAG
from airflow import models
import os
import io, logging, sys
import pandas as pd
from io import BytesIO, StringIO

from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.bash_operator import BashOperator

#GCP
from google.cloud import storage
import google.cloud
from google.cloud import bigquery
from google.oauth2 import service_account

from airflow.operators.slack_operator import SlackAPIPostOperator
from airflow.models import Connection
from airflow.utils.db import provide_session
from airflow.utils.trigger_rule import TriggerRule

def readCSV(checked_date, file_name, **kwargs):
    subDir = checked_date.replace('-', '/')
    fileobj = get_byte_fileobj(BQ_PROJECT_NAME, YOUTUBETV_BUCKET, subDir + "/" + file_name)
    # with chunksize set, read_csv returns a lazy TextFileReader
    # (an iterator of DataFrames), not a single DataFrame
    df_chunks = pd.read_csv(fileobj, compression='gzip', memory_map=True, chunksize=1000000)
    print("done readCSV")
    return df_chunks
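
Note that returning the TextFileReader from the callable means the data is never actually consumed inside the task, and Airflow will typically try to push the return value to XCom. A minimal sketch of consuming the chunks inside the callable instead, which keeps peak memory bounded (process_chunk is a hypothetical placeholder for the per-chunk work):

def readCSV(checked_date, file_name, **kwargs):
    subDir = checked_date.replace('-', '/')
    fileobj = get_byte_fileobj(BQ_PROJECT_NAME, YOUTUBETV_BUCKET, subDir + "/" + file_name)
    rows = 0
    for chunk in pd.read_csv(fileobj, compression='gzip', chunksize=1000000):
        # each chunk is an ordinary DataFrame of up to 1,000,000 rows
        process_chunk(chunk)  # hypothetical per-chunk handler
        rows += len(chunk)
    print("processed %d rows" % rows)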

The task definition in the DAG:

file_transfer_gcp_to_bq = BranchPythonOperator(
    task_id='file_transfer_gcp_to_bq',
    provide_context=True,
    python_callable=readCSV,
    op_kwargs={'checked_date': '2018-11-03', 'file_name': 'daily_events_xxxxx_partner_report.csv.gz'}
)
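
Two details of BranchPythonOperator are worth noting here. Its callable is expected to return the task_id of the downstream task to follow, and the return value is typically pushed to XCom, so returning a TextFileReader (or a whole DataFrame) both breaks the branching contract and forces Airflow to serialize a large object. A minimal sketch of the usual branching pattern, with hypothetical task ids process_file and skip_file and a hypothetical file_exists check:

def choose_branch(checked_date, file_name, **kwargs):
    # BranchPythonOperator runs the downstream task whose task_id is
    # returned and skips the other direct downstream tasks
    if file_exists(checked_date, file_name):  # hypothetical existence check
        return 'process_file'
    return 'skip_file'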

The DAG ran successfully on my local Airflow installation, using this simpler readCSV:

def readCSV(checked_date, file_name, **kwargs):
    subDir = checked_date.replace('-', '/')
    fileobj = get_byte_fileobj(BQ_PROJECT_NAME, YOUTUBETV_BUCKET, subDir + "/" + file_name)
    df = pd.read_csv(fileobj, compression='gzip', memory_map=True)
    return df

I have tested get_byte_fileobj and it works as a standalone function.
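
Its implementation is not shown in the question; for context, a minimal sketch of what such a helper might look like with google-cloud-storage (an assumption, not the asker's actual code):

from io import BytesIO
from google.cloud import storage

def get_byte_fileobj(project, bucket_name, blob_path):
    # download the blob into an in-memory buffer and rewind it so
    # pandas can read it like a local file
    client = storage.Client(project=project)
    blob = client.bucket(bucket_name).blob(blob_path)
    fileobj = BytesIO()
    blob.download_to_file(fileobj)
    fileobj.seek(0)
    return fileobj

A helper like this holds the entire compressed file in worker memory, which is relevant to the answers below.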

2 Answers:

Answer 0 (score: 1):

Based on this discussion in the airflow google composer group, this is a known issue. One possible cause is overloading the Composer cluster's resources (in my case, memory).

Answer 1 (score: 0):

I recently had a similar issue.

In my opinion, it happened because the Kubernetes workers were overloaded.

You can also watch the workers' resource usage in the Kubernetes dashboard to check whether your case is a cluster overload problem as well.

If it is, you can try setting a lower value for the Airflow config option celeryd_concurrency to reduce the parallelism on the workers, and see whether the cluster load goes down.
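
For reference, celeryd_concurrency lives in the [celery] section of airflow.cfg in Airflow 1.x. On Composer it can be set through the environment's Airflow configuration overrides, for example (the environment name, location, and the value 8 are placeholders):

gcloud composer environments update ENVIRONMENT_NAME \
    --location us-central1 \
    --update-airflow-configs=celery-celeryd_concurrency=8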
