Airflow on Kubernetes Executor - error: Read-only file system: '/airflow/dags/git/test.csv'

Asked: 2019-10-15 13:42:42

Tags: python kubernetes airflow

I am trying to write the result of a comparison between two tables to a CSV file via Airflow (installed inside Kubernetes), but I get a message saying the CSV file is on a read-only file system. Is there any parameter I can change in the script so that it can write the result?

import csv

import airflow
import numpy as np
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from google.cloud import bigquery


def update_table():
    # Connect to BigQuery and fetch the aggregated source table
    client = bigquery.Client()
    query = """SELECT ltrim(rtrim(col1)) as col1,
                 sum(col2) as col2
                 FROM dataset.table
                 GROUP BY 1
                 ORDER BY col1 desc """
    job = client.query(query)
    df_tst = job.to_dataframe()

    # Fetch the distinct values from the second table
    query_mer = """SELECT distinct col1 FROM dataset.table2 """
    mer_job = client.query(query_mer)
    df_mer = mer_job.to_dataframe()

    # Compare both tables: keep values present in the first but not the second
    nomes = df_tst.col1.tolist()
    nomes_mer = df_mer.col1.tolist()  # was df_merchants, which is undefined
    lista = list(np.setdiff1d(nomes, nomes_mer))

    # Open the file once and append one row per value; the with-block
    # closes the file on exit, so no explicit f.close() is needed
    with open('/airflow/dags/git/test.csv', 'a', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        for x in lista:
            writer.writerow([x])


default_args = {'owner': 'airflow'}  # placeholder; the original default_args is not shown

with DAG('update_cat', default_args=default_args, description='Python DAG',
         schedule_interval='0 0 * * 0', start_date=airflow.utils.dates.days_ago(0),
         catchup=False) as dag:
    python_task = PythonOperator(task_id='python_task', python_callable=update_table)

1 answer:

Answer 0: (score: -1)

Writing to a local file during a task is very much not the Airflow way. In general, a task should read from a source and write back to a location independent of the container/task itself, and multiple executions of the same task should not further change the source or target data. This preserves the principle of idempotency: the same task run with the same inputs should always produce the same output. Also, Airflow does not support passing data directly from one task to another, so you need some intermediate data store.
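
For example, instead of appending to a file under the DAGs folder (which a git-sync setup typically mounts read-only, hence the error), the task could push its result to an external store such as Google Cloud Storage. A minimal sketch, assuming the worker has GCS credentials; the bucket name 'my-bucket' and the object path are placeholders, not anything from the original post:

from google.cloud import storage

def upload_diff(lista):
    # Build the CSV content in memory instead of on the local file system
    content = '\n'.join(str(x) for x in lista)

    # Upload to a GCS bucket; 'my-bucket' and the object path are
    # placeholders for wherever the results should actually live
    client = storage.Client()
    bucket = client.bucket('my-bucket')
    blob = bucket.blob('airflow/results/test.csv')
    blob.upload_from_string(content, content_type='text/csv')

A downstream task can then read the same object back, which keeps the data flowing through an external store rather than through the container's file system, and rerunning the task simply overwrites the object with the same content.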