I am trying to write the result of a table comparison to a CSV file via Airflow (installed inside Kubernetes), but I get a message saying the CSV file is read-only. Is there any parameter I can change in the script so that it writes the result?
import csv

import airflow.utils.dates
import numpy as np
from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery

def update_table():
    # Connect to BigQuery
    client = bigquery.Client()
    query = """SELECT ltrim(rtrim(col1)) as col1,
                      sum(col2) as col2
               FROM dataset.table
               GROUP BY 1
               ORDER BY col1 desc"""
    job = client.query(query)
    df_tst = job.to_dataframe()
    # Second query: distinct values from the comparison table
    query_mer = """SELECT distinct col1 FROM dataset.table2"""
    mer_job = client.query(query_mer)
    df_mer = mer_job.to_dataframe()
    # Compare both tables: keep values present in df_tst but not in df_mer
    nomes = df_tst.col1.tolist()
    nomes_mer = df_mer.col1.tolist()  # was df_merchants, which is never defined
    lista = list(np.setdiff1d(nomes, nomes_mer))
    # Open the file once and append each value as its own row;
    # the with-block closes the file, so no explicit f.close() is needed
    with open('/airflow/dags/git/test.csv', 'a', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        for x in lista:
            writer.writerow([x])
with DAG('update_cat', default_args=default_args, description='Python DAG',
         schedule_interval='0 0 * * 0', start_date=airflow.utils.dates.days_ago(0),
         catchup=False) as dag:
    # dag=dag is redundant inside the context manager and has been dropped
    python_task = PythonOperator(task_id='python_task', python_callable=update_table)
Answer 0 (score: -1)
Writing to a local file during a task is very un-Airflow-like (in a Kubernetes deployment the DAGs folder is typically synced from git and mounted read-only, which is why the append fails). In general, a task should read from a source and write back to a location that is independent of the container/task itself, and repeated executions of the same task should not keep mutating the source or target data. This preserves the principle of idempotency: the same task run with the same inputs should always produce the same output. Airflow also does not support passing data directly from one task to another, so you need some intermediate data store, such as a GCS bucket or a BigQuery table.
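As a minimal sketch of that pattern (assuming the google-cloud-storage client library; the bucket name and object path below are placeholders, not anything from your setup), the diff can be built in memory and uploaded to GCS, so nothing is ever written to the worker's filesystem:

import csv
import io

from google.cloud import storage

def write_diff_to_gcs(lista):
    # Build the CSV in memory instead of on the worker's read-only disk
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=';')
    for x in lista:
        writer.writerow([x])
    # 'my-intermediate-bucket' and 'update_cat/diff.csv' are hypothetical names
    client = storage.Client()
    blob = client.bucket('my-intermediate-bucket').blob('update_cat/diff.csv')
    # Overwriting the same object keeps the task idempotent: re-running
    # with the same inputs produces the same output object
    blob.upload_from_string(buf.getvalue(), content_type='text/csv')

Calling write_diff_to_gcs(lista) at the end of update_table would replace the open('/airflow/dags/git/test.csv', ...) block, and any downstream task can read the object back from the bucket instead of relying on the worker's local disk.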