I'm trying to upload a log file in the following format to a staging database:
56.33.95.60 7916545 abdou23 2018-10-14 00:00:00 GET/HTTP/1.0 200 214804 \{rtve.es}\ \{Mozilla/5.0 (*Linux*Android?4.2*AP-106 Build/*) applewebkit* (*khtml*like*gecko*)*Chrome/*Safari/* OPR/15*}\
244.176.163.127 3372323 venuto89 2014-10-04 13:00:00 POST/HTTP/1.0 200 307886 \{aboutads.info}\ \{Mozilla/5.0 (Mobile; *rv:47.0*)*Gecko*Firefox/47.0*}\
68.161.164.71 1872720 owens50 2019-08-05 11:00:00 POST/HTTP/1.0 202 363106 \{xbox.com}\ \{Mozilla/5.0 (*Linux*Android?4.2*Micromax A77 Build/*) applewebkit* (*khtml*like*gecko*) UCBrowser/9.7* U3/* Safari/*}\
5.84.12.253 1045005 parkison48 2019-01-28 15:00:00 POST/HTTP/1.0 200 365454 \{cnbc.com}\ \{Mozilla/5.0 (iPhone*CPU iPhone OS * like Mac OS X*) applewebkit* (*khtml*like*gecko*) Mobile/* NAVER(* 7.6.*)}\
The problem I'm running into is the following error:
value too long for type character varying(100000)
CONTEXT: COPY logs, line 1, column log_string:
This happens because all of the file's lines are being merged into one value.
I tried using
COPY logs FROM STDIN WITH DELIMITER AS E'\n'
I've also tried \r and \t, and changing the format to CSV.
The error output is
COPY delimiter cannot be newline or carriage return
The Python code is:
from airflow.hooks.postgres_hook import PostgresHook

def load_logs():
    conn = PostgresHook(postgres_conn_id=db).get_conn()
    cur = conn.cursor()
    SQL_STATEMENT = """
        COPY logs FROM STDIN WITH DELIMITER AS E'\n'
    """
    with open('logfile.csv', 'r') as f:
        cur.copy_expert(SQL_STATEMENT, f)
    conn.commit()
It's also worth noting that this Python code runs as a DAG in Airflow.
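One workaround I'm considering (a minimal sketch, not tested against my schema; the E'\x01'/E'\x02' bytes and the single-column target `logs (log_string)` are assumptions) is to pick a DELIMITER and QUOTE byte that can never occur in the log lines, so each full line lands in one column:

```python
# Sketch of a workaround (untested assumption): choose DELIMITER and
# QUOTE bytes that never appear in the log data, so COPY keeps each
# whole line as one value for the log_string column. CSV format is
# used because the plain text format would treat the backslashes in
# the log lines (\{...}\) as escape characters.
SQL_STATEMENT = """
    COPY logs (log_string) FROM STDIN
    WITH (FORMAT csv, DELIMITER E'\\x01', QUOTE E'\\x02')
"""

def load_logs(conn):
    # conn: an open psycopg2 connection, e.g. PostgresHook(...).get_conn()
    with open('logfile.csv', 'r') as f:
        cur = conn.cursor()
        cur.copy_expert(SQL_STATEMENT, f)
    conn.commit()
```

Since COPY's row separator is always the newline, no DELIMITER setting is needed to split rows; the delimiter only splits columns, which is why an unused control byte keeps the line intact.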