I have the following Python code, and I am trying to write its output to a directory derived from the timestamp.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import uuid
data = {'date': ['2018-03-04T14:12:15.653Z', '2018-03-03T14:12:15.653Z', '2018-03-02T14:12:15.653Z', '2018-03-05T14:12:15.653Z'],
'battles': [34, 25, 26, 57],
'citys': ['london', 'newyork', 'boston', 'boston']}
df = pd.DataFrame(data, columns=['date', 'battles', 'citys'])
df['date'] = df['date'].map(lambda t: pd.to_datetime(t, format="%Y-%m-%dT%H:%M:%S.%fZ"))
df.groupby(by=['citys'])
dst_path = "logs/year=" + df['date'].dt.year.astype('str').unique() + "/month=" + df['date'].dt.month.astype('str').unique() + "/day=" + df['date'].dt.day.astype('str').unique() + "/" + str(uuid.uuid4()) + ".parq"
table = pa.Table.from_pandas(df)
pq.write_table(table, dst_path)
But I see the following error:
python3 test.py
Traceback (most recent call last):
File "test.py", line 15, in <module>
pq.write_table(table, dst_path)
File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 943, in write_table
**kwargs)
File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 286, in __init__
**options)
File "pyarrow/_parquet.pyx", line 837, in pyarrow._parquet.ParquetWriter.__cinit__ (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/_parquet.cxx:14606)
File "pyarrow/io.pxi", line 835, in pyarrow.lib.get_writer (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:59078)
TypeError: Unable to read from object of type: <class 'numpy.ndarray'>
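How can I create the directory from a pandas timestamp?
Answer 0 (score: 1)
Your dst_path is a NumPy array, not a string. Each .unique() call returns an array, so concatenating strings with it produces an array as well.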
print(type(dst_path))
Output:
<class 'numpy.ndarray'>
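It should be a string, so I added the following line right below the dst_path line, and it worked. It is not elegant, so you may want to look for a better way to do this. The key point is that you need a string.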
dst_path = str(dst_path[0])
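One possibly cleaner alternative is to build the path from a single timestamp, so that the result is a plain string from the start. The sketch below uses only the standard library; partition_path is a hypothetical helper name, not part of pandas or pyarrow, and it assumes the same year=/month=/day= layout as the question. A pandas Timestamp exposes the same year, month, and day attributes, so it should work with values taken from df['date'] as well.

```python
import uuid
from datetime import datetime


def partition_path(ts, root="logs"):
    """Return one path string for a single timestamp (hypothetical helper)."""
    # ts may be a datetime or a pandas Timestamp; both expose year/month/day.
    file_name = f"{uuid.uuid4()}.parq"
    return f"{root}/year={ts.year}/month={ts.month}/day={ts.day}/{file_name}"


# Parse one of the timestamps from the question and build its path.
ts = datetime.strptime("2018-03-04T14:12:15.653Z", "%Y-%m-%dT%H:%M:%S.%fZ")
dst_path = partition_path(ts)
print(type(dst_path))
print(dst_path)
```

Keep in mind that the sample data spans four different days, so dst_path[0] only selects the path for the first one and every row ends up in that single file. If each day should go to its own directory, one option is to call a helper like this once per group of rows that share a date.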
Note that the directory must already exist or you will get an error, so you can add the following before the write_table call.
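import os
# Split the full path into its directory and file name parts.
dir_name, file_name = os.path.split(dst_path)
# Create the directory (and any missing parents) if it does not exist yet.
if not os.path.exists(dir_name):
    os.makedirs(dir_name)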
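As a variation on the check above, os.makedirs accepts exist_ok=True (Python 3.2 and later), which removes the need for a separate os.path.exists test. The following is a minimal sketch; ensure_parent_dir is a hypothetical helper name, and the temporary base directory and the example.parq file name are used only for illustration.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it is missing."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this safe to call when the directory exists.
        os.makedirs(parent, exist_ok=True)
    return parent


# Use a throwaway base directory so the example leaves the working tree alone.
base = tempfile.mkdtemp()
dst_path = os.path.join(base, "logs", "year=2018", "month=3", "day=4", "example.parq")

parent = ensure_parent_dir(dst_path)
ensure_parent_dir(dst_path)  # calling it again does not raise
print(os.path.isdir(parent))
```

Once the parent directory exists, dst_path can be passed to pq.write_table as in the question.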