I have a pandas DataFrame named df. With df.dtypes
I can print this to the screen:
arrival_time object
departure_time object
drop_off_type int64
extra object
pickup_type int64
stop_headsign object
stop_id object
stop_sequence int64
trip_id object
dtype: object
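I would like to save this information so that I can compare it with other data, type-cast elsewhere, and so on. I want to save it to a local file and restore it elsewhere, in a program that does not have access to the data itself. But I cannot figure out how. Here are the results of the conversions I tried: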
df.dtypes.to_dict()
{'arrival_time': dtype('O'),
'departure_time': dtype('O'),
'drop_off_type': dtype('int64'),
'extra': dtype('O'),
'pickup_type': dtype('int64'),
'stop_headsign': dtype('O'),
'stop_id': dtype('O'),
'stop_sequence': dtype('int64'),
'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
...
TypeError: dtype('O') is not JSON serializable
----
list(df.dtypes)
[dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O')]
How can I save and export/archive the dtype information of a pandas DataFrame?
Answer 0 (score: 4)
pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it like any regular Series in pandas:
df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
res = df.dtypes.to_frame('dtypes').reset_index()
print(res)
index dtypes
0 A object
1 B float64
2 C int64
3 D bool
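Output to csv / excel / pickle
You can then use any method normally used to store a dataframe, such as to_csv, to_excel, to_pickle, etc. Note that distributing pickle files is not recommended, because the format is version-dependent.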
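As a minimal sketch of the CSV route, the dtype table can be written out and read back into a plain name-to-dtype mapping. The in-memory buffer stands in for a real file path, and the names buffer, loaded and type_map are illustrative, not part of the answer above.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
res = df.dtypes.to_frame('dtypes').reset_index()

# Assumption: an in-memory buffer is used here instead of a file on disk.
buffer = io.StringIO()
res.to_csv(buffer, index=False)   # each dtype is written as its string name
buffer.seek(0)

# Read the two-column table back and turn it into {column name: dtype name}.
loaded = pd.read_csv(buffer)
type_map = dict(zip(loaded['index'], loaded['dtypes']))
print(type_map)
```

The resulting dictionary holds only strings, so it can be compared with the dtypes of another frame or passed on to a reader that accepts dtype names.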
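Output to json
If you want an easy way to store and load the types as a dictionary, a popular format is json. As the error in the question shows, you need to convert the dtypes to str first: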
import json
# first create dictionary
d = res.set_index('index')['dtypes'].astype(str).to_dict()
with open('types.json', 'w') as f:
    json.dump(d, f)
with open('types.json', 'r') as f:
    data_types = json.load(f)
print(data_types)
{'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
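Once the mapping is loaded, one possible way to use it is to pass it to DataFrame.astype, which accepts a dictionary of column name to dtype name. The sketch below serialises to a string rather than a file, and the untyped frame raw is a made-up stand-in for data arriving without reliable types.

```python
import json
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Store the dtype names as plain strings (a JSON string here, not a file).
saved = json.dumps(df.dtypes.astype(str).to_dict())

# Hypothetical "other" data: same columns, but every column is object dtype.
raw = df.astype(object)

# Re-apply the stored types column by column.
data_types = json.loads(saved)
restored = raw.astype(data_types)

# Check that the restored frame reports the same dtype names that were saved.
matches = restored.dtypes.astype(str).to_dict() == data_types
print(matches)
```

The same dictionary can also be given to the dtype argument of pd.read_csv, so the types are applied while parsing instead of afterwards.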
Answer 1 (score: 0)
You can use the pickle format.
# save
df.to_pickle(file_name)
# load
df = pandas.read_pickle(file_name)
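Note that this stores the whole frame, data included, not only the dtypes. A small round trip, sketched here with a temporary directory and an illustrative file name, suggests that the dtypes come back unchanged:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Assumption: a throwaway directory and file name, used only for this demo.
with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, 'frame.pkl')
    df.to_pickle(file_name)              # save
    loaded = pd.read_pickle(file_name)   # load

# Compare the per-column dtypes of the original and the reloaded frame.
same_dtypes = loaded.dtypes.to_dict() == df.dtypes.to_dict()
print(same_dtypes)
```

If only the type information is needed, the same calls should work on df.dtypes itself, since it is a Series.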
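Answer 2 (score: 0)
I have found myself putting the dtype information at the start of the CSV file. Reading it out before the dataframe is trivial, which makes this approach fairly convenient.
Sample dataframe (shamelessly copied from @jpp's answer):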
df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
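To save it, I would do this:
with open('test.csv', 'wt') as f:
    # first line: one dtype name per column, with a leading comma for the index
    f.write(',' + ','.join(map(str, df.dtypes)) + '\n')
    # pandas 1.5+ spells this keyword lineterminator; older versions use line_terminator
    df.to_csv(f, lineterminator='\n')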
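I added an extra leading comma for the index column, since I wanted to write the index. In general you do not have to do that.
Reading is now four lines instead of a one-liner, but arguably much more precise.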
with open('test.csv', 'rt') as f:
    types = next(f).rstrip().split(',')[1:]
    columns = next(f).rstrip().split(',')[1:]
    test = pd.read_csv(f, dtype=dict(zip(columns, types)), index_col=0, names=columns)
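I ran into this problem when doing catalog searches on astronomical data, where many text fields were missing and were wrongly loaded as floating-point NaN. An alternative is to set low_memory=False in read_csv, but that makes the typing implicit rather than explicit.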
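The effect described above can be reproduced on a tiny made-up table: a text column that is empty in every row is inferred as float64, while an explicit dtype keeps it as object. The column names and values below are invented for illustration.

```python
import io
import pandas as pd

# Made-up catalog: 'name' is a text field that happens to be empty everywhere.
csv_text = "id,name\n1,\n2,\n"

# Without dtype information pandas has only missing values to look at.
inferred = pd.read_csv(io.StringIO(csv_text))

# With stored dtype names the column keeps its intended type.
explicit = pd.read_csv(io.StringIO(csv_text), dtype={'id': 'int64', 'name': 'object'})

print(inferred['name'].dtype)
print(explicit['name'].dtype)
```

The values are NaN in both cases; only the column dtype differs, which matters once real text is appended or compared later.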