我有一个不断更新数据框并将其保存到磁盘的脚本(覆盖旧的csv文件)。我发现如果在保存调用df.to_csv("df.csv")
处中断程序,则所有数据都会丢失,df.csv
为空,只包含列索引。
我可以暂时将数据保存到df.temp.csv
,然后替换df.csv
,从而解决方法。但是有没有一个pythonic,简短的方法来拯救" Atomary"并防止数据丢失?这是我在保存呼叫中断时收到的堆栈跟踪。
Traceback (most recent call last):
File "/opt/homebrew-cask/Caskroom/pycharm/2016.1.3/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1531, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/opt/homebrew-cask/Caskroom/pycharm/2016.1.3/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 938, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/user/test.py", line 49, in <module>
d.to_csv("out.csv", index=False)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1344, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 1551, in save
self._save()
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 1652, in _save
self._save_chunk(start_i, end_i)
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 1666, in _save_chunk
quoting=self.quoting)
File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 1443, in to_native_types
return formatter.get_result_as_array()
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 2171, in get_result_as_array
formatted_values = format_values_with(float_format)
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 2157, in format_values_with
for val in values.ravel()[imask]])
File "/usr/local/lib/python2.7/site-packages/pandas/formats/format.py", line 2108, in base_formatter
return str(v) if notnull(v) else self.na_rep
File "/usr/local/lib/python2.7/site-packages/pandas/core/common.py", line 250, in notnull
res = isnull(obj)
File "/usr/local/lib/python2.7/site-packages/pandas/core/common.py", line 73, in isnull
def isnull(obj):
File "_pydevd_bundle/pydevd_cython.pyx", line 937, in _pydevd_bundle.pydevd_cython.ThreadTracer.__call__ (_pydevd_bundle/pydevd_cython.c:15522)
File "/opt/homebrew-cask/Caskroom/pycharm/2016.1.3/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_is_thread_alive.py", line 14, in is_thread_alive
def is_thread_alive(t):
KeyboardInterrupt
答案 0 :(得分:1)
您可以做的最好的事情是实现一个信号处理程序(signal
模块),它等待终止程序直到最后一次写操作完成。
沿线的某些事情(伪代码):
import signal
import sys
import time
import pandas as pd
lock = threading.Lock()
def handler(signum, frame):
# ensure that latest data is written
sys.exit(1)
signal.signal(signal.SIGTERM, handler)
signal.signal(signal.SIGINT, handler)
while True:
# might exit any time.
pd.to_csv(...)
time.sleep(1)
答案 1 :(得分:1)
您可以创建一个上下文管理器来处理原子覆盖:
import os
import contextlib
@contextlib.contextmanager
def atomic_overwrite(filename):
temp = filename + '~'
with open(temp, "w") as f:
yield f
os.rename(temp, filename) # this will only happen if no exception was raised
Pandas to_csv
上的DataFrame
方法将接受文件对象而不是路径,因此您可以使用:
with atomic_overwrite("df.csv") as f:
df.to_csv(f)
我选择的临时文件名是请求的文件名,末尾有波浪号。您当然可以根据需要更改代码以使用其他内容。我也不确定该文件应该打开的模式,您可能需要"wb"
而不仅仅是"w"
。