How to read csv with timedeltas and NaN?

Date: 2018-05-28 18:54:41

Tags: python pandas csv timedelta

I'm trying to read a csv file that looks like this:

              col 1             col 2             col 3      ...     col N
0        0 days 00:00:16   0 days 00:00:07   0 days 00:01:02          NaN
.
.
.
15000    0 days 01:40:00         NaN               NaN       ...      NaN

What I've tried:

df = pd.read_csv('file.csv', sep=',', index_col=0, dtype=object)
df = df.applymap(lambda x: pd.to_timedelta(x))

but since I have a lot of columns and rows, it is fairly slow. Is there a better way to do this?

1 answer:

Answer 0 (score: 3)

Neither the dtype nor the parse_dates parameter of read_csv supports timedelta objects. Here are a few options.

apply + to_timedelta

df = df.apply(pd.to_timedelta, errors='coerce')

Alternatively,

for c in df.columns:
    df[c] = pd.to_timedelta(df[c], errors='coerce')
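As a minimal, self-contained sketch of this first option: the inline CSV text and the column names below are made up for illustration and only stand in for the asker's file.csv. Everything is read as strings first, then each column is converted in one vectorized call, with unparseable or missing entries becoming NaT:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv (not the asker's real file).
csv_text = (
    ",col 1,col 2\n"
    "0,0 days 00:00:16,0 days 00:00:07\n"
    "1,0 days 01:40:00,\n"
)

# Read every field as a string; the empty field becomes NaN.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=object)

# Convert column by column; errors="coerce" maps unparseable values to NaT.
df = df.apply(pd.to_timedelta, errors="coerce")

print(df.dtypes)
```

Because pd.to_timedelta is called once per column here rather than once per cell, this should generally be faster than the applymap approach in the question, although the actual gain depends on the data.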

pd.read_csv(..., converters=...)

Another option is to pass the converters argument at load time:

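# Converters can be keyed by column position in the file. Position 0 is the
# index column here, so the N timedelta columns sit at positions 1..N.
f = {i: pd.to_timedelta for i in range(1, N + 1)}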
df = pd.read_csv('file.csv', sep=',', index_col=0, converters=f)
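A small runnable sketch of the converters approach, again on made-up inline data rather than the asker's file. The name n_cols is a hypothetical stand-in for the number of timedelta columns, and the sketch assumes that integer keys refer to positions in the file, so position 0 (the index column) is skipped:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv (not the asker's real file).
csv_text = (
    ",col 1,col 2\n"
    "0,0 days 00:00:16,0 days 00:00:07\n"
    "1,0 days 01:40:00,\n"
)

n_cols = 2  # assumed number of timedelta columns in the sample

# Key the converters by file position, skipping position 0 (the index column).
converters = {i: pd.to_timedelta for i in range(1, n_cols + 1)}

df = pd.read_csv(io.StringIO(csv_text), index_col=0, converters=converters)

print(df)
print(df.dtypes)
```

A converter is called once per cell with the raw string, so this is not necessarily faster than converting whole columns after loading. It is worth checking df.dtypes afterwards to confirm the columns ended up with the dtype you expect.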