Python pandas DataFrame: converting cells with multiple datetime values

Time: 2018-03-22 16:02:21

Tags: python excel pandas

I am using Python to read values from an Excel data sheet. I have a column containing date values, and some cells hold multiple values:

1441152000000.0
1441756800000
1476316800000,1482192000000,1440547200000,1453248000000,1460505600000
1465430400000
1476921600000 
1450224000000.0
1449014400000

I am using pandas' to_datetime:

df.iloc["colname"] = pd.to_datetime(df.iloc["colname"], unit='ms', utc=True)

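It crashes. I think that is expected, since the column is effectively a list within a list and to_datetime does not know how to handle it. I tried converting cell by cell instead, but that also gave me an error: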

for ii in range(0, len(df.axes[0])):
     jj=df.columns.get_loc(col)
     df.iloc[ii,jj] = pd.to_datetime(df.iloc[ii,jj], unit='ms', utc=True)

This gives: "ValueError: Must have equal len keys and value when setting with an iterable"

I'm not sure what else I can try at this point...
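
For reference, the sample values look like milliseconds since the Unix epoch, and a comma-joined cell is not a single number. A minimal standard-library sketch, assuming the multi-value cells arrive as comma-separated strings (the helper name `ms_to_utc` is hypothetical and not part of pandas):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=float(ms))


# A single-value cell converts directly.
single = ms_to_utc(1441152000000.0)

# A packed cell (assumed here to be a comma-separated string) is not one number.
packed = "1476316800000,1482192000000"
try:
    float(packed)
    parsed_as_one_number = True
except ValueError:
    parsed_as_one_number = False

# Splitting first gives pieces that each convert on their own.
parts = [ms_to_utc(p) for p in packed.split(",")]
```

This suggests the packed cells need to be split into separate values before any unit-based conversion is applied.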

1 Answer:

Answer 0: (score: 1)

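Do it in two steps: first, convert the column into a column of lists (using ast for the packed cells), then recreate the DataFrame with one row per value.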

Then you can call your function:

import ast

#convert packed cells to list
indexes = df[df['colname'].apply(lambda x: not (isinstance(x, int) or isinstance(x, float)) and "," in x)].index
df.loc[indexes, 'colname'] = df.loc[indexes, 'colname'].apply(lambda x:ast.literal_eval( "[" + x +"]"))

#convert unpacked cells to list
indexes = df[df['colname'].apply(lambda x: isinstance(x, int) or isinstance(x, float))].index
df.loc[indexes, "colname"] = df.loc[indexes, "colname"].apply(lambda x: [x,])

#Recreate dataframe with one row per value (this unpacking assumes 'colname' is the first column)
vals = [[unique, *vals] for colname, *vals in df.values.tolist() for unique in colname]
df = pd.DataFrame(vals, columns = df.columns.tolist())

#Secure data type
df['colname'] = df['colname'].astype(float)

#Apply your function
df["colname"] = pd.to_datetime(df["colname"], unit='ms', utc=True)
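
An alternative to rebuilding the frame by hand is `Series.str.split` followed by `DataFrame.explode` (available in pandas 0.25 and later). This is a different technique from the code above, sketched on made-up data; the sample frame and the `other` column are assumptions for illustration only:

```python
import pandas as pd

# Made-up sample: numeric cells plus one packed, comma-separated cell.
df = pd.DataFrame({
    "colname": [1441152000000.0, 1441756800000,
                "1476316800000,1482192000000", 1465430400000],
    "other": ["a", "b", "c", "d"],
})

# Normalise every cell to a string, then split on commas so each cell is a list.
as_lists = df["colname"].astype(str).str.split(",")

# explode() emits one row per list element and repeats the other columns.
out = df.assign(colname=as_lists).explode("colname").reset_index(drop=True)

# The pieces are still strings, so cast to float before the epoch conversion.
out["colname"] = pd.to_datetime(out["colname"].astype(float), unit="ms", utc=True)
```

Unlike the list comprehension, this does not depend on where the column sits in the frame, which may make it easier to adapt to several columns.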

Modified to match your real columns:

import ast
for colname in ['Date', 'Created']:
    #convert packed cells to list
    indexes = df[df[colname].apply(lambda x: not (isinstance(x, int) or isinstance(x, float)) and "," in x)].index
    df.loc[indexes, colname] = df.loc[indexes, colname].apply(lambda x:ast.literal_eval( "[" + x +"]"))

    #convert unpacked cells to list
    indexes = df[df[colname].apply(lambda x: isinstance(x, int) or isinstance(x, float))].index
    df.loc[indexes, colname] = df.loc[indexes, colname].apply(lambda x: [x,])

vals = [[userid, visit, unique_date, uniquefilename, unique_created, *rest_of_datas] for userid, visit, date, uniquefilename, created, *rest_of_datas in df.values.tolist() for unique_date in date for unique_created in created]
df = pd.DataFrame(vals, columns = df.columns.tolist())
for colname in ['Date', 'Created']:
    df[colname] = pd.to_datetime(df[colname], unit='ms', utc=True)
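
Note that the two nested `for` clauses in the comprehension pair every date with every created value, so a row with 2 dates and 3 created values becomes 6 rows. A small sketch on made-up data shows the difference from positional pairing (the row contents and the short variable names are hypothetical):

```python
# One made-up row: two packed dates and three packed created values.
rows = [["u1", "v1", [1, 2], "f.xlsx", [10, 20, 30], "extra"]]

# Nested loops, as in the comprehension above: every combination is emitted.
product_rows = [
    [userid, visit, d, filename, c, *rest]
    for userid, visit, date, filename, created, *rest in rows
    for d in date
    for c in created
]

# Positional pairing with zip: one row per pair, stopping at the shorter list.
paired_rows = [
    [userid, visit, d, filename, c, *rest]
    for userid, visit, date, filename, created, *rest in rows
    for d, c in zip(date, created)
]
```

Which of the two fits depends on whether the Date and Created values are meant to line up position by position, which the question does not state.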