I'm using Python to read values from an Excel spreadsheet. I have a column containing date values, and some cells hold multiple values:
1441152000000.0
1441756800000
1476316800000,1482192000000,1440547200000,1453248000000,1460505600000
1465430400000
1476921600000
1450224000000.0
1449014400000
I'm using pandas' to_datetime:
df.iloc["colname"] = pd.to_datetime(df.iloc["colname"], unit='ms', utc=True)
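It crashes. I think that's expected, because a cell with several values is effectively a list within a list and to_datetime doesn't know how to handle it. I tried converting cell by cell instead, but that also gave me errors: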
for ii in range(0, len(df.axes[0])):
    jj = df.columns.get_loc(col)
    df.iloc[ii, jj] = pd.to_datetime(df.iloc[ii, jj], unit='ms', utc=True)
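This gives: "raise ValueError('Must have equal len keys and value ' ValueError: Must have equal len keys and value when setting with an iterable"
I'm not sure what else I can try at this point...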
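Answer 0 (score: 1)
Do it in two steps: first, convert the column into a column of lists (using ast for the packed cells), then recreate the dataframe with one row per value.
Then you can call your function: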
import ast
# Convert packed cells (comma-separated strings) to lists
indexes = df[df['colname'].apply(lambda x: not isinstance(x, (int, float)) and "," in x)].index
df.loc[indexes, 'colname'] = df.loc[indexes, 'colname'].apply(lambda x: ast.literal_eval("[" + x + "]"))
# Wrap single numeric cells in one-element lists
indexes = df[df['colname'].apply(lambda x: isinstance(x, (int, float)))].index
df.loc[indexes, 'colname'] = df.loc[indexes, 'colname'].apply(lambda x: [x])
# Recreate the dataframe, one row per value (this unpacking assumes 'colname' is the first column)
vals = [[unique, *rest] for packed, *rest in df.values.tolist() for unique in packed]
df = pd.DataFrame(vals, columns=df.columns.tolist())
# Make sure the column is numeric
df['colname'] = df['colname'].astype(float)
# Apply your function
df['colname'] = pd.to_datetime(df['colname'], unit='ms', utc=True)
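For comparison, here is a minimal sketch of the same idea using `Series.str.split` and `DataFrame.explode` instead of ast and a list comprehension. This is a different technique from the code above, not a restatement of it. The sample frame, the `other` column and the `exploded` name are made up for illustration, and `ignore_index=True` assumes a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

# Hypothetical sample data mirroring the question; 'colname' and 'other' are placeholders.
df = pd.DataFrame({
    "colname": [1441152000000.0, 1441756800000,
                "1476316800000,1482192000000", 1465430400000],
    "other": ["a", "b", "c", "d"],
})

# Turn every cell into a string, split on commas, then give each value its own row.
exploded = (
    df.assign(colname=df["colname"].astype(str).str.split(","))
      .explode("colname", ignore_index=True)
)

# Go through float so strings such as "1441152000000.0" parse cleanly.
exploded["colname"] = pd.to_datetime(
    exploded["colname"].astype(float), unit="ms", utc=True
)

print(exploded)
```

Unlike the comprehension above, this does not depend on where the column sits in the frame, and a string cell without a comma is handled the same way as a numeric cell.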
Edit to match your real columns:
import ast
for colname in ['Date', 'Created']:
    # Convert packed cells (comma-separated strings) to lists
    indexes = df[df[colname].apply(lambda x: not isinstance(x, (int, float)) and "," in x)].index
    df.loc[indexes, colname] = df.loc[indexes, colname].apply(lambda x: ast.literal_eval("[" + x + "]"))
    # Wrap single numeric cells in one-element lists
    indexes = df[df[colname].apply(lambda x: isinstance(x, (int, float)))].index
    df.loc[indexes, colname] = df.loc[indexes, colname].apply(lambda x: [x])
# Recreate the dataframe with one row per (Date, Created) combination
vals = [[userid, visit, unique_date, uniquefilename, unique_created, *rest_of_datas] for userid, visit, date, uniquefilename, created, *rest_of_datas in df.values.tolist() for unique_date in date for unique_created in created]
df = pd.DataFrame(vals, columns=df.columns.tolist())
for colname in ['Date', 'Created']:
    df[colname] = pd.to_datetime(df[colname], unit='ms', utc=True)
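If the column order is not fixed, a rough sketch of the two-column case with `DataFrame.explode` may be easier to adapt. It is an alternative technique, not the code above rewritten. The frame below is invented for illustration (only the `Date` and `Created` names come from the answer), and `ignore_index=True` again assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical frame; values are illustrative only.
df = pd.DataFrame({
    "userid": [1, 2],
    "Date": ["1476316800000,1482192000000", 1465430400000],
    "Created": [1441152000000.0, "1449014400000,1450224000000"],
})

for colname in ["Date", "Created"]:
    # Split packed cells into lists, then give each value its own row.
    df[colname] = df[colname].astype(str).str.split(",")
    df = df.explode(colname, ignore_index=True)
    # Convert the now-scalar cells from epoch milliseconds to UTC timestamps.
    df[colname] = pd.to_datetime(df[colname].astype(float), unit="ms", utc=True)

print(df)
```

Exploding one column after the other yields every (Date, Created) combination per original row, in the same order as the nested comprehension: Date varies in the outer loop and Created in the inner one.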