I have a list like this:
a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,2'],['LSJW26760ES050487,2016-04-29,00:45:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,4'],.....]
How can I read it into a pandas DataFrame like this:
type(str) Data(date.time) Time(time.timedelta) flag(int)
0 LSJW26760ES050487,2016-04-29,00:40:1,3
1 LSJW26760ES050487,2016-04-29,00:40:1,2
2 LSJW26760ES050487,2016-04-29,00:45:1,3
3 LSJW26760ES050487,2016-04-29,00:40:1,4
Answer 0 (score: 1)
Here is a Python 3 solution that uses np.genfromtxt to parse each string into an array, with the comma as the delimiter:
import numpy as np
import pandas as pd
from io import BytesIO
a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,2']]
data = [np.genfromtxt(BytesIO(item[0].encode()), delimiter=',', dtype=str) for item in a]
d = pd.DataFrame(data, columns='type date time flag'.split())
d.date = pd.to_datetime(d.date)
d.time = pd.to_timedelta(d.time)
d.flag = pd.to_numeric(d.flag)
print(d)
Output:
type date time flag
0 LSJW26760ES050487 2016-04-29 00:40:01 3
1 LSJW26760ES050487 2016-04-29 00:40:01 2
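As a variation on the comma-delimited parsing above, the rows can also be joined into one in-memory text buffer and handed to pd.read_csv in a single call. This is only a sketch under the assumption that every inner list holds exactly one well-formed string; the variable name buffer is illustrative and not part of the original answer.

```python
import io
import pandas as pd

a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
     ['LSJW26760ES050487,2016-04-29,00:40:1,2']]

# Join the single-string rows into one CSV-like text buffer (assumed layout).
buffer = io.StringIO("\n".join(item[0] for item in a))

# header=None because the data has no header row; names supplies the labels.
d = pd.read_csv(buffer, header=None, names=["type", "date", "time", "flag"])

# read_csv infers the integer column on its own; the date and time
# columns are still text and are converted explicitly.
d["date"] = pd.to_datetime(d["date"])
d["time"] = pd.to_timedelta(d["time"])
print(d.dtypes)
```

Compared with calling np.genfromtxt once per row, this parses all rows in one pass, at the cost of building an intermediate string.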
Answer 1 (score: 0)
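pandas.DataFrame() can build a DataFrame from a list of lists. The only preprocessing step needed is to turn each string into a list, which you can do with "string".split(","). Here is a working example: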
>>> import pandas as pd
>>> a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,2'],['LSJW26760ES050487,2016-04-29,00:45:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,4']]
>>>
>>>
>>> a = [i[0].split(",") for i in a]
>>> df = pd.DataFrame(a)
>>> df.head()
0 1 2 3
0 LSJW26760ES050487 2016-04-29 00:40:1 3
1 LSJW26760ES050487 2016-04-29 00:40:1 2
2 LSJW26760ES050487 2016-04-29 00:45:1 3
3 LSJW26760ES050487 2016-04-29 00:40:1 4
>>>
As a final step, you can add column names as follows:
>>> df.columns = ["type","date", "time", "flag"]
>>> df.head()
type date time flag
0 LSJW26760ES050487 2016-04-29 00:40:1 3
1 LSJW26760ES050487 2016-04-29 00:40:1 2
2 LSJW26760ES050487 2016-04-29 00:45:1 3
3 LSJW26760ES050487 2016-04-29 00:40:1 4
>>>
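One thing worth noting about the split-based approach: every field produced by split(",") is still a string, so the flag column holds text rather than integers. The sketch below, which uses a shortened two-row version of the list and the illustrative names rows and first_flag, checks this and converts one column with pd.to_numeric.

```python
import pandas as pd

a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
     ['LSJW26760ES050487,2016-04-29,00:40:1,2']]

# Split each single-string row into its four fields.
rows = [i[0].split(",") for i in a]
df = pd.DataFrame(rows, columns=["type", "date", "time", "flag"])

# Every value is still a Python string at this point.
first_flag = df["flag"].iloc[0]
print(repr(first_flag))

# Convert the flag column to numbers so that arithmetic behaves numerically.
df["flag"] = pd.to_numeric(df["flag"])
print(df["flag"].sum())
```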
Answer 2 (score: 0)
You need to split on , with str.split first, and then convert the columns:
import pandas as pd
a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
['LSJW26760ES050487,2016-04-29,00:40:1,2'],
['LSJW26760ES050487,2016-04-29,00:45:1,3'],
['LSJW26760ES050487,2016-04-29,00:40:1,4']]
df = pd.DataFrame(a, columns=['col'])
df = df.col.str.split(',', expand=True)
df.columns = ['type','data','time','flag']
df['data'] = pd.to_datetime(df.data)
df['time'] = pd.to_timedelta(df.time)
df['flag'] = df.flag.astype(int)
print (df)
type data time flag
0 LSJW26760ES050487 2016-04-29 00:40:01 3
1 LSJW26760ES050487 2016-04-29 00:40:01 2
2 LSJW26760ES050487 2016-04-29 00:45:01 3
3 LSJW26760ES050487 2016-04-29 00:40:01 4
print (df.dtypes)
type object
data datetime64[ns]
time timedelta64[ns]
flag int32
dtype: object
Another solution, if the data contains no NaN values:
import pandas as pd
a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
['LSJW26760ES050487,2016-04-29,00:40:1,2'],
['LSJW26760ES050487,2016-04-29,00:45:1,3'],
['LSJW26760ES050487,2016-04-29,00:40:1,4']]
df = pd.DataFrame([x[0].split(',') for x in a], columns=['type', 'data', 'time', 'flag'])
df['data'] = pd.to_datetime(df.data)
df['time'] = pd.to_timedelta(df.time)
df['flag'] = df.flag.astype(int)
print (df)
type data time flag
0 LSJW26760ES050487 2016-04-29 00:40:01 3
1 LSJW26760ES050487 2016-04-29 00:40:01 2
2 LSJW26760ES050487 2016-04-29 00:45:01 3
3 LSJW26760ES050487 2016-04-29 00:40:01 4
print (df.dtypes)
type object
data datetime64[ns]
time timedelta64[ns]
flag int32
dtype: object
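The NaN caveat can be illustrated with a small sketch. The input below is hypothetical (a second row holding NaN instead of a string), and the names parts and failed are only for illustration. Series.str.split propagates the missing value, while the list comprehension tries to call split on a float and fails.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the second row holds NaN instead of a string.
a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'], [np.nan]]

# Series.str.split skips missing values and leaves the whole row missing.
df = pd.DataFrame(a, columns=['col'])
parts = df['col'].str.split(',', expand=True)
print(parts)

# The list comprehension calls .split on a float, which raises AttributeError.
failed = False
try:
    [x[0].split(',') for x in a]
except AttributeError as exc:
    failed = True
    print('list comprehension failed:', exc)
```

With missing rows present, astype(int) on the flag column would also fail, so a NaN-tolerant conversion such as pd.to_numeric (which yields a float column) may be the safer choice in that case.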