How to convert a list of strings into a DataFrame

Date: 2016-08-15 08:57:45

Tags: python pandas dataframe

I have a list like this:

a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,2'],['LSJW26760ES050487,2016-04-29,00:45:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,4'],.....]

How can I read it into pandas as a DataFrame with these columns and types:

    type (str)         date (datetime)  time (timedelta)  flag (int)
0   LSJW26760ES050487  2016-04-29       00:40:1           3
1   LSJW26760ES050487  2016-04-29       00:40:1           2
2   LSJW26760ES050487  2016-04-29       00:45:1           3
3   LSJW26760ES050487  2016-04-29       00:40:1           4

3 answers:

Answer 0 (score: 1)

Here is Python 3 code that uses np.genfromtxt with a comma delimiter to turn each string into an array of fields:

import numpy as np
import pandas as pd
from io import BytesIO

a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'], ['LSJW26760ES050487,2016-04-29,00:40:1,2']]
# Parse each comma-separated string into a 1-D array of 4 string fields
data = [np.genfromtxt(BytesIO(item[0].encode()), delimiter=',', dtype=str) for item in a]
d = pd.DataFrame(data, columns='type date time flag'.split())
# Convert the string columns to proper dtypes
d['date'] = pd.to_datetime(d['date'])
d['time'] = pd.to_timedelta(d['time'])
d['flag'] = pd.to_numeric(d['flag'])
print(d)

输出:

                type       date     time  flag
0  LSJW26760ES050487 2016-04-29 00:40:01     3
1  LSJW26760ES050487 2016-04-29 00:40:01     2
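The same idea, letting a CSV parser split the fields, also works with pandas' own parser. A minimal sketch (not part of the original answer), assuming every string has exactly four comma-separated fields and no header row: join the strings into one CSV text and hand it to pd.read_csv through io.StringIO.

```python
import io

import pandas as pd

a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
     ['LSJW26760ES050487,2016-04-29,00:40:1,2'],
     ['LSJW26760ES050487,2016-04-29,00:45:1,3'],
     ['LSJW26760ES050487,2016-04-29,00:40:1,4']]

# Join the one-element sublists into a single CSV text, one record per line
csv_text = '\n'.join(row[0] for row in a)

# header=None: the strings carry no header row, so supply the names ourselves.
# Keep 'type' and 'time' as strings; 'flag' is inferred as an integer column.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['type', 'date', 'time', 'flag'],
                 dtype={'type': str, 'time': str})

# Convert the remaining string columns to datetime / timedelta
df['date'] = pd.to_datetime(df['date'])
df['time'] = pd.to_timedelta(df['time'])
print(df)
print(df.dtypes)
```

This avoids the per-row genfromtxt call, since the whole list is parsed in one pass.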

Answer 1 (score: 0)

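pandas.DataFrame() can build a DataFrame from a list of lists. The only preprocessing step you need is to turn each string into a list, which you can do with "string".split(","). Here is a working example: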

>>> import pandas as pd 
>>> a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,2'],['LSJW26760ES050487,2016-04-29,00:45:1,3'],['LSJW26760ES050487,2016-04-29,00:40:1,4']]
>>> 
>>> 
>>> a = [i[0].split(",") for i in a]
>>> df = pd.DataFrame(a)
>>> df.head()
                   0           1        2  3
0  LSJW26760ES050487  2016-04-29  00:40:1  3
1  LSJW26760ES050487  2016-04-29  00:40:1  2
2  LSJW26760ES050487  2016-04-29  00:45:1  3
3  LSJW26760ES050487  2016-04-29  00:40:1  4
>>> 

As a final step, you can add the column names like this (note that all columns are still strings at this point):

>>> df.columns = ["type","date", "time", "flag"]
>>> df.head() 
                type        date     time flag
0  LSJW26760ES050487  2016-04-29  00:40:1    3
1  LSJW26760ES050487  2016-04-29  00:40:1    2
2  LSJW26760ES050487  2016-04-29  00:45:1    3
3  LSJW26760ES050487  2016-04-29  00:40:1    4
>>> 

Answer 2 (score: 0)

You first need to split each string on "," with str.split, then convert the columns:

import pandas as pd

a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
    ['LSJW26760ES050487,2016-04-29,00:40:1,2'],
    ['LSJW26760ES050487,2016-04-29,00:45:1,3'],
    ['LSJW26760ES050487,2016-04-29,00:40:1,4']]


df = pd.DataFrame(a, columns=['col'])

df = df.col.str.split(',', expand=True)
df.columns = ['type','data','time','flag']
df['data'] = pd.to_datetime(df.data)
df['time'] = pd.to_timedelta(df.time)
df['flag'] = df.flag.astype(int)

print (df)
                type       data     time  flag
0  LSJW26760ES050487 2016-04-29 00:40:01     3
1  LSJW26760ES050487 2016-04-29 00:40:01     2
2  LSJW26760ES050487 2016-04-29 00:45:01     3
3  LSJW26760ES050487 2016-04-29 00:40:01     4

print (df.dtypes)
type             object
data     datetime64[ns]
time    timedelta64[ns]
flag              int32
dtype: object
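The int32 shown for flag is platform-dependent: astype(int) resolves to NumPy's default integer, which is 32-bit on Windows with NumPy 1.x and 64-bit on Linux/macOS. A small sketch, assuming you want the same dtype on every platform, is to name the width explicitly:

```python
import pandas as pd

flags = pd.Series(['3', '2', '3', '4'])

# astype(int) follows NumPy's default integer, whose width depends on the platform
platform_default = flags.astype(int)

# Naming the width explicitly gives the same dtype everywhere
explicit = flags.astype('int64')
print(platform_default.dtype, explicit.dtype)
```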

Another solution, if the data contains no NaN values:

import pandas as pd

a= [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
    ['LSJW26760ES050487,2016-04-29,00:40:1,2'],
    ['LSJW26760ES050487,2016-04-29,00:45:1,3'],
    ['LSJW26760ES050487,2016-04-29,00:40:1,4']]


df = pd.DataFrame([x[0].split(',') for x in a], columns=['type', 'data', 'time', 'flag'])
df['data'] = pd.to_datetime(df.data)
df['time'] = pd.to_timedelta(df.time)
df['flag'] = df.flag.astype(int)
print (df)
                type       data     time  flag
0  LSJW26760ES050487 2016-04-29 00:40:01     3
1  LSJW26760ES050487 2016-04-29 00:40:01     2
2  LSJW26760ES050487 2016-04-29 00:45:01     3
3  LSJW26760ES050487 2016-04-29 00:40:01     4

print (df.dtypes)
type             object
data     datetime64[ns]
time    timedelta64[ns]
flag              int32
dtype: object
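The NaN caveat matters because the two solutions treat missing values differently. A short sketch, using a hypothetical input where the second record is NaN instead of a string: Series.str.split propagates the missing value as a row of NaN, while the plain list comprehension calls .split on a float and raises AttributeError.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the second record is missing (NaN) instead of a string
a = [['LSJW26760ES050487,2016-04-29,00:40:1,3'],
     [np.nan]]

# Series.str.split skips missing values: the NaN row becomes a row of NaN
split_df = pd.DataFrame(a, columns=['col']).col.str.split(',', expand=True)
print(split_df)

# A plain list comprehension calls float.split on NaN and fails
try:
    rows = [x[0].split(',') for x in a]
    list_comp_failed = False
except AttributeError as exc:
    # 'float' object has no attribute 'split'
    list_comp_failed = True
    print('list comprehension failed:', exc)
```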