Is it possible to create a pandas.DataFrame that contains a column of list type?
For example, I would like to load the following CSV into a pandas.DataFrame:
id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"
Answer 0 (score: 4)
Strip the double quotes:
id,scores
1, [1,2,3,4]
2, [1,2]
3, [0,2,4]
You should then be able to do this:
import pandas
# each row pairs an id with its list of scores
query = [[1, [1,2,3,4]], [2, [1,2]], [3, [0,2,4]]]
df = pandas.DataFrame(query, columns=['id', 'scores'])
print(df)
Answer 1 (score: 1)
You can use:
import pandas as pd
import io
temp=u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''
df = pd.read_csv(io.StringIO(temp), sep=',', index_col=[0] )
print(df)
       scores
id
1   [1,2,3,4]
2       [1,2]
3     [0,2,4]
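However, the dtype of the scores column is object, and each value is a string rather than a list.
One approach is to use ast.literal_eval together with the converters parameter of read_csv: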
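A quick way to confirm this, sketched here with the same sample data, is to inspect a single cell. The variable names first, is_string and n_chars are only illustrative:

```python
import io
import pandas as pd

temp = u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''

df = pd.read_csv(io.StringIO(temp), index_col=[0])

# Look up the scores cell for id 1
first = df.loc[1, 'scores']

# The cell is plain text, so len() counts characters, not list items
is_string = isinstance(first, str)
n_chars = len(first)
print(is_string, n_chars)
```

Because the cell is text, a test such as 2 in first would not check list membership, which is why a conversion step is needed.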
import pandas as pd
import io
from ast import literal_eval
temp=u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''
def converter(x):
    # parse a string such as "[1,2,3,4]" into a real Python list
    return literal_eval(x)
# map each column name to the function that converts its values
converters={'scores': converter}
df = pd.read_csv(io.StringIO(temp), sep=',', converters=converters)
print(df)
   id        scores
0   1  [1, 2, 3, 4]
1   2        [1, 2]
2   3     [0, 2, 4]
# check that the values behave as lists:
print(2 in df.scores[2])
# True
print(1 in df.scores[2])
# False
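If the DataFrame has already been loaded with string values, the same conversion can also be applied afterwards. This is a minimal sketch of that variant, not part of the answer above, and it assumes every cell in scores is a valid Python literal:

```python
import io
from ast import literal_eval

import pandas as pd

temp = u'''id,scores
1,"[1,2,3,4]"
2,"[1,2]"
3,"[0,2,4]"'''

# Load first; the scores column holds strings at this point
df = pd.read_csv(io.StringIO(temp))

# Parse each string into a list after loading
df['scores'] = df['scores'].apply(literal_eval)

# List operations now work per row
lengths = df['scores'].apply(len).tolist()
print(lengths)
```

literal_eval only evaluates Python literals, so it is a safer choice than eval for data read from a file, but it raises an error on malformed cells.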