How do I convert a csv string to a list in pandas?

Time: 2016-07-03 14:46:34

Tags: python python-3.x csv numpy pandas

I am working with a csv file that has the following format:

"Id","Sequence"
3,"1,3,13,87,1053,28576,2141733,508147108,402135275365,1073376057490373,9700385489355970183,298434346895322960005291,31479360095907908092817694945,11474377948948020660089085281068730"
7,"1,2,1,5,5,1,11,16,7,1,23,44,30,9,1,47,112,104,48,11,1,95,272,320,200,70,13,1,191,640,912,720,340,96,15,1,383,1472,2464,2352,1400,532,126,17,1,767,3328,6400,7168,5152,2464,784,160,19,1,1535,7424"
8,"1,2,4,5,8,10,16,20,32,40,64,80,128,160,256,320,512,640,1024,1280,2048,2560,4096,5120,8192,10240,16384,20480,32768,40960,65536,81920,131072,163840,262144,327680,524288,655360,1048576,1310720,2097152"
11,"1,8,25,83,274,2275,132224,1060067,3312425,10997342,36304451,301432950,17519415551,140456757358,438889687625,1457125820233,4810267148324,39939263006825,2321287521544174,18610239435360217"

I would like to read this into a data frame where the type of df['Id'] is integer-like and the type of df['Sequence'] is list-like.

I currently have the following kludgy code:

df['Sequence']

This seems to work, but I feel the same thing could be done natively with pandas and numpy.

Does anyone have a recommendation?

3 answers:

Answer 0: (score: 5)

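You can specify a converter for the Sequence column. The read_csv documentation describes the parameter as:

converters : dict, default None

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.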

train = pd.read_csv(training_data_file, converters={'Sequence': clean})
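The call above assumes that a `clean` function is already defined, and this answer does not show one. A minimal sketch follows, assuming `clean` simply splits the field on commas and converts each piece to `int`. The in-memory `sample_csv` (two rows shortened from the question's data) is a hypothetical stand-in for `training_data_file`.

```python
import io

import pandas as pd

# Two rows shortened from the question's data; stands in for the real file.
sample_csv = io.StringIO(
    '"Id","Sequence"\n'
    '3,"1,3,13,87,1053"\n'
    '8,"1,2,4,5,8,10"\n'
)


def clean(seq_string):
    # Assumed behaviour: turn the raw field "1,3,13" into [1, 3, 13].
    return [int(item) for item in seq_string.split(",")]


# The converter is applied to each raw string in the Sequence column
# while the file is being parsed; Id is inferred as an integer column.
train = pd.read_csv(sample_csv, converters={"Sequence": clean})

print(train["Id"].tolist())
print(train["Sequence"].iloc[0])
```

Because the conversion happens during parsing, no separate pass over the column is needed afterwards.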

Answer 1: (score: 0)

This also works, except that Sequence ends up as a list of strings rather than a list of ints:

df = pd.read_csv(training_data_file)
df['Sequence'] = df['Sequence'].str.split(',')

To convert each element to int:

df = pd.read_csv(training_data_file)
df['Sequence'] = df['Sequence'].str.split(',').apply(lambda s: list(map(int, s)))
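One thing worth checking with this approach is whether values too large for a 64-bit integer survive the conversion. Since each list holds ordinary Python ints, they should. Here is a small sketch using the last value of the question's first sequence, with a hypothetical in-memory `sample_csv` standing in for the file:

```python
import io

import pandas as pd

# One shortened row; the last value is far larger than 2**63 - 1.
sample_csv = io.StringIO(
    '"Id","Sequence"\n'
    '3,"1,3,11474377948948020660089085281068730"\n'
)

df = pd.read_csv(sample_csv)
# Split each string on commas, then convert every piece to a Python int.
df["Sequence"] = df["Sequence"].str.split(",").apply(lambda s: list(map(int, s)))

big = df["Sequence"].iloc[0][-1]
print(type(big).__name__, big > 2**63 - 1)
```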

Answer 2: (score: 0)

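Another solution is to use literal_eval from the ast module. literal_eval evaluates the string as a Python literal. For a comma-separated string of integers it returns a tuple of ints (a single number with no comma comes back as a plain int), which can be wrapped in list() if an actual list is required.

from ast import literal_eval
import pandas as pd

def clean(x):
    # Parse the raw field, e.g. "1,3,13", as a Python literal
    return literal_eval(x)

train = pd.read_csv(training_data_file, converters={'Sequence': clean})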
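To see what `literal_eval` produces for this kind of field, here is a quick check on standalone strings, with no file needed. The sample strings are made up for illustration.

```python
from ast import literal_eval

# A comma-separated field parses as a tuple literal.
parsed = literal_eval("1,3,13,87")
print(type(parsed).__name__, parsed)

# A field holding a single number parses as a bare int, not a sequence.
single = literal_eval("42")
print(type(single).__name__, single)

# Convert explicitly when a list is needed.
as_list = list(parsed)
print(as_list)
```

If some rows might hold only one number, a converter built on `literal_eval` would need to handle that case separately.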