Problem importing a text file into Python / Pandas

Time: 2014-09-17 00:03:58

Tags: python pandas

I am trying to load a very messy text file into Python / Pandas. Here is a sample of the data in the file:

('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc','Seattle, WA','2014-08-05 10:06:24','viewed_home_page'),('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc','Seattle, WA','2014-08-05 10:06:36','viewed_search_results'),('41aa8fac-1bd8-4f95-918c-413879ed43f1','bcca257d-68d3-47e6-bc58-52c166f3b27b','Madison, WI','2014-08-16 17:42:31','visit_start')

Here is my code:

import pandas as pd
cols=['ID','Visit','Market','Event Time','Event Name']
table=pd.read_table('C:\Users\Desktop\Dump.txt',sep=',', header=None,names=cols,nrows=10)

But when I look at the resulting table, the data still has not been read correctly.

Almost all of the data is on a single line.

1 answer:

Answer 0 (score: 2)

You can use ast.literal_eval to parse the data into a Python tuple of tuples, and then call pd.DataFrame on the result:

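import ast

import pandas as pd

cols = ['ID', 'Visit', 'Market', 'Event Time', 'Event Name']
filename = r'C:\Users\Desktop\Dump.txt'  # path from the question; the raw string keeps backslashes literal

# Open in text mode: on Python 3, ast.literal_eval expects a str, not bytes
with open(filename, 'r') as f:
    # The file contents form a valid Python literal: a tuple of tuples
    data = ast.literal_eval(f.read())

df = pd.DataFrame(list(data), columns=cols)
print(df)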

which yields

                                     ID                                 Visit  \
0  9ebabd77-45f5-409c-b4dd-6db7951521fd  9da3f80c-6bcd-44ae-bbe8-760177fd4dbc   
1  9ebabd77-45f5-409c-b4dd-6db7951521fd  9da3f80c-6bcd-44ae-bbe8-760177fd4dbc   
2  41aa8fac-1bd8-4f95-918c-413879ed43f1  bcca257d-68d3-47e6-bc58-52c166f3b27b   

        Market           Event Time             Event Name  
0  Seattle, WA  2014-08-05 10:06:24       viewed_home_page  
1  Seattle, WA  2014-08-05 10:06:36  viewed_search_results  
2  Madison, WI  2014-08-16 17:42:31            visit_start
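
For reference, the comma-separated read in the question cannot work on this data: every record sits on one line, and values such as 'Seattle, WA' contain commas themselves. The sketch below shows the ast.literal_eval approach on the sample from the question, without needing a file. The helper name parse_dump and the variable raw are hypothetical, introduced only for illustration. It also assumes you may want to wrap the text in square brackets, so that a dump containing a single record still parses to a list of rows rather than one flat tuple of strings.

```python
import ast

import pandas as pd

cols = ['ID', 'Visit', 'Market', 'Event Time', 'Event Name']

# Sample data copied from the question, held in memory instead of a file.
raw = (
    "('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc',"
    "'Seattle, WA','2014-08-05 10:06:24','viewed_home_page'),"
    "('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc',"
    "'Seattle, WA','2014-08-05 10:06:36','viewed_search_results'),"
    "('41aa8fac-1bd8-4f95-918c-413879ed43f1','bcca257d-68d3-47e6-bc58-52c166f3b27b',"
    "'Madison, WI','2014-08-16 17:42:31','visit_start')"
)


def parse_dump(text):
    """Hypothetical helper: parse the dump text into a DataFrame."""
    # Wrapping in [...] yields a list of tuples, even when the text
    # holds only one record (assumption: each record is a 5-item tuple).
    rows = ast.literal_eval('[' + text.strip() + ']')
    return pd.DataFrame(rows, columns=cols)


df = parse_dump(raw)
print(df.shape)
print(df['Market'].tolist())
```

Because ast.literal_eval only evaluates Python literals, it is safer than eval for text you did not write yourself, but it will raise an error if the file contains anything that is not a valid literal.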