Pandas read_table error

Time: 2016-05-25 19:18:36

Tags: python mysql pandas

I am trying to read a tab-delimited text file into a DataFrame.

This is how the file looks in Excel:

CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1

Importing into df:

df = p.read_table("E:/FileLoc/ThisIsAFile.txt", encoding = "iso-8859-1")

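Now pandas does not treat the first three columns as part of the column index (df[0] = TRANSACTION_TYPE), and all the headers are shifted to reflect this.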

                                CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER
5/13/2016 0:00 13867666 6892372       S             2026          CUSTOMER 1

I am trying to manipulate the text file and then import it into a MySQL database as the end result.

1 Answer:

Answer 0: (score: 5)

You can use read_csv with a separator that matches two or more whitespace characters:

import pandas as pd
import io

temp=u"""CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=r'\s{2,}', engine='python', encoding="iso-8859-1")
print(df)
    CALENDAR_DATE  ORDER_NUMBER  INVOICE_NUMBER TRANSACTION_TYPE  \
0  5/13/2016 0:00      13867666         6892372                S   

   CUSTOMER_NUMBER CUSTOMER_NAME  
0             2026    CUSTOMER 1  

If the separator is a tab, use sep='\t'
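As a minimal sketch of the tab-separated case, the snippet below reads an in-memory string instead of a file. The sample text is an assumption modeled on the row shown in the question, with real tab characters between the fields; it is not the asker's actual file.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample modeled on the row in the question.
tab_text = (
    "CALENDAR_DATE\tORDER_NUMBER\tINVOICE_NUMBER\t"
    "TRANSACTION_TYPE\tCUSTOMER_NUMBER\tCUSTOMER_NAME\n"
    "5/13/2016 0:00\t13867666\t6892372\tS\t2026\tCUSTOMER 1\n"
)

# sep="\t" splits on tabs only, so values that contain spaces
# ("5/13/2016 0:00", "CUSTOMER 1") stay in a single column.
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(df_tab.columns.tolist())
```

With a real file, io.StringIO(tab_text) would be replaced by the filename, as in the example above.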

EDIT:

I tested it with your data and it works:

import pandas as pd

df = pd.read_csv('test/AnonymizedData.txt', sep='\t')
print(df)

   CUSTOMER_NUMBER CUSTOMER_NAME  CUSTOMER_BRANCH_CODE CUSTOMER_BRANCH_NAME  \
0             2026    CUSTOMER 1                    83       SALES BRANCH 1   
1             2359    CUSTOMER 2                    76       SALES BRANCH 2   
2           100662    CUSTOMER 3                    28       SALES BRANCH 3   
3             3245    CUSTOMER 4                    84       SALES BRANCH 4   
4             3179    CUSTOMER 5                    28       SALES BRANCH 5   
5            39881    CUSTOMER 6                    67       SALES BRANCH 6   
6            37020    CUSTOMER 7                    58       SALES BRANCH 7   
7             1239    CUSTOMER 8                    50       SALES BRANCH 8   
8             2379    CUSTOMER 9                    76       SALES BRANCH 9   

  CUSTOMER_CITY CUSTOMER_STATE     ...      PRICING_PRODUCT_TYPE_CODE  \
0        TOWN 1             CO     ...                             11   
1        TOWN 2             OH     ...                             11   
2        TOWN 3             ME     ...                             11   
3        TOWN 4             IL     ...                             11   
4        TOWN 5             NH     ...                             11   
5        TOWN 6             TX     ...                             11   
6        TOWN 7             NC     ...                             11   
7        TOWN 8             NY     ...                             11   
8        TOWN 9             OH     ...                             11   

  PRICING_PRODUCT_TYPE  ORGANIZATION_ID ORGANIZATION_NAME  PRODUCT_LINE_CODE  \
0          DISPOSABLES               83  ORGANIZATIONNAME                891   
1          DISPOSABLES               83  ORGANIZATIONNAME                891   
2          DISPOSABLES               83  ORGANIZATIONNAME                891   
3          DISPOSABLES               83  ORGANIZATIONNAME                891   
4          DISPOSABLES               83  ORGANIZATIONNAME                891   
5          DISPOSABLES               83  ORGANIZATIONNAME                891   
6          DISPOSABLES               83  ORGANIZATIONNAME                891   
7          DISPOSABLES               83  ORGANIZATIONNAME                891   
8          DISPOSABLES               83  ORGANIZATIONNAME                891   

  PRODUCT_LINE  ROBOTIC_FLAG  Unnamed: 52  Unnamed: 53  Unnamed: 54  
0  PRODUCTNAME             N            N          NaN            3  
1  PRODUCTNAME             N            N          NaN            3  
2  PRODUCTNAME             N            N          NaN            2  
3  PRODUCTNAME             N            N          NaN            7  
4  PRODUCTNAME             N            N          NaN            1  
5  PRODUCTNAME             N            N          NaN            4  
6  PRODUCTNAME             N            N          NaN            3  
7  PRODUCTNAME             N            N          NaN            5  
8  PRODUCTNAME             N            N          NaN            3  

[9 rows x 55 columns]
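One possible explanation for the shifted headers in the question is that the data rows contain more fields than the header row, in which case pandas uses the leading fields as the index. That this applies to the asker's file is an assumption; the sketch below only reproduces the effect on a small hypothetical sample with a trailing tab on each data row, and shows the index_col=False option of read_csv as one way around it.

```python
import io
import pandas as pd

# Hypothetical sample: three header names, but each data row ends with an
# extra tab, so every data row has four fields.
raw = "A\tB\tC\n1\t2\t3\t\n4\t5\t6\t\n"

# Default behaviour: the first field of each row becomes the index,
# so the remaining values sit one column to the left of their headers.
shifted = pd.read_csv(io.StringIO(raw), sep="\t")

# index_col=False tells pandas not to take the index from the data.
fixed = pd.read_csv(io.StringIO(raw), sep="\t", index_col=False)

print(shifted)
print(fixed)
```

If the extra fields hold real values rather than empty trailing delimiters, index_col=False may drop them, so it is worth comparing the number of fields in the header with the number in a data row before relying on it.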