Question

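I am trying to read several CSV files with Python. The header of the first column in the raw data is slightly off. Part of one CSV file looks like this: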

NoDemande;"NoUsager";"Sens";"IdVehiculeUtilise";"NoConducteur";"NoAdresse";"Fait";"HeurePrevue"
42210000003;"42210000529";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000005;"42210001805";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000004;"42210002678";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000003;"42210000529";"—";"265Véh";"42210000032";"42210004900";"1";"25/07/2015 10:50:03"
42210000004;"42210002678";"—";"265Véh";"42210000032";"42210007072";"1";"25/07/2015 11:25:03"
42210000005;"42210001805";"—";"265Véh";"42210000032";"42210004236";"1";"25/07/2015 11:40:03"

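The first column name is not wrapped in "" in the file, yet after reading the file it shows up as "NoDemande", quotes included, while the other column names have no quotes. The remaining columns look fine. The result looks like this (not the same rows):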

"NoDemande"     NoUsager Sens IdVehiculeUtilise NoConducteur    NoAdresse Fait          HeurePrevue
42209000003  42209001975    +            245Véh  42209000002  42209005712    1   24/07/2015 06:30:04
42209000004  42209002021    +            245Véh  42209000002  42209005712    1   24/07/2015 06:30:04
42209000005  42209002208    +            245Véh  42209000002  42209005713    1   24/07/2015 06:45:04
42216000357  42216001501    -            190Véh  42216000139  42216001418    1   31/07/2015 17:15:03
42216000139  42216000788    -         309V7pVéh  42216000059  42216006210    1   31/07/2015 17:15:03
42216000118  42216000188    -            198Véh  42216000051  42216006374    1   31/07/2015 17:15:03

This causes problems in later steps when I try to refer to that column by name. How can I fix it? Here is my code for reading the files:

import pandas as pd
import glob

pd.set_option('expand_frame_repr', False)
path = r'D:\Python27\mypfe\data_test'
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []

for file_ in allFiles:
    #Read file
    df = pd.read_csv(file_,header=0,sep=';',dayfirst=True,encoding='utf8',
                     dtype='str')

    df['Sens'].replace(u'\u2014','-',inplace=True)

    list_.append(df)
    print"fichier lu ",file_

frame = pd.concat(list_)
print frame

Answer 1

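In fact, I was stuck on how to remove the double quotes from the column name. After looking at it from a different angle, I thought it might be easier to add a new column that copies the original one and then drop the original. The new column then has the name you want. In my case, I did: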

# .ix was removed in pandas 1.0; .iloc selects the first column by position
frame['NoDemande'] = frame.iloc[:, 0]
tl = frame.drop(frame.columns[0], axis=1)

That gave me the new frame I wanted.
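
A minimal, self-contained sketch of this approach, assuming a small in-memory frame whose first column name still carries the quotes. The sample values come from the question and the names `frame` and `tl` follow the snippet above; the frame itself is a hypothetical stand-in for the concatenated data.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame from the question:
# the first column name still contains literal double quotes.
frame = pd.DataFrame({
    '"NoDemande"': ['42210000003', '42210000005'],
    'NoUsager': ['42210000529', '42210001805'],
})

# Copy the first column (selected by position) under the clean name ...
frame['NoDemande'] = frame.iloc[:, 0]
# ... then drop the original, badly named column.
tl = frame.drop(frame.columns[0], axis=1)

print(tl.columns.tolist())
```

Note that with this approach the clean column ends up as the last column rather than the first.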

Answer 2

I think the simplest way is to set new column names:

df.columns = ['NoDemande1'] + df.columns[1:].tolist()
print (df)
    NoDemande1     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -           265Véh   42210000032  42210004900   
4  42210000004  42210002678    -           265Véh   42210000032  42210007072   
5  42210000005  42210001805    -           265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03
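
If only the first name needs to change, `DataFrame.rename` is a possible alternative to rebuilding the whole list of names. This is a different technique from the assignment above, sketched here on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame whose first column name contains literal double quotes.
df = pd.DataFrame({
    '"NoDemande"': ['42210000003'],
    'NoUsager': ['42210000529'],
})

# Map the current first column name (looked up by position) to the clean one.
df = df.rename(columns={df.columns[0]: 'NoDemande'})

print(df.columns.tolist())
```

The column order is unchanged, and the other names are left alone.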

Another solution is to strip the " characters from the column names:

print (df)
   "NoDemande"     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -           265Véh   42210000032  42210004900   
4  42210000004  42210002678    -           265Véh   42210000032  42210007072   
5  42210000005  42210001805    -           265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03

df.columns = df.columns.str.strip('"')
print (df)
     NoDemande     NoUsager Sens IdVehiculeUtilise  NoConducteur    NoAdresse  \
0  42210000003  42210000529    +            265Véh   42210000032  42210002932   
1  42210000005  42210001805    +            265Véh   42210000032  42210002932   
2  42210000004  42210002678    +            265Véh   42210000032  42210002932   
3  42210000003  42210000529    -            265Véh   42210000032  42210004900   
4  42210000004  42210002678    -            265Véh   42210000032  42210007072   
5  42210000005  42210001805    -            265Véh   42210000032  42210004236   

   Fait          HeurePrevue  
0     1  25/07/2015;10:00:04  
1     1  25/07/2015;10:00:04  
2     1  25/07/2015;10:00:04  
3     1  25/07/2015;10:50:03  
4     1  25/07/2015;11:25:03  
5     1  25/07/2015;11:40:03
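
The same clean-up could be applied to each file as it is read, before concatenating. The sketch below is only an illustration under stated assumptions: the two in-memory strings stand in for the CSV files on disk, and the tripled quotes in the header are a device to make the first column name parse with literal quotes, which may not be how the real files produce them.

```python
import io
import pandas as pd

# Two in-memory "files" standing in for the CSVs on disk (an assumption made
# so the sketch runs anywhere). The tripled quotes make the first header
# parse as the literal text "NoDemande", quotes included.
raw_files = [
    '"""NoDemande""";"NoUsager";"Sens"\n42210000003;"42210000529";"+"\n',
    '"""NoDemande""";"NoUsager";"Sens"\n42210000004;"42210002678";"\u2014"\n',
]

frames = []
for raw in raw_files:
    df = pd.read_csv(io.StringIO(raw), sep=';', dtype=str)
    # Remove leading/trailing double quotes from every column name.
    df.columns = df.columns.str.strip('"')
    # Same em-dash clean-up as in the question, without inplace.
    df['Sens'] = df['Sens'].replace('\u2014', '-')
    frames.append(df)

# ignore_index=True gives the combined frame a fresh 0..n-1 row index.
frame = pd.concat(frames, ignore_index=True)
print(frame.columns.tolist())
```

Cleaning the names inside the loop means every frame has identical column names by the time `pd.concat` aligns them.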
