How do I correctly read the following table with pd.read_csv in Python?

Asked: 2018-03-05 17:57:37

Tags: python dataframe mismatch

I have a file that looks like this:

  1  [ 1s 1/2-1/2]+    0.83   -66.379    -1.0000000     
  2  [ 1s 1/2 1/2]+    0.83   -66.379    -1.0000000
  3  [ 1s 1/2-1/2]+    0.82   -61.930     1.0000000
  4  [ 1s 1/2 1/2]+    0.82   -61.930     1.0000000
  5  [ 1p 3/2-1/2]-    0.73   -40.210    -1.0000000
  6  [ 1p 3/2 1/2]-    0.77   -40.210    -1.0000000
  7  [ 1p 3/2-3/2]-    0.76   -40.210    -1.0000000
  8  [ 1p 3/2 3/2]-    0.64   -40.210    -1.0000000

I read it in the following way:

spe=pd.read_csv("spe.dat",delimiter='s\+',skiprows=[0,1])
spe.columns=['index','label','weight','ee','tz']

I get this error message:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 5 elements

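I realize that the second column, for example '[ 1s 1/2-1/2]+', is being read as three columns. Is there a way to read the whole '[ 1s 1/2-1/2]+' as a single column? Thanks.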

1 answer:

Answer 0 (score: 0)

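You are not separating the columns correctly when reading the DataFrame. The label contains single spaces, so the separator has to match only runs of two or more whitespace characters. I suggest reading the Python regex tutorial to learn how to use a regular expression as the separator.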
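To see why the choice of separator matters, here is a minimal sketch that applies Python's `re.split` to one line taken from the file. It only illustrates the splitting itself and does not call pandas; the variable names `fields_any` and `fields_wide` are made up for this example.

```python
import re

# One data line copied from the file in the question.
line = "  1  [ 1s 1/2-1/2]+    0.83   -66.379    -1.0000000"

# Splitting on any run of whitespace breaks the bracketed label into pieces.
fields_any = re.split(r"\s+", line.strip())

# Splitting only on runs of two or more whitespace characters keeps it whole.
fields_wide = re.split(r"\s{2,}", line.strip())

print(len(fields_any), fields_any)
print(len(fields_wide), fields_wide)
```

The first split yields seven fields because the label falls apart into '[', '1s' and '1/2-1/2]+'; the second yields the five fields the column names expect.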

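import pandas as pd
columns = ['index', 'label', 'weight', 'ee', 'tz']
# Raw-string regex separator; engine='python' is the parser that supports regex separators.
pd.read_csv('spe.dat', sep=r'\s{2,}', names=columns, index_col=0, skiprows=[0, 1], engine='python')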

This returns:

                label  weight      ee   tz
index                                     
1      [ 1s 1/2-1/2]+    0.83 -66.379 -1.0
2      [ 1s 1/2 1/2]+    0.83 -66.379 -1.0
3      [ 1s 1/2-1/2]+    0.82 -61.930  1.0
4      [ 1s 1/2 1/2]+    0.82 -61.930  1.0
5      [ 1p 3/2-1/2]-    0.73 -40.210 -1.0
6      [ 1p 3/2 1/2]-    0.77 -40.210 -1.0
7      [ 1p 3/2-3/2]-    0.76 -40.210 -1.0
8      [ 1p 3/2 3/2]-    0.64 -40.210 -1.0
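If you want to try this without the original file, the following is a self-contained sketch that feeds a few of the rows through `io.StringIO`. The two placeholder header lines are an assumption: `skiprows=[0, 1]` suggests that the real `spe.dat` starts with two lines that are not shown in the question, so stand-ins are used here.

```python
import io

import pandas as pd

# Two placeholder header lines stand in for whatever the real spe.dat
# starts with (an assumption; the actual header is not shown).
sample = """\
placeholder header line 1
placeholder header line 2
  1  [ 1s 1/2-1/2]+    0.83   -66.379    -1.0000000
  2  [ 1s 1/2 1/2]+    0.83   -66.379    -1.0000000
  5  [ 1p 3/2-1/2]-    0.73   -40.210    -1.0000000
"""

columns = ["index", "label", "weight", "ee", "tz"]
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{2,}",    # split only on runs of two or more whitespace characters
    names=columns,
    index_col=0,
    skiprows=[0, 1],  # skip the two placeholder header lines
    engine="python",  # regex separators are handled by the Python parser
)

print(df)
```

Note that this approach relies on the columns being separated by at least two spaces while the label itself contains only single spaces. If a file does not follow that layout, a different separator or a fixed-width reader may be a better fit.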