Quote str's; Unquote floats in pandas

Time: 2019-08-28 16:01:32

Tags: python pandas csv

I am processing the file unclean.csv

Date,Wave,Wavelength
2019-08-28,Theta,0.112358472
2019-08-27,Eta,571.5499015
2019-08-27,Lambda,286.4175921
2019-08-26,Iota,0.220237736

with this code

import os
import csv
import pandas as pd

myfile = ('path/to/'
          'unclean.csv')

os.chdir(os.path.dirname(myfile))
df = pd.read_csv(os.path.basename(myfile))

df['Date'] = pd.to_datetime(df['Date'])
df[['Wave']] = df[['Wave']].astype(str)
df[['Wavelength']] = df[['Wavelength']].astype(float)

df.to_csv('clean.csv',
          float_format='%g',
          index=False,
          quotechar='"',
          quoting=csv.QUOTE_NONNUMERIC)

and I get the output clean.csv

"Date","Wave","Wavelength"
"2019-08-28","Theta","0.112358"
"2019-08-27","Eta","571.55"
"2019-08-27","Lambda","286.418"
"2019-08-26","Iota","0.220238"

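Everything is quoted, even though I explicitly set the type of the Wavelength column to float and passed quoting=csv.QUOTE_NONNUMERIC to to_csv. I only want the non-numeric fields to be quoted.

How can I quote the strings and keep the numbers unquoted?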

Many discussions (e.g. 1, 2, 3, 4) suggest that this should work.

Using pandas==0.24.2 and unicodecsv==0.14.1 from anaconda-project==0.8.2.

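Comments

Valentino's answer points out the problem, but as far as I know I have no choice but to use float_format='%g', to avoid introducing artifacts such as 999999, 0000001 and Nan. Without it I get:

"Date","Wave","Wavelength"
"2019-08-28","Theta",0.11235847199999999
"2019-08-27","Eta",571.5499014999999
"2019-08-27","Lambda",286.41759210000004
"2019-08-26","Iota",0.22023773600000002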

1 answer:

Answer 0 (score: 1)

From the pandas to_csv documentation:

  

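quoting : optional constant from csv module
  Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.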

(emphasis mine)

Simply remove the float_format='%g' argument and the floats will not be quoted.
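To see the behaviour the documentation describes, here is a minimal sketch that writes the same small frame twice with quoting=csv.QUOTE_NONNUMERIC, once with float_format='%g' and once without. The two-row frame and the helper name to_csv_text are made up for illustration, and it assumes a reasonably recent pandas.

```python
import csv
import io

import pandas as pd

# Tiny illustrative frame (values taken from unclean.csv above).
df = pd.DataFrame({
    "Wave": ["Theta", "Eta"],
    "Wavelength": [0.112358472, 571.5499015],
})


def to_csv_text(frame, **kwargs):
    """Hypothetical helper: return the CSV text instead of writing a file."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, quoting=csv.QUOTE_NONNUMERIC, **kwargs)
    return buf.getvalue()


# With float_format the floats become strings first, so they get quoted.
with_format = to_csv_text(df, float_format="%g")

# Without float_format the floats stay numeric and are left unquoted.
without_format = to_csv_text(df)

print(with_format)
print(without_format)
```

In the first output the Wavelength values should appear inside quotes, in the second they should not.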

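Edit

As far as I know, if you need the floats formatted, there is no direct way to get what you want using only to_csv parameters.
But you can still "fake" the formatting yourself.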

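# Make a new dataframe of pre-formatted strings: floats via '{:g}',
# everything else wrapped in literal double quotes.
# (In pandas >= 2.1, applymap is deprecated in favour of DataFrame.map.)
ddf = df.applymap(lambda x: '{:g}'.format(x) if isinstance(x, float) else '"{}"'.format(x))

# Write the new dataframe to csv, now using QUOTE_NONE because
# the quote characters were already added where needed.
ddf.to_csv('clean.csv',
           index=False,
           quoting=csv.QUOTE_NONE)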

The clean.csv file looks like this:

Date,Wave,Wavelength
"2019-08-28 00:00:00","Theta",0.112358
"2019-08-27 00:00:00","Eta",571.55
"2019-08-27 00:00:00","Lambda",286.418
"2019-08-26 00:00:00","Iota",0.220238
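If the goal is only to trim the floats to six significant digits, another option (not part of the answer above, just a sketch under assumptions) is to round-trip each value through '%g' and back to float before writing. The column then stays numeric, so csv.QUOTE_NONNUMERIC can leave it unquoted and the header is quoted as well. This assumes a pandas version that writes floats using their shortest repr; the comment above suggests the asker's environment may print more digits, so check it on your own setup. The small frame below is illustrative and keeps Date as plain strings.

```python
import csv
import io

import pandas as pd

# Illustrative frame with two rows from unclean.csv.
df = pd.DataFrame({
    "Date": ["2019-08-28", "2019-08-27"],
    "Wave": ["Theta", "Eta"],
    "Wavelength": [0.112358472, 571.5499015],
})

# Format with '%g' (6 significant digits), then convert back to float,
# so the column keeps a numeric dtype.
df["Wavelength"] = df["Wavelength"].map(lambda x: float("%g" % x))

# No float_format here: the floats reach the csv writer as numbers.
buf = io.StringIO()
df.to_csv(buf, index=False, quoting=csv.QUOTE_NONNUMERIC)
csv_text = buf.getvalue()

print(csv_text)
```

Unlike the applymap approach, this does not need QUOTE_NONE or hand-added quote characters, at the cost of changing the stored values rather than only their printed form.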