Removing double quotes from a float value in a DataFrame

Posted: 2020-05-31 19:18:31

Tags: python pandas

I have some option chain data:

Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"

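The entry "+1,099.55" looks like a bad value in the data, since no other record is formatted like it, so I need to clean it up before inserting the data into a SQL database. I have tried several approaches, none of which worked. Any insight would be appreciated: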

optionsChainPuts['Change'] = optionsChainPuts['Change'].map(lambda x: x.lstrip('\"+').rstrip('\"'))
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).str.replace('\D', '')
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).map(lambda x: x.replace('"', ''))
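A quick check of what the cell actually contains (a sketch, assuming the data is loaded with pd.read_csv as in the answers below; the names s and df are illustrative). The CSV reader treats the double quotes as field quoting and removes them, so the cell already holds the plain string '+1,099.55'. That is why replacing '"' changes nothing, and replacing '\D' also throws away the decimal point. The comma is what blocks the float conversion, so removing it before astype(float) should be enough:

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question
s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

df = pd.read_csv(StringIO(s))

# The CSV reader has already stripped the quoting: no '"' is left in the cell
print(repr(df.loc[1, 'Change']))  # '+1,099.55'

# Remove the thousands separator literally (regex=False), then convert;
# float() accepts a leading '+' sign
df['Change'] = df['Change'].str.replace(',', '', regex=False).astype(float)
print(df['Change'].tolist())  # [900.65, 1099.55]
```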

Thanks

2 answers:

Answer 0 (score: 0)

The comma is what causes the problem. One option is to split the string on the comma and join the pieces back together:

>>> val = "+1,099.55"
>>> val = val.split(",")
>>> num = float(val[0] + val[1])
>>> num
1099.55
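The split-and-join approach assumes exactly one comma: "+900.65".split(",") has no second element (so val[1] raises IndexError), and "+1,234,567.89" would silently become 1234.0. A sketch that removes every comma instead; parse_signed_number is a hypothetical helper name, not part of any library:

```python
def parse_signed_number(text: str) -> float:
    """Hypothetical helper: parse strings such as '+1,099.55' or '-12,345,678.9'."""
    # Tolerate surrounding whitespace and double quotes (e.g. raw CSV text),
    # then drop every thousands separator, not just the first one
    cleaned = text.strip().strip('"').replace(",", "")
    return float(cleaned)  # float() accepts an optional leading '+' or '-'


print(parse_signed_number("+1,099.55"))      # 1099.55
print(parse_signed_number("-12,345,678.9"))  # -12345678.9
print(parse_signed_number('"+900.65"'))      # 900.65
```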

Hope that helps!

Answer 1 (score: 0)

The problem is the number written with a comma and quotes.

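Use the locale module to parse the value with US English number conventions, in which the comma is a thousands separator.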

Code

from io import StringIO
import pandas as pd
import locale

s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

df = pd.read_csv(StringIO(s))

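# Set the locale to US English so that ',' is treated as the thousands separator
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

# Convert the column to float; locale.atof removes the locale's thousands separator before converting
df['Change'] = df['Change'].apply(locale.atof)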

print(df['Change'])

Output

0     900.65
1    1099.55
Name: Change, dtype: float64
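locale.setlocale changes process-wide state, and the 'en_US.UTF-8' locale may not be installed on every system. If either is a concern, the thousands parameter of pd.read_csv should let the parser skip the commas inside numeric fields, so the column comes back as float64 without any post-processing. A sketch, reusing the question's sample rows:

```python
from io import StringIO

import pandas as pd

s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

# thousands=',' tells the parser to ignore ',' inside numeric fields,
# so the quoted "+1,099.55" is parsed as a number; no locale change is needed
df = pd.read_csv(StringIO(s), thousands=',')

print(df['Change'].dtype)  # float64
print(df['Change'])
```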