I have some option chain data:
Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"
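The entry "+1,099.55" looks like a bad value in the data, since no other record is formatted that way, so I need to clean it up before inserting the data into a SQL database. I have tried several different approaches, but none of them worked. Any insight would be appreciated: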
optionsChainPuts['Change'] = optionsChainPuts['Change'].map(lambda x: x.lstrip('\"+').rstrip('\"'))
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).str.replace('\D', '')
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).map(lambda x: x.replace('"', ''))
Thanks
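Answer 0 (score: 0)
The comma is what causes the problem. One option is to split the string at the comma and join the pieces back together (this works as written when the value contains exactly one comma):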
>>> val = "+1,099.55"
>>> val = val.split(",")
>>> num = float(val[0] + val[1])
>>> num
1099.55
Hope that helps!
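The split-and-join idea above can be generalized so that it also handles values with no comma or with several commas. The following is a minimal sketch, not part of the original answer; the helper name `parse_change` is hypothetical, and it assumes the comma is always a thousands separator.

```python
def parse_change(text: str) -> float:
    """Convert a string such as '+1,099.55' to a float.

    Assumes ',' is a thousands separator and '.' is the decimal point.
    """
    # Drop surrounding whitespace and stray quotes, then remove every comma.
    cleaned = text.strip().strip('"').replace(",", "")
    # float() accepts a leading '+' or '-' sign.
    return float(cleaned)


print(parse_change("+1,099.55"))      # 1099.55
print(parse_change("+900.65"))        # 900.65
print(parse_change("-1,234,567.89"))  # -1234567.89
```

Removing the commas with `replace` avoids indexing into the split result, so the function does not depend on how many commas the value contains.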
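Answer 1 (score: 0)
The problem is a number that contains a comma and is wrapped in quotes.
Use the locale module to parse the value with US English conventions, where the comma is a thousands separator.
Code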
from io import StringIO
import pandas as pd
import locale
s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''
df = pd.read_csv(StringIO(s))
# Set the locale to US English (the 'en_US.UTF-8' locale must be available on the system)
locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' )
# Convert column to float
df['Change'] = df['Change'].apply(lambda x: locale.atof(x))
print(df['Change'])
Output
0     900.65
1    1099.55
Name: Change, dtype: float64
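If changing the process-wide locale is undesirable, the same column can be cleaned with pandas string methods alone. This is a sketch under the assumption that every comma in the Change column is a thousands separator; it is an alternative to the locale approach above, not the method from the original answer.

```python
from io import StringIO

import pandas as pd

s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

df = pd.read_csv(StringIO(s))

# read_csv has already stripped the surrounding quotes, so only the comma
# has to go. regex=False treats ',' as a literal character.
df['Change'] = (
    df['Change']
    .astype(str)
    .str.replace(',', '', regex=False)
    .astype(float)  # a leading '+' is accepted when converting to float
)

print(df['Change'].tolist())  # [900.65, 1099.55]
```

`read_csv` also has a `thousands` parameter (for example `thousands=','`), which may make the separate cleanup step unnecessary when the file is read directly.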