My data looks like this (note the PM after SPX):
11 Dec 1650.00 (SPXPM1130L1650-E),1.90,0.0,1.35,2.30,0,10,11 Dec 1650.00 (SPXPM1130X1650-E),0.0,0.0,376.20,380.00,0,0,
or like this (note there is no -E, W, or PM):
14 Oct 800.00 (SPX1418J800),0.0,0.0,1067.10,1071.40,0,0,14 Oct 800.00 (SPX1418V800),0.09,0.0,0.0,0.05,0,25,
or like this (note the extra W after SPX):
11 Jan 1075.00 (SPXW1128A1075-E),0.0,0.0,215.30,217.00,0,0,11 Jan 1075.00 (SPXW1128M1075-E),0.05,-0.10,0.05,0.10,10,15535,
I am using the following regex in Python to try to get the entire first comma-separated field of the data (i.e. "14 Oct 800.00 (SPX1418J800)"):
spx_symbol = re.compile("\\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\\)")
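This works for the data format above that has the W, but not when the extra "W" is absent, and some rows have no -E or PM either, so getting the tokens out of the split fails. See the function below.
When I feed it the first line above, I get: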
spx_symbol.split(line)
This is how the tokens are used after the split; the original regex only works some of the time:
import datetime
import re

def ExpiryMonth(s):
    """
    SPX contract months
    """
    call_months = "ABCDEFGHIJKL"
    put_months = "MNOPQRSTUVWX"
    try:
        m = call_months.index(s)
    except ValueError:
        m = put_months.index(s)
    # index() is 0-based, but datetime.date expects months 1-12
    return m + 1

#spx_symbol = re.compile("\\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\\)") WORKS SOME OF TIME
spx_symbol = re.compile(r"\((SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})(-E)?\))")

def parseSPX(s):
    """
    Parse an SPX quote string, return expiry date and strike
    """
    tokens = spx_symbol.split(s)
    if len(tokens) == 1:
        return {'dtExpiry': None, 'strike': -1}
    year = 2000 + int(tokens[1])
    day = int(tokens[2])
    month = ExpiryMonth(tokens[3])
    strike = float(tokens[4])
    dtExpiry = datetime.date(year, month, day)
    return {'dtExpiry': dtExpiry, 'strike': strike}
Answer 0 (score: 1)
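From the question: "I have a regex in python that is trying to get the entire first comma-separated field of data. In other words, e.g. "14 Oct 800.00 (SPX1418J800)""
Just use split: split on the comma and take the first element. You don't need a regex for that: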
s="14 Oct 800.00 (SPX1418J800),0.0,0.0,1067.10,1071.40,0,0,14 Oct 800.00 (SPX1418V800),0.09,0.0,0.0,0.05,0,25"
print(s.split(",",1)[0])
14 Oct 800.00 (SPX1418J800)
s1 = "11 Jan 1075.00 (SPXW1128A1075-E),0.0,0.0,215.30,217.00,0,0,11 Jan 1075.00 (SPXW1128M1075-E),0.05,-0.10,0.05,0.10,10,15535,"
print(s1.split(",",1)[0])
11 Jan 1075.00 (SPXW1128A1075-E)
If you only want the part inside the parens, as in the output shown in the question, you can split again:
s = "14 Oct 800.00 (SPX1418J800),0.0,0.0,1067.10,1071.40,0,0,14 Oct 800.00 (SPX1418V800),0.09,0.0,0.0,0.05,0,25"
print(s.split(",",1)[0].rsplit(" ",1)[-1])
(SPX1418J800)
Or just use the csv module:
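import csv
# The file name must be a string; newline="" is what the csv docs recommend for files passed to csv.reader
with open("my.csv", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    for line in reader:
        print(line[0])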
14 Oct 800.00 (SPX1418J800)
11 Jan 1075.00 (SPXW1128A1075-E)
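To try the csv approach without a file on disk, the rows can be read from an in-memory buffer. This is only a minimal sketch: two sample rows from the question stand in for the file, and the names SAMPLE, quote_fields, and put_column are illustrative. It also assumes, based on the sample rows alone, that the second quote of each row always sits in column 7.

```python
import csv
import io

# Two sample rows from the question stand in for the file contents.
SAMPLE = (
    "14 Oct 800.00 (SPX1418J800),0.0,0.0,1067.10,1071.40,0,0,"
    "14 Oct 800.00 (SPX1418V800),0.09,0.0,0.0,0.05,0,25,\n"
    "11 Jan 1075.00 (SPXW1128A1075-E),0.0,0.0,215.30,217.00,0,0,"
    "11 Jan 1075.00 (SPXW1128M1075-E),0.05,-0.10,0.05,0.10,10,15535,\n"
)

def quote_fields(text, put_column=7):
    """Return (first_field, second_quote_field) for each row of CSV text.

    put_column=7 is an assumption taken from the sample rows.
    """
    reader = csv.reader(io.StringIO(text))
    # Skip rows that are too short to contain the second quote.
    return [(row[0], row[put_column]) for row in reader if len(row) > put_column]

for first_field, second_field in quote_fields(SAMPLE):
    print(first_field, "|", second_field)
```

Each row yields both quote descriptions, so no regex is needed just to locate the fields.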
Answer 1 (score: 0)
This is the regex I used:
"\((SPXW?(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})(-E)?\))"
You can see it working here. I am printing the entire matched part:
>>> import re
>>> first_fields = [
...     "14 Oct 800.00 (SPX1418J800)",
...     "11 Jan 1075.00 (SPXW1128A1075-E)"
... ]
>>> spx_symbols = re.compile(r"\((SPXW?(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})(-E)?\))")
>>> for f in first_fields:
...     print(spx_symbols.search(f).group(0))
...
(SPX1418J800)
(SPXW1128A1075-E)
The changes I made:
W? - This looks for an optional "W"
(-E)? - This looks for an optional "-E"
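Note that this pattern still does not match the first sample row in the question, where PM follows SPX. As a minimal sketch, assuming W and PM are the only prefixes that occur (they are the only ones in the question's samples), the pattern can be extended and used with search and named groups in place of split. The names SPX_SYMBOL and parse_spx and the extra "type" key are illustrative, not from the question:

```python
import datetime
import re

# Assumption: "W" and "PM" are the only optional prefixes after "SPX".
SPX_SYMBOL = re.compile(
    r"\(SPX(?:W|PM)?"
    r"(?P<year>1[0-9])(?P<day>[0-9]{2})"
    r"(?P<month>[A-Z])(?P<strike>[0-9]{3,4})"
    r"(?:-E)?\)"
)

CALL_MONTHS = "ABCDEFGHIJKL"  # A = January call ... L = December call
PUT_MONTHS = "MNOPQRSTUVWX"   # M = January put ... X = December put

def parse_spx(field):
    """Return expiry date, strike, and option type for a quote field, or None."""
    match = SPX_SYMBOL.search(field)
    if match is None:
        return None
    code = match.group("month")
    if code in CALL_MONTHS:
        month, kind = CALL_MONTHS.index(code) + 1, "call"
    elif code in PUT_MONTHS:
        month, kind = PUT_MONTHS.index(code) + 1, "put"
    else:
        return None  # Y and Z are not month codes
    expiry = datetime.date(
        2000 + int(match.group("year")), month, int(match.group("day"))
    )
    return {"dtExpiry": expiry, "strike": float(match.group("strike")), "type": kind}

for field in (
    "11 Dec 1650.00 (SPXPM1130L1650-E)",
    "14 Oct 800.00 (SPX1418J800)",
    "11 Jan 1075.00 (SPXW1128M1075-E)",
):
    print(parse_spx(field))
```

Using search with named groups avoids depending on token positions: re.split inserts every capturing group into its result, so adding a group around the whole symbol shifts the year, day, month, and strike by one index.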