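I am trying to use regular expressions (the re module) to clean up strings and turn them into a clean, ordered list of stocks, keeping the original order and without duplicating any stock.
For example, turning this string (note the stray extra letters and the inconsistent spacing):
SLTD +14% NRX +14%L MEIP -68% YU -345% RXII -13%RESN -10% LBIO -10% WHLR-10% HSIDD -339%
into a list like:
stocks = ['SLTD +14%', 'NRX +14%', 'MEIP -68%', 'YU -345%', 'RXII -13%', 'RESN -10%', 'LBIO -10%', 'WHLR -10%', 'HSIDD -339%']
I need regular expressions that cope with:
- no space between adjacent entries (cases like RXII -13%RESN -10% would fail)
- no space between the ticker and the percentage (cases like WHLR-10%)
- a stray letter after the percentage (NRX +14%L)
- a ticker with no percentage (NES -14% HELI)
- sequences such as WHLR-10% HSIDD -339%
This is the best I have come up with so far. It repeats the search for each stock with 4, then 3, then 2 letters, and it is of course horrible code.
Any help is much appreciated. Thank you for your time.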
import re

allStocks = ['SLTD +14% NRX +14%L MEIP -68% YU -345% RXII -13%RESN -10% LBIO -10% WHLR-10% HSIDD -339%','ENZ -17% NSLP -17% SCON -15% PKOH -15% PFIE -14% PRTS -14% NES -14% HELI']
for messyDayOfStocks in allStocks:
    # One pattern per ticker length (5 down to 2 characters), first with 3-digit percentages...
    stocksWithStuff = re.findall(r'(\S\S\S\S\S\s\D\d\d\d%)', messyDayOfStocks)
    stocksWithStuff.append(re.findall(r'(\S\S\S\S\s\D\d\d\d%)', messyDayOfStocks))
    stocksWithStuff.append(re.findall(r'(\S\S\S\s\D\d\d\d%)', messyDayOfStocks))
    stocksWithStuff.append(re.findall(r'(\S\S\s\D\d\d\d%)', messyDayOfStocks))
    # ...then with 2-digit percentages.
    stocksWithStuff.append(re.findall(r'(\S\S\S\S\S\s\D\d\d%)', messyDayOfStocks))
    stocksWithStuff.append(re.findall(r'(\S\S\S\S\s\D\d\d%)', messyDayOfStocks))
    stocksWithStuff.append(re.findall(r'(\S\S\S\s\D\d\d%)', messyDayOfStocks))
    stocksWithStuff.append(re.findall(r'(\S\S\s\D\d\d%)', messyDayOfStocks))
    print(stocksWithStuff)
Answer 0 (score: 1)
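Try (\S{2,5}\s?[+-]\d{1,3}%)
This looks for:
- \S{2,5}: between 2 and 5 non-whitespace characters
- \s?: an optional whitespace character
- [+-]: a plus or minus sign
- \d{1,3}: between 1 and 3 digits
- %: a percent sign
This will not insert the space for you, but you can use two capture groups, like this:
(\S{2,5})\s?([+-]\d{1,3}%)
to get the company ID and the percentage separately.
You can use this in your code: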
import re

allStocks = ['SLTD +14% NRX +14%L MEIP -68% YU -345% RXII -13%RESN -10% LBIO -10% WHLR-10% HSIDD -339%','ENZ -17% NSLP -17% SCON -15% PKOH -15% PFIE -14% PRTS -14% NES -14% HELI']
for messyDayOfStocks in allStocks:
    # With two capture groups, findall returns a list of (ticker, percentage) tuples.
    allMatches = re.findall(r'(\S{2,5})\s?([+-]\d{1,3}%)', messyDayOfStocks)
    # Rejoin each pair with exactly one space.
    stocks = ['{} {}'.format(ticker, pct) for ticker, pct in allMatches]
    print(stocks)
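The question also asks that stocks not be duplicated while the original order is kept, which the loop above does not address. A minimal sketch of one way to do that follows. The helper name clean_stocks and the rule "keep the first occurrence of a repeated ticker" are assumptions for illustration, not something stated in the question.

```python
import re

# The two-group pattern from this answer.
STOCK_RE = re.compile(r'(\S{2,5})\s?([+-]\d{1,3}%)')

def clean_stocks(messy_days):
    """Return 'TICKER +NN%' strings in first-seen order, one per ticker.

    Hypothetical helper: the name and the keep-first rule are assumptions.
    """
    seen = {}  # plain dicts preserve insertion order on Python 3.7+
    for day in messy_days:
        for ticker, pct in STOCK_RE.findall(day):
            # setdefault leaves the existing entry alone if the ticker repeats.
            seen.setdefault(ticker, '{} {}'.format(ticker, pct))
    return list(seen.values())

result = clean_stocks(['SLTD +14% NRX +14%L MEIP -68%', 'NRX -3% WHLR-10%'])
print(result)
# ['SLTD +14%', 'NRX +14%', 'MEIP -68%', 'WHLR -10%']
```

If the last occurrence should win instead, assigning seen[ticker] = ... directly would overwrite the value while keeping the ticker's original position.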
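Answer 1 (score: 1)
I think this will work for you. In this code I find the pattern, append the formatted output to a list, and then start the search again, passing the end of the previous match as the starting position of the next search.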
import re

astr = 'SLTD +14% NRX +14%L MEIP -68% YU -345% RXII -13%RESN -10% LBIO -10% WHLR-10% HSIDD -339%'
out = []
# Groups: ticker, sign, percentage (2 to 3 digits followed by %).
pat = r'([A-Z]{2,5}) ?(\+|-)(\d{2,3}%)'
regex = re.compile(pat)
res = regex.search(astr)
while res:
    out.append(res.group(1) + ' ' + res.group(2) + res.group(3))
    # Resume right after the previous match; res.end() equals res.start() + len(res.group(0)).
    res = regex.search(astr, res.end())
print(out)
# prints ['SLTD +14%', 'NRX +14%', 'MEIP -68%', 'YU -345%', 'RXII -13%', 'RESN -10%', 'LBIO -10%', 'WHLR -10%', 'HSIDD -339%']
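To make the "restart the search where the last match ended" idea easier to see in isolation, here is a small sketch that wraps it in a generator. The name iter_matches is hypothetical, and the pattern writes the sign group as [+-] instead of (\+|-), which matches the same characters. For a pattern like this one, which can never match an empty string, the loop should visit the same matches as the built-in finditer.

```python
import re

def iter_matches(regex, text):
    """Yield successive matches, restarting each search at the previous end.

    Hypothetical helper for illustration; assumes the pattern cannot match
    an empty string (otherwise pos would never advance).
    """
    pos = 0
    while True:
        res = regex.search(text, pos)  # second argument is the start offset
        if res is None:
            return
        yield res
        pos = res.end()  # same offset as res.start() + len(res.group(0))

regex = re.compile(r'([A-Z]{2,5}) ?([+-])(\d{2,3}%)')
sample = 'RXII -13%RESN -10% WHLR-10%'
found = ['{} {}{}'.format(*m.groups()) for m in iter_matches(regex, sample)]
print(found)
# ['RXII -13%', 'RESN -10%', 'WHLR -10%']
```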
Answer 2 (score: 1)
You could perhaps use this:
import re

allStocks = ['SLTD +14% NRX +14%L MEIP -68% YU -345% RXII -13%RESN -10% LBIO -10% WHLR-10% HSIDD -339%','ENZ -17% NSLP -17% SCON -15% PKOH -15% PFIE -14% PRTS -14% NES -14% HELI']
stocksWithStuff = []
for messyDayOfStocks in allStocks:
    # Get each match
    for match in re.finditer(r"([A-Z]{2,5})\s*([-+]?\d{2,3}%)", messyDayOfStocks):
        # Format it
        stocksWithStuff.append("{0} {1}".format(match.group(1), match.group(2)))
print(stocksWithStuff)
Output:
['SLTD +14%', 'NRX +14%', 'MEIP -68%', 'YU -345%', 'RXII -13%', 'RESN -10%', 'LBIO -10%', 'WHLR -10%', 'HSIDD -339%', 'ENZ -17%', 'NSLP -17%', 'SCON -15%', 'PKOH -15%', 'PFIE -14%', 'PRTS -14%', 'NES -14%']
The above uses Python 3, but it should be easy to convert to Python 2 syntax.
As for the regex itself:
([A-Z]{2,5}) # 2 to 5 uppercase letters and store in first group
\s* # Any number of spaces (including none)
(
[-+]? # Any sign if present
\d{2,3}% # 2 to 3 digits and % sign
) # Store the above to the second group
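The commented breakdown above can be kept inside the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and treats # as the start of a comment. This is only a sketch of that option: the name STOCK_RE and the sample string are assumptions for illustration.

```python
import re

# re.VERBOSE lets the pattern span several lines with inline comments.
STOCK_RE = re.compile(r"""
    ([A-Z]{2,5})   # 2 to 5 uppercase letters, stored in group 1
    \s*            # any number of whitespace characters (including none)
    (
      [-+]?        # a sign, if present
      \d{2,3}%     # 2 to 3 digits and a % sign
    )              # sign and percentage are stored in group 2
""", re.VERBOSE)

sample = 'NRX +14%L MEIP -68% WHLR-10% NES -14% HELI'
cleaned = ['{0} {1}'.format(*m.groups()) for m in STOCK_RE.finditer(sample)]
print(cleaned)
# ['NRX +14%', 'MEIP -68%', 'WHLR -10%', 'NES -14%']
```

Note that a literal space or # inside a verbose pattern has to be escaped or placed in a character class, which is not needed here.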