Example inputs:
'Starts in 09h 52m 56s'
'Ends in 00h 33m 13s'
The desired output for these two inputs is:
['Starts', '09', '52', '56']
['Ends', '00', '33', '13']
Here is a pattern that works:
(Starts|Ends) in ([0-9]{2})h ([0-9]{2})m ([0-9]{2})s
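Unfortunately, it outputs everything like this:
[('Ends', '00', '46', '34')]
instead of:
['Ends', '00', '46', '34']
More importantly, I would like to make the regular expression more concise, without repeating ([0-9]{2}) three times.
I tried (Starts|Ends)|([0-9]{2})[h|m|s], but that outputs the following:
[('Ends', ''), ('', '04'), ('', '20'), ('', '41')]
Again, the output I am looking for is simply:
['Ends', '00', '33', '13']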
As requested, here is my code:
regex_time_left = re.compile(r'(Starts|Ends) in ([0-9]{2})h ([0-9]{2})m ([0-9]{2})s')
for product_page in indi_product_urls:
    time_left = ff.find_elements(By.CSS_SELECTOR, 'span[id*=deal_expiry_timer_]')
    if len(time_left) > 0:
        time_left = regex_time_left.findall(time_left[0].text)  # [('Ends', '00', '32', '31')]
        starts_ends = time_left[0][0]
        hours = time_left[0][1]
        minutes = time_left[0][2]
        seconds = time_left[0][3]
Any ideas?
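Answer 0 (score: 1)
Try this code!
You can use a regex (import the re library in Python) to extract the hour, minute, and second values.
\d{2} matches exactly two digits, since the hour/minute/second values are always two digits.
Code: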
import re

start = 'Starts in 09h 52m 56s'
end = 'Ends in 00h 33m 13s'

# Compile once; (?:...) is non-capturing, so groups 1-3 are hours, minutes, seconds.
pattern = re.compile(r'(?:Starts|Ends)[ ]in[ ](\d{2})h[ ](\d{2})m[ ](\d{2})s', re.I)

matchObj = pattern.match(start)
print("Start Hours :", matchObj.group(1))
print("Start Minutes :", matchObj.group(2))
print("Start Seconds :", matchObj.group(3))

matchObj = pattern.match(end)
print("End Hours :", matchObj.group(1))
print("End Minutes :", matchObj.group(2))
print("End Seconds :", matchObj.group(3))
Output:
Start Hours : 09
Start Minutes : 52
Start Seconds : 56
End Hours : 00
End Minutes : 33
End Seconds : 13
The pattern was verified with regex101.
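The code in this answer prints the three numbers but drops the leading word. As a minimal sketch of how to get the flat list the question asks for, one option is to capture the word as well and convert the match's groups to a list. This assumes each string contains at most one timer; `parse_time_left` and `TIME_LEFT` are hypothetical names used only for illustration.

```python
import re

# Capturing the leading word too means one match carries all four fields.
TIME_LEFT = re.compile(r'(Starts|Ends) in (\d{2})h (\d{2})m (\d{2})s')


def parse_time_left(text):
    """Return [word, hh, mm, ss] as strings, or None if nothing matches.

    Hypothetical helper, not part of the original code.
    """
    m = TIME_LEFT.search(text)
    # groups() returns a tuple of the captured groups; list() flattens it
    # into the shape requested in the question.
    return list(m.groups()) if m else None


print(parse_time_left('Starts in 09h 52m 56s'))
print(parse_time_left('Ends in 00h 33m 13s'))
```

Using `search` with `groups()` avoids the list-of-tuples shape that `findall` produces when a pattern has several capturing groups.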
Answer 1 (score: 0)
You can use two patterns: one to match the leading word and one to collect every two-digit value followed by h, m, or s:
import re

a = ['starts in 09h 05m 33s', 'ends in 00h 33m 12s']
r1 = re.compile(r'(starts|ends)')
r2 = re.compile(r'(\d{2})[hms]')
for s in a:
    m1 = r1.match(s)
    if m1:
        m2 = r2.findall(s)
        print(m1.group(0), m2[0], m2[1], m2[2])
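The same two-pattern idea can be wrapped in a function that returns a list instead of printing. This is only a sketch: `split_timer`, `WORD`, and `NUMS` are hypothetical names, and the `re.IGNORECASE` flag is an added assumption so that both 'Starts' and 'starts' are accepted.

```python
import re

# One pattern for the leading word, one for each two-digit value with a unit.
WORD = re.compile(r'(starts|ends)', re.IGNORECASE)
NUMS = re.compile(r'(\d{2})[hms]')


def split_timer(text):
    """Return [word, hh, mm, ss] as strings, or None if the word is missing.

    Hypothetical helper for illustration only.
    """
    m = WORD.match(text)
    if m is None:
        return None
    # With exactly one capturing group, findall returns a flat list of strings,
    # so ([0-9]{2}) only has to be written once.
    return [m.group(1)] + NUMS.findall(text)


for s in ['starts in 09h 05m 33s', 'Ends in 00h 33m 12s']:
    print(split_timer(s))
```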
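Answer 2 (score: 0)
You can zip the corresponding pieces of the two strings and extract the data from the resulting tuples:
s = 'Start in 09h 52m 56s'
s2 = 'Ends in 00h 33m 13s'
# Pair up the words of both strings: [('Start', 'Ends'), ('in', 'in'), ('09h', '00h'), ...]
lista = list(zip(s.split(), s2.split()))
s_list = [lista[0][0]]
e_list = [lista[0][1]]
for i in lista[2:5]:
    # Keep the first two characters, dropping the h/m/s suffix.
    s_list.append(i[0][:2])
    e_list.append(i[1][:2])
print(s_list)
print(e_list)
Output:
['Start', '09', '52', '56']
['Ends', '00', '33', '13']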
If you want to work with the values as numbers, I would convert them to int while appending:
for i in lista[2:5]:
    s_list.append(int(i[0][:2]))
    e_list.append(int(i[1][:2]))
~/python/stackoverflow/9.22$ python3.7 class.py
['Start', 9, 52, 56]
['Ends', 0, 33, 13]
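The zip approach ties the two strings together, so here is a sketch of the same split-based idea applied to one string at a time. It assumes the text always has the form "<word> in NNh NNm NNs"; `timer_fields` and its `as_int` parameter are hypothetical names introduced for illustration.

```python
def timer_fields(text, as_int=False):
    """Split '<word> in NNh NNm NNs' into [word, hh, mm, ss].

    Hypothetical helper; assumes the fixed five-word layout shown above.
    """
    parts = text.split()  # e.g. ['Ends', 'in', '00h', '33m', '13s']
    # Drop the trailing unit letter from the three time fields.
    values = [p[:-1] for p in parts[2:5]]
    if as_int:
        values = [int(v) for v in values]
    return [parts[0]] + values


print(timer_fields('Start in 09h 52m 56s'))
print(timer_fields('Ends in 00h 33m 13s', as_int=True))
```

No regex is involved here, which keeps things simple but also means malformed input is not rejected.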
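Answer 3 (score: 0)
I think you can do it like this, but unfortunately the (starts|ends) alternative does not cover other cases, such as a capitalized 'Ends':
import re

a = ['starts in 09h 05m 33s', 'ends in 00h 33m 12s', 'Ends in 00h 33m 12s']
print([re.findall(r"(starts|ends|\d+)", i) for i in a])
But you can try this instead, which takes the first word whatever it is:
print([[i.split(" ")[0]] + re.findall(r"\d+", i) for i in a])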
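If the single-pattern form is preferred, one possible way to handle the capitalized case is a case-insensitive flag. The sketch below assumes the only words of interest are "starts" and "ends"; `TOKEN` and `result` are hypothetical names.

```python
import re

# No capturing groups: findall returns each full match as a plain string.
# re.IGNORECASE is an added assumption so 'Ends' and 'ends' both match.
TOKEN = re.compile(r'starts|ends|\d+', re.IGNORECASE)

a = ['starts in 09h 05m 33s', 'ends in 00h 33m 12s', 'Ends in 00h 33m 12s']
result = [TOKEN.findall(s) for s in a]
print(result)
# [['starts', '09', '05', '33'], ['ends', '00', '33', '12'], ['Ends', '00', '33', '12']]
```

Because the pattern has no groups, each inner list is already flat, and the digit pattern appears only once.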