Extract all numbers before a percent sign in Python?

Asked: 2012-12-18 03:38:10

Tags: python regex

I have a string like this:

receiving incremental file list genelaytics/ genelaytics/.project 421 3% 411.13kB/s 0:00:00 421 3% 411.13kB/s 0:00:00 (xfr#1, to-chk=13/15) 421 3% 411.13kB/s 0:00:00 (xfr#1, to-chk=8/15) genelaytics/.pydevproject 1,006 7% 982.42kB/s 0:00:00 (xfr#2, to-chk=12/15) genelaytics/hello.py 1,006 7% 982.42kB/s 0:00:00 (xfr#3, to-chk=11/15) genelaytics/manage.py 1,260 10% 1.20MB/s 0:00:00 (xfr#4, to-chk=10/15) genelaytics/ok.py 1,260 10% 1.20MB/s 0:00:00 (xfr#5, to-chk=9/15) genelaytics/genelaytics/ genelaytics/genelaytics/__init__.py 1,260 10% 35.16kB/s 0:00:00 (xfr#6, to-chk=7/15) genelaytics/genelaytics/__init__.pyc 1,399 11% 39.03kB/s 0:00:00 (xfr#7, to-chk=6/15) genelaytics/genelaytics/settings.py 6,416 50% 179.02kB/s 0:00:00 (xfr#8, to-chk=5/15) genelaytics/genelaytics/settings.pyc 9,468 75% 264.17kB/s 0:00:00 (xfr#9, to-chk=4/15) genelaytics/genelaytics/urls.py 9,813 77% 252.18kB/s 0:00:00 (xfr#10, to-chk=3/15) genelaytics/genelaytics/urls.pyc 10,409 82% 260.64kB/s 0:00:00 (xfr#11, to-chk=2/15) genelaytics/genelaytics/wsgi.py 11,553 91% 289.29kB/s 0:00:00 (xfr#12, to-chk=1/15) genelaytics/genelaytics/wsgi.pyc 12,596 100% 315.40kB/s 0:00:00 (xfr#13, to-chk=0/15) 12,596 100% 33.70kB/s 0:00:00 (xfr#13, to-chk=0/15) 12,596 100% 30.15kB/s 0:00:00 (xfr#13, to-chk=0/15) sent 287 bytes received 6,709 bytes 518.22 bytes/sec total size is 12,596 speedup is 1.80

I want to extract every number that appears directly before a percent sign:

3%, 3%, 7%, 10%, 75%, 82%, and so on.

I tried:

re.search('\d*%',test).group()

But this only extracts the first percentage, 3%.

I want all of them. How can I do this? Thanks.

3 Answers:

Answer 0: (score: 6)

Use findall:

In [58]: re.findall(r'\d+%', text)
Out[58]: 
['3%', '3%', '3%', '7%', '7%', '10%', '10%', '10%', '11%', '50%', '75%', '77%',
'82%', '91%', '100%', '100%', '100%']

Also, you may want to use \d+ instead of \d*, so that the pattern does not match a stray % with no digits in front of it.
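To illustrate the difference, here is a minimal sketch on a made-up string (`sample` is a hypothetical input, not taken from the question) that contains a stray `%` with no digits directly in front of it:

```python
import re

# Hypothetical input with a stray '%' that has no digits directly before it.
sample = "done 3% of 100 % total, 50%"

# \d* allows zero digits, so the bare '%' is matched on its own.
loose = re.findall(r'\d*%', sample)

# \d+ requires at least one digit, so the bare '%' is skipped.
strict = re.findall(r'\d+%', sample)

print(loose)   # ['3%', '%', '50%']
print(strict)  # ['3%', '50%']
```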

Answer 1: (score: 1)

>>> import re
>>> text = ("receiving incremental file list genelaytics/ genelaytics/.project 421 3% 411.13kB/s 0:00:00 421 3% 411.13kB/s 0:00:00 "
...         "(xfr#1, to-chk=13/15) 421 3% 411.13kB/s 0:00:00 (xfr#1, to-chk=8/15) genelaytics/.pydevproject 1,006 7% 982.42kB/s 0:00:00 (xfr#2, to-chk=12/15) genelaytics/hello.py 1,006 7% 982.42kB/s 0:00:00 (xfr#3, to-chk=11/15) genelaytics/manage.py 1,260 10% 1.20MB/s 0:00:00 (xfr#4, to-chk=10/15) genelaytics/ok.py 1,260 10% 1.20MB/s 0:00:00 (xfr#5, to-chk=9/15) genelaytics/genelaytics/ genelaytics/genelaytics/__init__.py 1,260 10% 35.16kB/s 0:00:00 (xfr#6, to-chk=7/15) genelaytics/genelaytics/__init__.pyc 1,399 11% 39.03kB/s "
...         "0:00:00 (xfr#7, to-chk=6/15) genelaytics/genelaytics/settings.py 6,416 50% 179.02kB/s 0:00:00 (xfr#8, to-chk=5/15) genelaytics/genelaytics/settings.pyc 9,468 75% 264.17kB/s 0:00:00 (xfr#9, to-chk=4/15) genelaytics/genelaytics/urls.py 9,813 77% 252.18kB/s 0:00:00 (xfr#10, to-chk=3/15) genelaytics/genelaytics/urls.pyc 10,409 82% 260.64kB/s 0:00:00 (xfr#11, to-chk=2/15) genelaytics/genelaytics/wsgi.py 11,553 91% 289.29kB/s 0:00:00 (xfr#12, to-chk=1/15) genelaytics/genelaytics/wsgi.pyc 12,596 100% 315.40kB/s 0:00:00 (xfr#13, to-chk=0/15) 12,596 100% 33.70kB/s 0:00:00 (xfr#13, to-chk=0/15) 12,596 100% 30.15kB/s 0:00:00 (xfr#13, to-chk=0/15) sent 287 bytes received 6,709 bytes 518.22 bytes/sec total size is 12,596 speedup is 1.80")
>>> re.findall(r'(?:\d+%)|(?:\d+\.\d+%)',text)
['3%', '3%', '3%', '7%', '7%', '10%', '10%', '10%', '11%', '50%', '75%', '77%', '82%', '91%', '100%', '100%', '100%']
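If only the numeric part is needed, without the trailing `%`, one possible variation is a lookahead. The sketch below is not part of the original answer; `excerpt` is a shortened piece of the rsync output from the question, and the names `digits` and `percentages` are chosen only for illustration:

```python
import re

# Shortened excerpt of the rsync output from the question.
excerpt = ("genelaytics/.project 421 3% 411.13kB/s 0:00:00 (xfr#1, to-chk=13/15) "
           "genelaytics/hello.py 1,006 7% 982.42kB/s 0:00:00 (xfr#3, to-chk=11/15)")

# (?=%) is a lookahead: it requires a '%' right after the digits
# but does not include it in the match.
digits = re.findall(r'\d+(?=%)', excerpt)

# Convert the matched digit strings to integers.
percentages = [int(d) for d in digits]

print(percentages)  # [3, 7]
```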

Answer 2: (score: 0)

An improved answer that captures more cases, including decimal percentages.

Code:

import re

# example text
text = '''100%
hgjhgjhgjhg
45.909%
67%
89 hjhjkkj
0.99 ^&^&*jgkg
0.89%'''

# search pattern 
re.findall(r'(?:\d+%)|(?:\d+\.\d+%)',text)

Output:

['100%', '45.909%', '67%', '0.89%']
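The two alternatives in that pattern can probably be folded into a single expression with an optional fractional part. The following is a sketch of that idea, not the answerer's code; `pattern` and `values` are illustrative names, and the capture group is an added assumption so that the numbers can be converted to floats:

```python
import re

# Same example text as in the answer above.
text = '''100%
hgjhgjhgjhg
45.909%
67%
89 hjhjkkj
0.99 ^&^&*jgkg
0.89%'''

# Digits, an optional ".digits" part, then '%'.
# The outer group captures the number without the percent sign.
pattern = re.compile(r'(\d+(?:\.\d+)?)%')

# With exactly one capture group, findall returns only the captured text.
values = [float(n) for n in pattern.findall(text)]

print(values)  # [100.0, 45.909, 67.0, 0.89]
```

Numbers that are not followed by `%`, such as `89` and `0.99` in the example, are skipped.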