Extract data between two parentheses using the re module in Python

Posted: 2011-07-01 05:44:47

Tags: python regex html-parsing

Here is a test string: "transparent url(http://www.google.com/chart?chs=630x100&cht=bvs&chxt=x&chxl=0:%7C1840%7C1860%7C1880%7C1900%7C1920%7C1940%7C1960%7C1980%7C2000%7C&chxr=0,0,100&chxs=0,676767,11.3000002,0,tl,676767,676767&chd=e:D9AACPFjGWAAGDLfCeFgBvHLLSCZGED5GOKwDKCxJmF2FwFfERFwEZGcEJHlENJDJ9I0HQDjE-MAK2J9NMI9IAFtNaIOKtGoG2IYKBFvLEJmMLHdIFHXG.IPHrK2I9ULROI8SfHRFTeCIrQPOwXgPHVxQkbCbhg8iDwIvKkety..AAAAAAAA&chbh=7,0,0&chg=11.11,0,5,6&chxp=0,0.0,11.1,22.2,33.3,44.4,55.6,66.7,77.8,88.9&chco=3366cc,bbcced&chm=R,bbbbbb,0,0.9954,1.0%7Ch,bbbbbb,0,1.0,1.0,1&chxs=0,000000,11,-1&hl=en)"

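I want to use Python's re module to extract all the data between the two parentheses, that is, everything after "url(" on the first line up to the closing parenthesis at the end.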

1 answer:

Answer (score: 1)

jcomeau@intrepid:/tmp$ python
Python 2.6.7 (r267:88850, Jun 13 2011, 22:03:32) 
[GCC 4.6.1 20110608 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile('\(([^)]*)\)').search("transparent url(http://www.google.com/chart?chs=630x100&cht=bvs&chxt=x&chxl=0:%7C1840%7C1860%7C1880%7C1900%7C1920%7C1940%7C1960%7C1980%7C2000%7C&chxr=0,0,100&chxs=0,676767,11.3000002,0,tl,676767,676767&chd=e:D9AACPFjGWAAGDLfCeFgBvHLLSCZGED5GOKwDKCxJmF2FwFfERFwEZGcEJHlENJDJ9I0HQDjE-MAK2J9NMI9IAFtNaIOKtGoG2IYKBFvLEJmMLHdIFHXG.IPHrK2I9ULROI8SfHRFTeCIrQPOwXgPHVxQkbCbhg8iDwIvKkety..AAAAAAAA&chbh=7,0,0&chg=11.11,0,5,6&chxp=0,0.0,11.1,22.2,33.3,44.4,55.6,66.7,77.8,88.9&chco=3366cc,bbcced&chm=R,bbbbbb,0,0.9954,1.0%7Ch,bbbbbb,0,1.0,1.0,1&chxs=0,000000,11,-1&hl=en)").groups()[0]
'http://www.google.com/chart?chs=630x100&cht=bvs&chxt=x&chxl=0:%7C1840%7C1860%7C1880%7C1900%7C1920%7C1940%7C1960%7C1980%7C2000%7C&chxr=0,0,100&chxs=0,676767,11.3000002,0,tl,676767,676767&chd=e:D9AACPFjGWAAGDLfCeFgBvHLLSCZGED5GOKwDKCxJmF2FwFfERFwEZGcEJHlENJDJ9I0HQDjE-MAK2J9NMI9IAFtNaIOKtGoG2IYKBFvLEJmMLHdIFHXG.IPHrK2I9ULROI8SfHRFTeCIrQPOwXgPHVxQkbCbhg8iDwIvKkety..AAAAAAAA&chbh=7,0,0&chg=11.11,0,5,6&chxp=0,0.0,11.1,22.2,33.3,44.4,55.6,66.7,77.8,88.9&chco=3366cc,bbcced&chm=R,bbbbbb,0,0.9954,1.0%7Ch,bbbbbb,0,1.0,1.0,1&chxs=0,000000,11,-1&hl=en'
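A minimal Python 3 sketch of the same extraction. The pattern is written as a raw string (r"...") so the backslashes reach the regex engine unchanged (recent Python 3 releases warn about unrecognised escapes such as "\(" in ordinary string literals), and match.group(1) is used in place of .groups()[0]. The test string is shortened here, and the "url(" anchored variant and the `sample` string are illustrative additions, not part of the original answer.

```python
import re

# Shortened form of the question's test string (most of the chart
# parameters are omitted here for readability).
text = "transparent url(http://www.google.com/chart?chs=630x100&cht=bvs&hl=en)"

# r"..." keeps the backslashes for the regex engine.
#   \(       a literal opening parenthesis
#   ([^)]*)  group 1: every character up to (not including) the next ")"
#   \)       a literal closing parenthesis
paren_re = re.compile(r"\(([^)]*)\)")

match = paren_re.search(text)
if match is not None:
    print(match.group(1))  # prints http://www.google.com/chart?chs=630x100&cht=bvs&hl=en

# Illustrative variant anchored on "url(", so an earlier, unrelated
# "(...)" in the input is skipped instead of being returned.
url_re = re.compile(r"url\(([^)]*)\)")

# Hypothetical input with an extra parenthesized value before url(...).
sample = "width(100) url(http://example.com/a.png)"
print(paren_re.search(sample).group(1))  # prints 100
print(url_re.search(sample).group(1))    # prints http://example.com/a.png
```

Because [^)]* stops at the first ")", a URL that itself contains a closing parenthesis would be cut short; input like that would need a more specific pattern or a real CSS parser.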