Extracting numbers using regex and Notepad++

Time: 2016-01-09 18:57:58

Tags: regex notepad++

I have the following data (all on one line):

<span id=​"ctb_0" onclick=​"show_hide_box(this)​;​"
class=​"hide_icon r txtfont ltr">​open​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Rayyan Real Investment​</font>​,
<span class=​"ltr txtfont">​+92-3212459990​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Bukhari Properties​</font>​,
<span class=​"ltr txtfont">​+92-3218248858​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Exact Properties​</font>​,
<span class=​"ltr txtfont">​+92-3312044421​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Exact Properties​</font>​,
<span class=​"ltr txtfont">​+92-3312044421​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Adeel Corporation​</font>​,
<span class=​"ltr txtfont">​+923008253132​</span>​
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Adeel Corporation​</font>​,
<span class=​"ltr txtfont">​+92-3008253132​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Z.S Associates​</font>​,
<span class=​"ltr txtfont">​+92-3452431417​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Keystone Properties​</font>​,
<span class=​"ltr txtfont">​+92-3353509187/301..​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​, 
<font class=​"txtfont ltr">​Adeel Corporation​</font>​,
<span class=​"ltr txtfont">​+92-3008253132​</span>​, 
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Adeel Corporation​</font>​,
<span class=​"ltr txtfont">​+92-3008253132​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Safeway Real Estate Consultant​</font>​,
<span class=​"ltr txtfont">​+92-3218282885/345..​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Abdul Sattar & Sons​</font>​,
<span class=​"ltr txtfont">​+92-3332107802, +9..​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Bismillah Real Estate​</font>​,
<span class=​"ltr txtfont">​+92-3213336525, 03..​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Aiman Estate & Properties​</font>​,
<span class=​"ltr txtfont">​+92-3212537535​</span>​,
<div class=​"description clr ltr txtfont">​…​</div>​,
<font class=​"txtfont ltr">​Aiman Estate & Properties​</font>​,
<span class=​"ltr txtfont">​+92-3212537535​</span>​,

Using regex in Notepad++, I want output like this:

923008929845 
923318874928​​
923008275080
923452113010​​
923002024486​​
923218286664
923218286664​​
923212804245
923002555091​​
​923212804245
923008289996
​923003579717
923003579717​​
923003772227
923007048836​​

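I tried the following in Notepad++, but it is neither clean nor fast. I am removing the HTML code by hand, which keeps me from finishing the data scraping quickly.

Find: [a-z]|[A-Z]|[,.()_=;"+<>/:-]

Replace with: (a single space)

I still see a lot of random characters.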

3 Answers:

Answer 0: (score: 1)

How about:

Find: ^.*?\+(\d\d)-(\d{10}).*?$
Replace with: $1$2\n

Explanation:

^           : beginning of line
  .*?       : 0 or more of any character (not greedy)
  \+        : +, needs to be escaped because it's a special char for regex
  (\d\d)    : 2 digits captured in group 1
  -         : dash
  (\d{10})  : 10 digits captured in group 2
  .*?       : 0 or more of any character (not greedy)
$           : end of line
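
To see which lines this pattern picks up, here is a minimal Python sketch that applies the same expression line by line. The sample lines are hypothetical (modelled on the question's data), and Python's `re` module is used only for illustration; Notepad++ uses a different regex engine, so behavior there may differ in details.

```python
import re

# Pattern from the answer above: '+', two digits, a dash, then ten digits.
PATTERN = re.compile(r"^.*?\+(\d\d)-(\d{10}).*?$")

# Hypothetical sample lines modelled on the question's data.
lines = [
    '<font class="txtfont ltr">Bukhari Properties</font>,',
    '<span class="ltr txtfont">+92-3218248858</span>,',
    '<span class="ltr txtfont">+923008253132</span>',
    '<span class="ltr txtfont">+92-3353509187/301..</span>,',
]


def extract(line):
    """Return group 1 + group 2 if the line matches, else None."""
    m = PATTERN.match(line)
    return m.group(1) + m.group(2) if m else None


# Keep only the lines where the pattern matched.
numbers = [n for n in map(extract, lines) if n is not None]
print(numbers)
```

Note that the number written without a dash (`+923008253132`) is skipped, because the pattern requires the `-`. Making the dash optional with `-?` would likely pick it up as well, but that is a change to the answer's pattern, not part of it.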

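Answer 1: (score: 0)

Try this.

Find what: \s.*\s.*?(\d+)-(\d{10})|.+
Replace with: $1$2

Note!!
This is what I have learned about regex so far. I am not good at
regex, but the regex above works fine, except that it leaves 2 spaces between the numbers....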
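
The "capture the number, or else swallow the line" alternation can be sketched in Python as follows. This uses a simplified variant of the pattern (an assumption, not the answer's exact expression) and hypothetical sample lines; it is meant only to show why gaps remain between the numbers and one way to remove them afterwards.

```python
import re

# Simplified variant (an assumption, not the answer's exact pattern):
# the first branch captures the number, the second swallows any other line.
PATTERN = re.compile(r".*?\+(\d+)-(\d{10}).*|.+")

# Hypothetical sample modelled on the question's data, one element per line.
text = "\n".join([
    '<div class="description">...</div>,',
    '<font class="txtfont ltr">Exact Properties</font>,',
    '<span class="ltr txtfont">+92-3312044421</span>,',
    '<font class="txtfont ltr">Z.S Associates</font>,',
    '<span class="ltr txtfont">+92-3452431417</span>,',
])

# When the second branch matches, groups 1 and 2 are unset and expand to ''
# (Python 3.5+), so lines without a number turn into empty lines.
replaced = PATTERN.sub(r"\1\2", text)

# Drop the leftover empty lines in a second pass.
cleaned = [line for line in replaced.splitlines() if line]
print(cleaned)
```

The line breaks themselves are never matched, so they stay behind as blank gaps. In Notepad++ a second Replace All that removes empty lines would presumably play the role of the final filtering step here.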

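Answer 2: (score: -1)

I don't have Notepad++, but something like this will get you most of the way there. It matches everything up to the end of the first occurrence of the number pattern you are looking for, and replaces the whole match with the captured number pattern and a newline. Replace All should apply it repeatedly.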
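
The approach described here can be sketched in Python for data that sits on a single line. The answer's exact pattern is not shown in the text, so the pattern below is an assumption chosen to fit the description, and the sample string is hypothetical.

```python
import re

# Hypothetical pattern: lazily consume everything up to and including the
# first '+NN-NNNNNNNNNN', capturing only the digits.
PATTERN = re.compile(r".*?\+(\d{2})-(\d{10})", re.DOTALL)

# Hypothetical single-line sample modelled on the question's data.
data = (
    '<span class="ltr txtfont">+92-3212459990</span>, '
    '<font class="txtfont ltr">Bukhari Properties</font>, '
    '<span class="ltr txtfont">+92-3218248858</span>,'
)

# Each match is replaced by the captured digits plus a newline; sub()
# repeats this across the string, much like Replace All.
replaced = PATTERN.sub(r"\1\2\n", data)

# Whatever follows the last number is left untouched, so keep only the
# lines that consist purely of digits.
numbers = [line for line in replaced.splitlines() if line.isdigit()]
print(numbers)
```

The trailing markup after the last number is not matched, which is consistent with the answer's remark that this only gets you "most of the way there".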
