1. I have a file named rexp.txt with the following contents:
adf fdsf hh h fg h 1995-11-23
dasvsbh 2000-04-12 gnym,mnbv 2001-02-17
dascvfbsn
bjhmndgfh
xgfdjnfhm244-44-2255 fgfdsg gfjhkh
fsgfdh 455-44-6577 dkjgjfkld
sgf
dgfdhj
sdg 192.6.8.02 fdhdlk dfnfghr
fisdhfih dfhghihg 154.56.2.6 fdhusdgv
aff fjhgdf
fdfdnfjgkpg
fdf hgj fdnbk gjdhgj
dfdfg raeh95@gmail.com efhidhg fdfuga reg@gmail.com
ergudfi rey@gmail.com iugftudfh dgufidjfdg
teeeee@gmail.comugfuhlfhs fgufif p
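2. I want to extract the SSNs, dates, and email addresses line by line. The code should loop over every line and return the expected strings.
3. Please correct my Python code: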
import re

def cfor_date(str):
    t=re.search(r'(\d{4}-\d{2}-\d{2})',str)
    return t

def cfor_ssn(str):
    f=re.search(r'(\d{3}-\d{2}-\d{4})',str)
    return f

def cfor_gm(str):
    g=re.search(r'([\w\.-]+@gmail[\w{3}\.-]+)',str)
    return g

f = open("rexp.txt","r").read()
lines = f.splitlines()
for line in iter(lines):
    x=line.split(" ")
    print x
    if (cfor_date(x)) != None: # i feel problem here
        r=cfor_ssn(x)
        print r
Answer 0 (score: 1)
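1. The code reads the whole file with read(), uses splitlines() to split the content into a list, and then iterates over that list. That is a longer and more complicated process than necessary. Also, the file is never closed after it is read.
2. Use the with construct to open the file, then read it completely with readlines(). There is no need to split the lines and no need to worry about closing the file.
3. Replace the search function with the findall function. The difference between the two is shown in the example below.

>>> import re
>>> a = "Dinesh 123"
>>> t = re.search(r"\d+",a)
>>> t
<_sre.SRE_Match object at 0x01FE3918>
>>> t.group()
'123'
>>> x = re.findall(r'\d+',a)
>>> x
['123']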
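The same contrast can be sketched on a line taken from the question's rexp.txt. This is only an illustration, written for Python 3; the variable names are made up for the example. It also shows what happens when the list produced by line.split(" ") is passed to re.search, as the question's loop does: the call raises TypeError because re expects a string, not a list.

```python
import re

line = "dasvsbh 2000-04-12 gnym,mnbv 2001-02-17"
date_pattern = r"\d{4}-\d{2}-\d{2}"

# re.search stops at the first match and returns a match object (or None).
first = re.search(date_pattern, line)
first_date = first.group() if first else None

# re.findall returns every non-overlapping match as a list of strings.
all_dates = re.findall(date_pattern, line)

# Passing a list instead of a string is rejected by the re module.
try:
    re.search(date_pattern, line.split(" "))
    list_error = None
except TypeError as exc:
    list_error = exc

print(first_date)                  # 2000-04-12
print(all_dates)                   # ['2000-04-12', '2001-02-17']
print(type(list_error).__name__)   # TypeError
```

Because findall always returns a list, an empty result can be tested directly with "if all_dates:" instead of comparing against None.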
For more help, check this link.
All of the points above are applied in the following code:
Code:
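import re

def cfor_date(tmp_line):
    # Return every YYYY-MM-DD date found in the line.
    t=re.findall(r'(\d{4}-\d{2}-\d{2})',tmp_line)
    return t

def cfor_ssn(tmp_line):
    # Return every SSN-style number (ddd-dd-dddd) found in the line.
    f=re.findall(r'(\d{3}-\d{2}-\d{4})',tmp_line)
    return f

def cfor_gm(tmp_line):
    # Return every Gmail-style address found in the line.
    g=re.findall(r'([\w\.-]+@gmail[\w{3}\.-]+)',tmp_line)
    return g

# The with block closes the file automatically.
with open("xyz.txt","r") as fh:
    for line in fh.readlines():
        date_list = cfor_date(line)
        ssn_list = cfor_ssn(line)
        gm_list = cfor_gm(line)
        # findall returns a list, so only non-empty results are printed.
        if len(ssn_list) != 0:
            print(ssn_list)
        if len(date_list) != 0:
            print(date_list)
        if len(gm_list) != 0 :
            print(gm_list)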
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
['1995-11-23']
['2000-04-12', '2001-02-17']
['244-44-2255']
['455-44-6577']
['raeh95@gmail.com', 'reg@gmail.com']
['rey@gmail.com']
['teeeee@gmail.comugfuhlfhs']
C:\Users\dinesh_pundkar\Desktop>
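The last result, 'teeeee@gmail.comugfuhlfhs', includes trailing text because the character class [\w{3}\.-]+ keeps consuming word characters after ".com". One possible tightening, assuming every address of interest ends in exactly "gmail.com", is to match that suffix literally. The sketch below is not part of the answer above; PATTERNS and extract_fields are hypothetical names, and it is written for Python 3.

```python
import re

# Hypothetical pattern table. The gmail pattern assumes the address
# ends in the literal text "gmail.com" (an assumption, not from the answer).
PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "gmail": re.compile(r"[\w.-]+@gmail\.com"),
}

def extract_fields(line):
    """Return {kind: [matches]} for every pattern that matches the line."""
    found = {}
    for kind, pattern in PATTERNS.items():
        matches = pattern.findall(line)
        if matches:
            found[kind] = matches
    return found

# A few lines copied from the question's sample data.
sample_lines = [
    "dasvsbh 2000-04-12 gnym,mnbv 2001-02-17",
    "fsgfdh 455-44-6577 dkjgjfkld",
    "teeeee@gmail.comugfuhlfhs fgufif p",
    "sgf",
]
for sample in sample_lines:
    print(extract_fields(sample))
```

With this pattern the third sample line yields 'teeeee@gmail.com'. The trade-off is that a literal suffix cannot tell where an address really ends when other text is glued to it, so it is only suitable when the domain is known in advance.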