Filtering strings in Python

Date: 2012-06-15 16:11:00

Tags: python error-handling invalid-characters

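I'm writing an algorithm that checks a string (an e-mail address) and reports whether "the e-mail address is valid" under a specific set of rules. The first part of the address must be a string of 1-8 characters (letters, digits, underscores [_] ... all the characters an e-mail address may contain). After the @ comes the second part, a string of 1-12 characters (also made only of legal characters), and the address must end with the top-level domain .com.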
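Taken together, these rules bound the total length of a valid address (account, "@", domain name, ".com"):

```latex
\underbrace{1}_{\text{account}} + \underbrace{1}_{@} + \underbrace{1}_{\text{domain}} + \underbrace{4}_{\texttt{.com}} = 7
\;\le\; \text{len(address)} \;\le\;
8 + 1 + 12 + 4 = 25
```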

allowed = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_."

email = input("Enter the e-mail address: ")
valid = True  # becomes False as soon as any rule is broken

length = len(email)
if length > 25:  # at most 8 + 1 ("@") + 12 + 4 (".com") characters
    print("Address is too long")
    valid = False
elif length < 7:  # at least 1 + 1 ("@") + 1 + 4 (".com") characters
    print("Address is too short")
    valid = False
if not email.endswith(".com"):
    print("Address doesn't contain correct domain ending")
    valid = False

splitting = email.split("@")  # split into the account and domain parts
if len(splitting) != 2:
    print("The address must consist of 2 parts connected with symbol @, "
          "and ending must be .com")
    valid = False
else:
    account = splitting[0]
    domain = splitting[1]  # still includes the ".com" ending
    first_part = len(account)
    second_part = len(domain)

    for c in account:
        if c not in allowed:
            print("Invalid char ->", c, "<- in account name of e-mail")
            valid = False

    for c in domain:
        if c not in allowed:
            print("Invalid char ->", c, "<- in domain name of e-mail")
            valid = False

    if first_part == 0:
        print("You need at least 1 character before the @")
        valid = False
    elif first_part > 8:
        print("The first part is too long")
        valid = False
    if second_part <= 4:  # nothing but ".com" after the @
        print("You need at least 1 character after the @")
        valid = False
    elif second_part > 16:  # more than 12 characters before ".com"
        print("The second part is too long")
        valid = False

if valid:
    print("Valid e-mail address")
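One way to make these checks reusable is to collect the error messages in a list instead of printing them, so that an empty list means "valid" and the function can be tested without typing addresses by hand. This is a sketch under the rules stated in the question; `email_errors` and `ALLOWED` are hypothetical names, not part of the original code.

```python
# Hypothetical helper: same character set and rules as in the question.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.")


def email_errors(email):
    """Return a list of rule violations; an empty list means the address is valid."""
    errors = []
    if not email.endswith(".com"):
        errors.append("Address doesn't contain correct domain ending")

    parts = email.split("@")
    if len(parts) != 2:
        errors.append("The address must contain exactly one @")
        return errors  # the remaining checks need exactly two parts

    account, domain = parts
    # Domain name without the ".com" ending (1-12 characters allowed).
    name = domain[:-4] if domain.endswith(".com") else domain

    if not 1 <= len(account) <= 8:
        errors.append("The first part must have 1-8 characters")
    if not 1 <= len(name) <= 12:
        errors.append("The second part must have 1-12 characters before .com")

    # Report each offending character once, in sorted order.
    bad = sorted(set(c for c in account + domain if c not in ALLOWED))
    if bad:
        errors.append("Invalid characters: " + "".join(bad))
    return errors


for address in ["test@gmail.com", "toolongname@gmail.com", "a@b@c.com"]:
    print(address, email_errors(address) or "valid")
```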

3 answers:

Answer 0 (score: 3)

Regular expressions FTW!

import re

address = 'test@gmail.com'
# 1-8 characters, "@", 1-12 characters, then the literal ".com"
if re.match(r'^[a-z0-9_]{1,8}@[a-z0-9_]{1,12}\.com$', address, re.IGNORECASE):
    print('valid')
else:
    print('invalid')

A shorter regex (as suggested in the comments) is r'^\w{1,8}@\w{1,12}\.com$'.
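In Python 3 the two patterns are not quite equivalent: for str patterns `\w` matches any Unicode word character (such as "ë") unless the `re.ASCII` flag is given. A sketch of both forms follows; the names `EXPLICIT`, `SHORT` and `is_valid` are illustrative, and `re.fullmatch` requires Python 3.4 or later.

```python
import re

# Character classes spelled out: ASCII letters, digits and underscore only.
EXPLICIT = re.compile(r"[a-z0-9_]{1,8}@[a-z0-9_]{1,12}\.com", re.IGNORECASE)
# Shorter form; re.ASCII restricts \w to [a-zA-Z0-9_].
SHORT = re.compile(r"\w{1,8}@\w{1,12}\.com", re.ASCII)


def is_valid(address, pattern=SHORT):
    # fullmatch() must consume the whole string, so no ^...$ anchors are needed,
    # and unlike "$" it does not accept a trailing newline.
    return pattern.fullmatch(address) is not None


print(is_valid("test@gmail.com"))            # True
print(is_valid("t\u00ebst@gmail.com"))       # False: "ë" is not ASCII
```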

I don't know whether this is what your teacher is after, but regular expressions are always useful to know :)

Answer 1 (score: 2)

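If I understand correctly, you have everything working except the part that finds invalid characters. Is that right?

Do you know about for loops? They can help here. First, take the two parts of the e-mail address: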

account = splitting[0]
domain = splitting[1]

Then iterate over each part. Each iteration yields one character; if that character is not in the set of allowed characters, print a message:

for c in account:
    if c not in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.":
        print("Invalid char", c, "in e-mail")

for c in domain:
    if c not in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.":
        print("Invalid char", c, "in e-mail")

This is not a very elegant solution (you could use string.ascii_letters + string.digits + "._" instead of the hand-typed string of allowed characters, or use list comprehensions), but I bet it is perfectly understandable for a new user.
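The more compact variant mentioned above might look like the following sketch: the allowed set is built from the `string` module constants, and a list comprehension collects the offending characters. `ALLOWED` and `invalid_chars` are hypothetical names chosen for illustration.

```python
import string

# Allowed characters, built from the string module instead of a hand-typed literal.
ALLOWED = set(string.ascii_letters + string.digits + "._")


def invalid_chars(part):
    """Return the characters of `part` that are not allowed, in order of appearance."""
    return [c for c in part if c not in ALLOWED]


account, domain = "jo#hn@ex!ample.com".split("@")
for c in invalid_chars(account) + invalid_chars(domain):
    print("Invalid char", c, "in e-mail")
```

Using a set makes each membership test constant-time, although for strings this short the difference from the original `in "..."` check is negligible.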

Answer 2 (score: 1)

Build an e-mail validation function that takes two parameters: the e-mail address to validate and a list of valid domains.

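def validEmail(email, domains):
    index = email.find('@')
    if index == -1:
        return False  # no "@" at all

    account = email[:index]
    rest = email[index + 1:]  # everything after the "@"
    if not 0 < len(account) <= 8:
        return False

    dot = rest.find('.')  # first dot after the "@", not in the whole address
    if dot < 1:
        return False  # no dot, or nothing between the "@" and the dot
    if dot > 12:
        return False  # more than 12 characters between the "@" and the dot

    # Whatever follows the first dot must be one of the allowed domains.
    return rest[dot + 1:] in domains

domains = ['com', 'org', 'co.uk']

email = input("Enter the e-mail address: ")
print(validEmail(email, domains))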
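The domain list can also be folded into the regular expression from the first answer. This sketch is an assumption-laden combination of the two answers, not either author's code: `make_validator` is a hypothetical name, and `re.escape` is used so that the dot in an entry such as "co.uk" matches only a literal dot.

```python
import re


def make_validator(domains):
    """Build a checker for <1-8 chars>@<1-12 chars>.<one of domains>."""
    # Escape each domain so "co.uk" does not match "coXuk".
    alternatives = "|".join(re.escape(d) for d in domains)
    pattern = re.compile(
        r"[A-Za-z0-9_]{1,8}@[A-Za-z0-9_]{1,12}\.(?:" + alternatives + ")"
    )
    # fullmatch() anchors the pattern at both ends of the address.
    return lambda email: pattern.fullmatch(email) is not None


valid = make_validator(['com', 'org', 'co.uk'])
print(valid("test@gmail.com"))   # True
print(valid("me@site.net"))      # False
```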