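I want to detect valid email addresses, but my code just returns the label that has the higher probability in the dataset, so it clearly does not work the way I expected.
Here is the code: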
from textblob.classifiers import NaiveBayesClassifier

# (path, label) pairs: one file of valid addresses, one of invalid ones
files = [
    ("data_train/email_positive.txt", "yes"),
    ("data_train/email_negative.txt", "no"),
]
train = []
cl = None
for file_txt in files:
    with open(file_txt[0]) as f:
        email_train_raw = f.readlines()
    for email in email_train_raw:
        # one address per line; drop the trailing newline
        e = email.replace("\n", "")
        train.append((e, file_txt[1]))
# train once, after both files have been read
cl = NaiveBayesClassifier(train)
print(cl.classify("wrong_email@2x.png"))
# Output: yes
# Expected: no
Some of the valid email addresses in the dataset:
hello@3commerceinc.com
sales@ablefreight.com
dispatchwaycross@absolutewl.com
ops@absolutewl.com
tol@absolutewl.com
email@gmail.com
email@hotmail.com
. . .
Some of the invalid email addresses in the dataset:
pause@2x.png
video@2x.png
right@2x.png
play@2x.png
circle-hover@2x.png
preloader@2x.gif
left@2x.png
circle@2x.png
. . .
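
Looking at the two samples, the part after the "@" seems to be what separates them: the invalid entries end in image extensions such as .png or .gif. A minimal sketch of a feature function that exposes this to a classifier is below. It is only an illustration under assumptions: the name email_features, the ASSET_EXTENSIONS set, and the chosen features are hypothetical and not part of the original code, and whether they fix the misclassification has not been verified.

```python
import os

# Hypothetical set of extensions that suggest an asset file name, not a domain.
ASSET_EXTENSIONS = {".png", ".gif", ".jpg", ".jpeg", ".svg"}


def email_features(document, train_set=None):
    """Turn one candidate string into a dict of simple features.

    train_set is accepted but unused, so the function can be called
    with either one or two arguments.
    """
    # Split on the last "@"; if there is none, the whole string is the "domain".
    _local, _, domain = document.strip().lower().rpartition("@")
    suffix = os.path.splitext(domain)[1]  # e.g. ".com" or ".png"
    return {
        "suffix": suffix,
        "asset_suffix": suffix in ASSET_EXTENSIONS,
        "domain_starts_with_digit": domain[:1].isdigit(),
    }


print(email_features("wrong_email@2x.png"))
print(email_features("sales@ablefreight.com"))
```

TextBlob's NaiveBayesClassifier accepts a feature_extractor argument, so a function like this could presumably be tried as NaiveBayesClassifier(train, feature_extractor=email_features). Note that a leading digit alone is not a reliable signal, since 3commerceinc.com in the valid sample also starts with one, which is why it is left as a feature rather than a hard rule.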