How can I use TextBlob to detect valid email addresses with machine learning?

Asked: 2018-10-14 16:14:34

Tags: python machine-learning textblob

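I want to detect valid email addresses, but my code simply returns whichever label is more probable in the dataset, so it clearly does not work as I expected.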

Here is the code:

from textblob.classifiers import NaiveBayesClassifier

# Each training file holds one sample per line; the second item is its label.
files = [
    ("data_train/email_positive.txt", "yes"),
    ("data_train/email_negative.txt", "no"),
]
train = []

for path, label in files:
    with open(path) as f:
        for line in f:
            sample = line.strip()
            if sample:  # skip blank lines
                train.append((sample, label))

cl = NaiveBayesClassifier(train)
print(cl.classify("wrong_email@2x.png"))
# Actual output: yes
# Expected output: no

Some samples from the valid email dataset:

hello@3commerceinc.com
sales@ablefreight.com
dispatchwaycross@absolutewl.com
ops@absolutewl.com
tol@absolutewl.com
email@gmail.com
email@hotmail.com
. . . 

Some samples from the invalid email dataset:

pause@2x.png
video@2x.png
right@2x.png
play@2x.png
circle-hover@2x.png
preloader@2x.gif
left@2x.png
circle@2x.png
. . . 
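
Looking at the two sample lists above, the valid and invalid entries differ mostly in what follows the last dot (com versus png or gif). One direction that might be worth trying, offered only as a sketch and not as a confirmed fix, is to give the classifier character-level features instead of relying on its default word-based ones. The function name `email_features`, the feature names, and the `ASSET_EXTENSIONS` set below are all hypothetical and chosen for illustration.

```python
# Hypothetical set of file extensions seen in image asset names such as
# "pause@2x.png". This list is an assumption, not part of the original post.
ASSET_EXTENSIONS = {"png", "gif", "jpg", "jpeg", "svg", "webp"}


def email_features(document):
    """Map one candidate string to a dict of simple character-level features."""
    text = document.strip().lower()
    # Split on the last "@"; sep is "" when the string contains no "@".
    local, sep, domain = text.rpartition("@")
    # The suffix is whatever follows the last dot of the domain part.
    suffix = domain.rsplit(".", 1)[-1] if "." in domain else ""
    return {
        "has_at": sep == "@",
        "has_local_part": bool(local),
        "suffix": suffix,
        "suffix_is_asset_ext": suffix in ASSET_EXTENSIONS,
    }
```

TextBlob's NaiveBayesClassifier accepts a `feature_extractor` argument, so the function could be plugged in as `NaiveBayesClassifier(train, feature_extractor=email_features)`. Whether this classifies "wrong_email@2x.png" as "no" depends on the full training data, so it should be checked against a held-out set.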

0 answers:

No answers yet