Extract all emails from text data

Time: 2020-06-07 10:45:35

Tags: python pandas

I have imported the data file:

import pandas as pd

em = pd.read_csv(r'C:\Users\hp\Desktop\notepad\film.csv', error_bad_lines=False)

The following code does not work as intended. Does anyone have a better approach?

import numpy as np

em['email'] = em['Actors & Actresses Address']
nan_rows = em[em.isnull().any(1)]
em = em.fillna(' ')
nan_rows = em[em.isnull().any(1)]

for word in em:
    new = []
    i = ".com"
    if i in word:
        new.append(word)
        em.to_csv("new.csv", index=False)

print(new)

2 Answers:

Answer 0: (score: 0)

Try this:

d = {"Movie": ["Movie1", "Movie2", "Movie3"], "e-mail":["not an e-mail ad","mail@yahoo.com", "mail@gmail.com"]}
df = pd.DataFrame(d)

df["e-mail"][df["e-mail"].apply(lambda x: "@" in x)]

Or:

df["e-mail"][df['e-mail'].str.contains('@')]
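
One caveat with both filters above: if the column contains missing values, `"@" in x` raises a `TypeError` on a float NaN, and a `str.contains` mask that includes NaN cannot be used for boolean indexing. A minimal sketch of a NaN-safe variant, assuming made-up sample data with one missing address (the `None` row and `Movie4` are illustrative, not from the question's file):

```python
import pandas as pd

# Hypothetical sample data; the None stands in for a missing address.
df = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "e-mail": ["not an e-mail ad", "mail@yahoo.com", None, "mail@gmail.com"],
})

# na=False treats missing values as "no match", so the mask is purely boolean.
mask = df["e-mail"].str.contains("@", na=False)

# .loc with a boolean mask avoids chained indexing.
emails = df.loc[mask, "e-mail"].tolist()
print(emails)
```

This only keeps rows whose whole cell contains an `@`; it does not pull an address out of surrounding text.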

Answer 1: (score: 0)

Use str.extract

Try this:
# Raw string, with the dot before "com" escaped so it matches a literal "."
em['Actors & Actresses Address'].fillna("").str.extract(r"([\w_.]+@[\w_.]+\.com)")
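
Note that `str.extract` returns only the first match per row and the pattern above is limited to `.com` addresses. Since the question asks for all emails, here is a rough sketch using `str.findall` instead. The sample rows and the pattern are assumptions for illustration; the pattern is a simplified matcher, not a full email validator:

```python
import pandas as pd

# Hypothetical free-text rows; only the column name comes from the question.
em = pd.DataFrame({
    "Actors & Actresses Address": [
        "Contact: a.actor@gmail.com or agent@studio.org",
        "no address on file",
        None,
        "b_actress@yahoo.com",
    ]
})

# Simplified pattern (an assumption): local part, "@", then a dotted domain.
pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

# findall returns a list of every match in each cell (empty list if none).
found = em["Actors & Actresses Address"].fillna("").str.findall(pattern)

# Flatten the per-row lists into one list of addresses.
all_emails = [address for row in found for address in row]
print(all_emails)
```

If one row per address is preferred over a flat list, `str.extractall` with a capture group around the same pattern should give a similar result indexed by row and match number.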