Question

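I am trying to write a function that extracts the string between two specific characters.

My dataset contains only email addresses such as: pstroulgerrn@time.com.

I want to extract everything after the @ and before the ., so that the email above yields time.

This is my code so far: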

new = df_personal['email'] # 1000x1 dataframe of emails

def extract_company(x):
        y = [ ]
        y = x[x.find('@')+1 : x.find('.')]
        return y

extract_company(new)

Note: if I change new to df_personal['email'][0], the correct output is shown for that row.

However, when I try to do this for the whole dataframe, I get an error message:

AttributeError: 'Series' object has no attribute 'find'

Answer 1

You can use a regular expression to extract the matching text from every element of the Series:

import pandas as pd

df = pd.DataFrame( ['kabawonga@something.whereever','kabawonga@omg.whatever'])
df.columns = ['email']

print(df)

k =  df["email"].str.extract(r"@(.+)\.")

print(k)

Output:

# df
                           email
0  kabawonga@something.whereever
1         kabawonga@omg.whatever

# extraction
           0
0  something
1        omg

See pandas.Series.str.extract.
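
One caveat: .+ is greedy, so the pattern above captures everything up to the last dot after the @. That is fine for the sample data, but an address with several dots in the domain returns more than the first label. A minimal sketch of the difference, assuming a made-up second address (john.doe@aol.co.uk) and passing expand=False so that a Series comes back instead of a one-column DataFrame:

```python
import pandas as pd

# Hypothetical sample data; the second address has two dots after the '@'.
emails = pd.Series(["kabawonga@something.whereever", "john.doe@aol.co.uk"])

# Greedy: '.+' runs to the LAST dot after the '@'.
greedy = emails.str.extract(r"@(.+)\.", expand=False)

# Negated class: '[^.]+' stops at the FIRST dot after the '@'.
first_label = emails.str.extract(r"@([^.]+)\.", expand=False)

print(greedy.tolist())       # ['something', 'aol.co']
print(first_label.tolist())  # ['something', 'aol']
```

If only the part before the first dot is wanted, the negated character class is probably the safer choice.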

Answer 2

Try:

df_personal["domain"]=df_personal["email"].str.extract(r"\@([^\.]+)\.")

Output (for sample data):

import pandas as pd

df_personal=pd.DataFrame({"email": ["abc@yahoo.com", "xyz.abc@gmail.com", "john.doe@aol.co.uk"]})

df_personal["domain"]=df_personal["email"].str.extract(r"\@([^\.]+)\.")

>>> df_personal

                email domain
0       abc@yahoo.com  yahoo
1   xyz.abc@gmail.com  gmail
2  john.doe@aol.co.uk    aol
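
It may also help to know what happens to rows that do not match. As far as I know, str.extract leaves a missing value for them rather than raising. A small sketch, assuming a hypothetical Series that contains a malformed entry and a missing one:

```python
import pandas as pd

# Hypothetical data: one valid address, one string without '@', one missing value.
emails = pd.Series(["abc@yahoo.com", "not-an-email", None])

# expand=False returns a Series for a single capture group.
domain = emails.str.extract(r"@([^.]+)\.", expand=False)

print(domain.iloc[0])          # yahoo
print(domain.isna().tolist())  # [False, True, True]
```

Rows with a missing value can then be filtered with isna() or filled with fillna(), depending on what the dataset needs.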

Answer 3

You can use apply for this, splitting each row on . and then on @:

Snippet:

import pandas as pd

df = pd.DataFrame( ['abc@xyz.dot','def@qwe.dot','def@ert.dot.dot'])
df.columns = ['email']


df["domain"] = df["email"].apply(lambda x: x.split(".")[0].split("@")[1])

Output:

df
Out[37]: 
             email domain
0      abc@xyz.dot    xyz
1      def@qwe.dot    qwe
2  def@ert.dot.dot    ert
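
Note that splitting on . first only works when there is no dot before the @: for an address like xyz.abc@gmail.com, the lambda above raises an IndexError. The same applies to the function in the question, which looks for the first dot anywhere in the string. A possible sketch that keeps the question's find-based approach but starts searching for the dot after the @, and then applies it row by row. Returning an empty string when no @ or no dot is found is my own assumption, not something from the question:

```python
import pandas as pd

def extract_company(email: str) -> str:
    """Return the text between the '@' and the first '.' that follows it."""
    at = email.find("@")
    if at == -1:
        return ""  # assumption: no '@' means nothing to extract
    dot = email.find(".", at + 1)  # search only after the '@'
    if dot == -1:
        return ""  # assumption: no '.' after the '@'
    return email[at + 1 : dot]

# Hypothetical sample data, including an address with a dot before the '@'.
df = pd.DataFrame({"email": ["abc@xyz.dot", "def@ert.dot.dot", "xyz.abc@gmail.com"]})

# apply calls the function once per element, so str.find sees a plain string.
df["domain"] = df["email"].apply(extract_company)
print(df)
```

This also avoids the AttributeError from the question, because the function receives each string individually instead of the whole Series.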

Python, extract the string between two specific characters for all rows in a dataframe
