I am trying to work with two DataFrames:
df1 = df.copy()
df1['emails'] = df1.emails.apply(lambda x: ','.join(set(map(str.strip, x.split(','))) - set(blacklisted.email)))
df1 = df1[df1.emails != '']
When I build DataFrames with the same information myself, it returns the same data types. For example, if I create a DataFrame like this:
blacklisted = pd.DataFrame(columns=['email'],
                           data=[['smith.john@hotmail.com'], ['earl.bob@jpmorgan.com'],
                                 ['banana.star@csu.edu'], ['london.flag@wholefoods.com'],
                                 ['soft.pretzel@utz.com']])
blacklisted.head()
email
0 smith.john@hotmail.com
1 earl.bob@jpmorgan.com
2 banana.star@csu.edu
3 london.flag@wholefoods.com
4 soft.pretzel@utz.com
and another DataFrame that looks like this:
df = pd.DataFrame(columns=['customerId', 'full name', 'emails'],
                  data=[['208863338', 'Brit Spear', 'star.shine@cw.com'],
                        ['086423367', 'Justin Bob', 'bob.love@gem.com,ruby.blue@yahoo.com'],
                        ['902626998', 'White Ice', 'iceblue@starr.com,ice@msn.com'],
                        ['1000826799', 'Bear Lou', 'lou.bear@visa.com'],
                        ['1609813339', 'Ariel Do', 'ariel.d@fire.com, ariel@yahoo.com']])
print(df)
customerId full name emails
0 208863338 Brit Spear star.shine@cw.com
1 086423367 Justin Bob bob.love@gem.com,ruby.blue@yahoo.com
2 902626998 White Ice iceblue@starr.com,ice@msn.com
3 1000826799 Bear Lou lou.bear@visa.com
4 1609813339 Ariel Do ariel.d@fire.com, ariel@yahoo.com
The code above works. But when I try to load the same information from two files instead, using code like this:
blacklisted = df1 = pd.read_excel(r'C:/Users/Administrator/Documents/sfiq/blacklisted.xlsx')
df = pd.read_excel(r'C:/Users/Administrator/Documents/customers.xlsx')
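Even though the files contain exactly the same information as the two DataFrames I created above, this does not work. The following line raises an AttributeError: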
df1['emails'] = df1.emails.apply(lambda x: ','.join(set(map(str.strip, x.split(','))) - set(blacklisted.email)))
The error returned is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-439d1f152f33> in <module>()
----> 1 df1['emails'] = df1.emails.apply(lambda x: ','.join(set(map(str.strip, x.split(','))) - set(blacklisted.email)))
C:\Program Files\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2218 else:
2219 values = self.asobject
-> 2220 mapped = lib.map_infer(values, f, convert=convert_dtype)
2221
2222 if len(mapped) and isinstance(mapped[0], Series):
pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62658)()
<ipython-input-22-439d1f152f33> in <lambda>(x)
----> 1 df1['emails'] = df1.emails.apply(lambda x: ','.join(set(map(str.strip, x.split(','))) - set(blacklisted.email)))
AttributeError: 'float' object has no attribute 'split'
Answer 0 (score: 1)
Assuming you have blacklisted.xlsx and customers.xlsx, use astype(str) before applying the function, so that every value in the emails column is a string:
blacklisted = pd.read_excel(r'blacklisted.xlsx')
df = pd.read_excel(r'customers.xlsx')
df['emails'] = df.emails.astype(str).apply(lambda x: ','.join(set(map(str.strip, x.split(','))) - set(blacklisted.email)))
df
df will be:
customerId full name emails
0 208863338 Brit Spear star.shine@cw.com
1 86423367 Justin Bob ruby.blue@yahoo.com,bob.love@gem.com
2 902626998 White Ice ice@msn.com,iceblue@starr.com
3 1000826799 Bear Lou lou.bear@visa.com
4 1609813339 Ariel Do ariel@yahoo.com,ariel.d@fire.com
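The 'float' object in the traceback is most likely NaN, which is what pandas uses for an empty Excel cell. Note that astype(str) turns such a value into the literal string 'nan', which then survives as if it were an address. As an alternative, here is a minimal sketch that treats any non-string cell as empty instead. The sample data and the helper name drop_blacklisted are made up for illustration and are not part of the original question; the np.nan stands in for an empty cell.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; np.nan mimics an empty cell read by pd.read_excel.
blacklisted = pd.DataFrame({'email': ['bob.love@gem.com', 'ice@msn.com']})
df = pd.DataFrame({
    'customerId': ['1', '2', '3', '4'],
    'emails': ['star.shine@cw.com',
               'bob.love@gem.com,ruby.blue@yahoo.com',
               np.nan,
               'ice@msn.com'],
})

# Build the lookup set once instead of on every row.
blocked = set(blacklisted.email.dropna().astype(str).str.strip())

def drop_blacklisted(cell):
    """Return the comma-joined addresses in `cell` that are not blacklisted."""
    if not isinstance(cell, str):      # NaN (a float) or any other non-string value
        return ''
    kept = {e.strip() for e in cell.split(',')} - blocked
    kept.discard('')                   # ignore empty pieces such as 'a@b.com,'
    return ','.join(sorted(kept))      # sorted so the order is stable

df['emails'] = df.emails.apply(drop_blacklisted)
df = df[df.emails != '']               # drop customers with no address left
print(df)
```

With this sample data only the first two customers remain: the row with the empty cell and the row whose only address is blacklisted are both filtered out by the final line.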