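I'm trying to replicate a CASE statement in a Python script (using pandas) that works on a dataframe and fills a new column based on how each row is evaluated. However, every row seems to fall into the else condition, because every value in the new column is Other.
My first thought was that it has something to do with the any() condition I'm using, but I suspect I may be using the wrong approach entirely. Any suggestions on the direction I should take?
Sample rows: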
index | source_name
1 | CLICK TO CALL - New Mexico
2 | Las Vegas Community Partner
3 | Facebook - Test Camp - Los Angeles
4 | Google - Test Camp - Los Angeles
index | landing_page_url
1 | NaN
2 | https://lp.example.com/fb/la/test/
3 | https://lp.example.com/fb/la/test/?utm_source=facebook
4 | https://lp.example.com/google/la/test/?utm_source=google
Criteria:
# Criteria
fb_landing_page_crit = [
    'utm_source=facebook',
    'fbclid',
    'test.com/fb/'
]
fb_source_crit = [
    'fb',
    'facebook'
]
google_landing_page_crit = [
    'gclid'
]
google_source_crit = [
    'click to call',
    'discovery',
    'call',
    'website',
    'landing page',
    'display - lp'
]
local_listings_source_crit = [
    'gmb'
]
partner_source_crit = [
    'vegas community',
    'new orleans community',
    'dc community',
]
Case function:
def network_parse(df):
    if isinstance(df, str):
        if any(x in df['landing_page_url'] for x in fb_landing_page_crit):
            return 'Facebook'
        elif any(x in df['landing_page_url'] for x in google_landing_page_crit):
            return 'Google'
        elif any(x in df['source_name'] for x in fb_source_crit):
            return 'Facebook'
        elif any(x in df['source_name'] for x in google_source_crit):
            return 'Google'
        elif any(x in df['source_name'] for x in local_listings_source_crit):
            return 'Local Listings'
        elif any(x in df['source_name'] for x in partner_source_crit):
            return 'Partner - Community Partnership'
        else:
            return 'Other'
    else:
        return 'Other'
Function call:
df['network'] = df.apply(network_parse, axis=1) # Every row returns "Other"
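Answer 0 (score: 0)
I found a better way to solve the problem. Instead of using a contains-style check, I decided to run a regex search to see whether any of the combined list values is found in the row's column value, and to apply the matching label if so. Here are my updates:
Lists: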
fb_landing_page_crit = [
    'utm_source=facebook',
    'fbclid',
    r'test\.com/fb/'  # raw string; '/' needs no escaping in a Python regex
]
fb_landing_page_regex = "|".join(fb_landing_page_crit)
google_landing_page_crit = [
    'gclid'
]
google_landing_page_regex = "|".join(google_landing_page_crit)
fb_source_crit = [
    'fb',
    'facebook'
]
fb_source_regex = "|".join(fb_source_crit)
google_source_crit = [
    'click to call',
    'discovery',
    'call',
    'website',
    'landing page',
    'display - lp'  # '-' needs no escaping outside a character class
]
google_source_regex = "|".join(google_source_crit)
local_listings_source_crit = [
    'gmb'
]
local_listings_source_regex = "|".join(local_listings_source_crit)
partner_source_crit = [
    'vegas community',
    'new orleans community',
    'dc community',
]
partner_source_regex = "|".join(partner_source_crit)
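If the criteria are meant as literal substrings rather than regex fragments, one option is to escape each entry before joining it into the alternation. The following is a minimal sketch under that assumption; build_pattern is a hypothetical helper name that does not appear in the answer, and it relies only on the standard-library re.escape and re.compile.

```python
import re

def build_pattern(criteria):
    """Join literal substrings into one compiled, case-insensitive alternation."""
    # re.escape neutralises regex metacharacters such as '.' and '?'
    return re.compile("|".join(re.escape(c) for c in criteria), re.IGNORECASE)

fb_landing_page_crit = ['utm_source=facebook', 'fbclid', 'test.com/fb/']
fb_landing_page_pattern = build_pattern(fb_landing_page_crit)

url = 'https://lp.example.com/fb/la/test/?utm_source=facebook'
# search() returns a match object when any of the substrings occurs in url
print(bool(fb_landing_page_pattern.search(url)))
```

Compiling with re.IGNORECASE would make the .lower() calls unnecessary. An empty criteria list would produce an empty pattern that matches every string, so it is worth guarding against that if a list can be empty.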
Function:
import re

def network_parse(df):
    # df is a single row here; a missing URL arrives as NaN (a float), not a str
    if isinstance(df['landing_page_url'], str):
        if bool(re.search(fb_landing_page_regex, df['landing_page_url'].lower())) or bool(re.search(fb_source_regex, df['source_name'].lower())):
            return 'Facebook'
        if bool(re.search(google_landing_page_regex, df['landing_page_url'].lower())) or bool(re.search(google_source_regex, df['source_name'].lower())):
            return 'Google'
        if bool(re.search(local_listings_source_regex, df['source_name'].lower())):
            return 'Local Listings'
        if bool(re.search(partner_source_regex, df['source_name'].lower())):
            return 'Partner - Community Partnership'
        else:
            return 'Other'
    else:
        return 'Other'
Function call:
df['network'] = df.apply(network_parse, axis=1)
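For larger frames, the same CASE-style logic can also be written without a row-wise apply. The sketch below is one possible variant, not the method used in this answer: it combines Series.str.contains with numpy.select, uses shortened patterns, and introduces a hypothetical helper named has. It also assumes that rows with a missing URL should still be classified by source_name, which differs from the function above, where such rows return Other.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'source_name': [
        'CLICK TO CALL - New Mexico',
        'Las Vegas Community Partner',
        'Facebook - Test Camp - Los Angeles',
        'Google - Test Camp - Los Angeles',
    ],
    'landing_page_url': [
        np.nan,
        'https://lp.example.com/fb/la/test/',
        'https://lp.example.com/fb/la/test/?utm_source=facebook',
        'https://lp.example.com/google/la/test/?utm_source=google',
    ],
})

# Patterns shortened for the example; the joined regexes work the same way
fb_landing_page_regex = r'utm_source=facebook|fbclid'
google_landing_page_regex = r'gclid'
fb_source_regex = r'fb|facebook'
google_source_regex = r'click to call|discovery|call|website|landing page'
local_listings_source_regex = r'gmb'
partner_source_regex = r'vegas community|new orleans community|dc community'

url = df['landing_page_url']
src = df['source_name']

def has(series, pattern):
    # case=False replaces .lower(); na=False makes NaN count as "no match"
    return series.str.contains(pattern, case=False, na=False, regex=True)

conditions = [
    has(url, fb_landing_page_regex) | has(src, fb_source_regex),
    has(url, google_landing_page_regex) | has(src, google_source_regex),
    has(src, local_listings_source_regex),
    has(src, partner_source_regex),
]
choices = ['Facebook', 'Google', 'Local Listings', 'Partner - Community Partnership']

# np.select takes the first matching condition per row, like CASE WHEN ... ELSE
df['network'] = np.select(conditions, choices, default='Other')
print(df[['source_name', 'network']])
```

With the sample rows, the fourth row still ends up as Other, because none of the Google criteria contain the word google itself; adding it to google_source_crit would be a separate decision.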
Answer 1 (score: -1)
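Now, the problem lies in the any part of x in df['source_name'] (I picked source_name because it is easier to explain here). You are checking whether any row of the dataframe is equal to (for example) 'Google', not whether it contains the word. To achieve the latter, you can nest for statements:
    if any(any(x in y for y in df['landing_page_url'] if isinstance(y, str)) for x in fb_landing_page_crit):
        return 'Facebook'
However, I'm fairly sure this is not the most elegant or efficient approach, since it loops over the same column several times, but it may be fine for smaller dataframes. Otherwise, it may help you find a more efficient solution.
Edit: To investigate your problem, you can run the following two snippets; the first gives False and the second gives True: ...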
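As a further way to investigate why the original function always lands in the outer else branch, here is a small sketch that checks what apply(axis=1) actually passes to the function and how letter case affects the substring test. It assumes a one-row frame built from the sample data, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'source_name': ['CLICK TO CALL - New Mexico'],
    'landing_page_url': [float('nan')],
})

# With axis=1, apply passes each row as a pandas Series, never as a str
kinds = df.apply(lambda row: type(row).__name__, axis=1)
is_str = df.apply(lambda row: isinstance(row, str), axis=1)
print(kinds.iloc[0], is_str.iloc[0])

source = df.loc[0, 'source_name']
# Substring tests are case-sensitive, so the upper-case value does not match
print('click to call' in source)
# Lower-casing the value first lets the lower-case criterion match
print('click to call' in source.lower())
```

Because isinstance(df, str) is evaluated on the whole row, the check in the original network_parse is never true, so the function returns Other before any of the criteria are tested.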