Applying a conditional statement to a DataFrame column

Time: 2020-10-01 17:22:40

Tags: python pandas

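I am trying to replicate a CASE statement in a Python script (using pandas) that works on a DataFrame, and to populate a new column based on how each row is evaluated. However, every row seems to fall into the else condition, because every value in the new column is Other. My first thought was that this has to do with the any() condition I am using, but I feel I may be going about this the wrong way entirely. Any advice on the direction I should take?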

Sample rows:

index | source_name
1 | CLICK TO CALL - New Mexico
2 | Las Vegas Community Partner
3 | Facebook - Test Camp - Los Angeles
4 | Google - Test Camp - Los Angeles

index | landing_page_url
1 | NaN
2 | https://lp.example.com/fb/la/test/
3 | https://lp.example.com/fb/la/test/?utm_source=facebook
4 | https://lp.example.com/google/la/test/?utm_source=google

Criteria:

# Criteria
fb_landing_page_crit = [
    'utm_source=facebook', 
    'fbclid',
    'test.com/fb/'
]
fb_source_crit = [
    'fb',
    'facebook'
]
google_landing_page_crit = [
    'gclid'
]
google_source_crit = [
    'click to call',
    'discovery',
    'call',
    'website',
    'landing page',
    'display - lp'
]
local_listings_source_crit = [
    'gmb'
]
partner_source_crit = [
    'vegas community',
    'new orleans community',
    'dc community',
]

Conditional function:

def network_parse(df):
    if isinstance(df, str):
        if any(x in df['landing_page_url'] for x in fb_landing_page_crit):
            return 'Facebook'
        elif any(x in df['landing_page_url'] for x in google_landing_page_crit):
            return 'Google'
        elif any(x in df['source_name'] for x in fb_source_crit):
            return 'Facebook'
        elif any(x in df['source_name'] for x in google_source_crit):
            return 'Google'
        elif any(x in df['source_name'] for x in local_listings_source_crit):
            return 'Local Listings'
        elif any(x in df['source_name'] for x in partner_source_crit):
            return 'Partner - Community Partnership'
        else:
            return 'Other'
    else:
        return 'Other'

Function call:

df['network'] = df.apply(network_parse, axis=1) # Every row returns "Other"

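2 answers:

Answer 0 (score: 0)

I found a better way to solve this problem. Rather than using a contains method, I decided to run a regex search to check whether any of the combined list values is found in the row's column and, if so, apply the matching value. Below is my update:

Lists: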

fb_landing_page_crit = [
    'utm_source=facebook', 
    'fbclid',
    r'test.com\/fb\/'
]
fb_landing_page_regex = "|".join(fb_landing_page_crit)

google_landing_page_crit = [
    'gclid'
]
google_landing_page_regex = "|".join(google_landing_page_crit)

fb_source_crit = [
    'fb',
    'facebook'
]
fb_source_regex = "|".join(fb_source_crit)

google_source_crit = [
    'click to call',
    'discovery',
    'call',
    'website',
    'landing page',
    r'display \- lp'
]
google_source_regex = "|".join(google_source_crit)

local_listings_source_crit = [
    'gmb'
]
local_listings_source_regex = "|".join(local_listings_source_crit)

partner_source_crit = [
    'vegas community',
    'new orleans community',
    'dc community',
]
partner_source_regex =  "|".join(partner_source_crit)

Function:

import re

def network_parse(df):
    if isinstance(df['landing_page_url'], str):
        if bool(re.search(fb_landing_page_regex,df['landing_page_url'].lower())) or bool(re.search(fb_source_regex,df['source_name'].lower())):
            return 'Facebook'
        if bool(re.search(google_landing_page_regex,df['landing_page_url'].lower())) or bool(re.search(google_source_regex,df['source_name'].lower())):
            return 'Google'
        if bool(re.search(local_listings_source_regex,df['source_name'].lower())):
            return 'Local Listings'
        if bool(re.search(partner_source_regex,df['source_name'].lower())):
            return 'Partner - Community Partnership'
        else:
            return 'Other'
    else:
        return 'Other'

Function call:

df['network'] = df.apply(network_parse, axis=1)
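For what it is worth, part of the reason the original function returned Other for every row is that df.apply(..., axis=1) passes each row as a Series, so isinstance(df, str) is never true; the criteria are also lowercase while the data is mixed-case. The regex idea above can also be written without a row-wise apply. The following is only a sketch under assumptions: the helper name contains_any is made up for illustration, the sample frame is rebuilt from the rows in the question, and numpy.select is swapped in for the if chain. It treats a NaN landing_page_url as "no match" instead of short-circuiting to Other.

```python
import re

import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'source_name': [
        'CLICK TO CALL - New Mexico',
        'Las Vegas Community Partner',
        'Facebook - Test Camp - Los Angeles',
        'Google - Test Camp - Los Angeles',
    ],
    'landing_page_url': [
        np.nan,
        'https://lp.example.com/fb/la/test/',
        'https://lp.example.com/fb/la/test/?utm_source=facebook',
        'https://lp.example.com/google/la/test/?utm_source=google',
    ],
})


def contains_any(series, terms):
    """Hypothetical helper: True where a value contains any of the terms."""
    # re.escape makes each term match literally; '|' joins them as alternatives.
    pattern = '|'.join(re.escape(t) for t in terms)
    # case=False ignores case; na=False counts missing values as non-matches.
    return series.str.contains(pattern, case=False, regex=True, na=False)


url = df['landing_page_url']
src = df['source_name']

# Conditions are checked in order; the first one that is True wins.
conditions = [
    contains_any(url, ['utm_source=facebook', 'fbclid', 'test.com/fb/'])
    | contains_any(src, ['fb', 'facebook']),
    contains_any(url, ['gclid'])
    | contains_any(src, ['click to call', 'discovery', 'call', 'website',
                         'landing page', 'display - lp']),
    contains_any(src, ['gmb']),
    contains_any(src, ['vegas community', 'new orleans community',
                       'dc community']),
]
choices = ['Facebook', 'Google', 'Local Listings',
           'Partner - Community Partnership']

df['network'] = np.select(conditions, choices, default='Other')
print(df[['source_name', 'network']])
```

On the four sample rows this should label them Google, Partner - Community Partnership, Facebook and Other. Row 4 ends up as Other because none of the criteria lists contains the word google, so the lists may need extending if that row is meant to count as Google.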

Answer 1 (score: -1)

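The problem now lies in the any part (I picked x in df['source_name'] because it is easier to explain here). That check tests whether any row of the data frame is equal to the search term (for example 'Google'), not whether a row contains the word. To get the latter, you can nest the for statements:

if any(x in y for x in fb_landing_page_crit for y in df['landing_page_url']): return 'Facebook'

However, I am fairly sure this is not the most elegant or efficient approach, since it loops over the same column several times, though it is probably fine for smaller data frames. Otherwise, it may at least help you find a more efficient solution.

Edit: To investigate the problem, you can run the two checks as separate snippets: the plain membership test gives False and the nested version gives True.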
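A minimal sketch of the difference between testing membership on a whole column and testing for a substring in each value. The Series s is made-up sample data modelled on the source_name column from the question, and the variable names plain and nested are illustrative only. One caveat worth knowing: the in operator on a pandas Series looks at the index labels rather than the values, which is why the plain test does not find the word.

```python
import pandas as pd

# Assumed sample data, modelled on the question's source_name column.
s = pd.Series([
    'Facebook - Test Camp - Los Angeles',
    'Google - Test Camp - Los Angeles',
])

# Membership test on the Series itself: compares against index labels (0, 1).
plain = 'Google' in s

# Substring test applied to every value in the Series.
nested = any('Google' in y for y in s)

print(plain, nested)
```

The same per-value substring test is available as s.str.contains('Google').any(), which avoids the explicit loop.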