How to create columns in a pandas DataFrame using .apply and a user-defined function

Time: 2019-07-18 15:44:36

Tags: python pandas apply

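I am trying to create several columns in a pandas DataFrame at once, where each column name is a key in a dictionary and the function returns 1 if any of the values for that key is present.

My DataFrame has 3 columns: jp_ref, jp_title, and jp_description. Essentially, I am searching each jp_description for the words assigned to a key, and filling the column for that key with 1s and 0s depending on whether any of those words appears in jp_description.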


import pandas as pd

jp_title = ['software developer', 'operations analyst', 'it project manager']

jp_ref = ['j01', 'j02', 'j03']

jp_description = ['software developer with java and sql experience', 'operations analyst with ms in operations research, statistics or related field. sql experience desired.', 'it project manager with javascript working knowledge']

myDict = {'jp_title': jp_title, 'jp_ref': jp_ref, 'jp_description': jp_description}

data = pd.DataFrame(myDict)

technologies = {'java':['java','jdbc','jms','jconsole','jprobe','jax','jax-rs','kotlin','jdk'],
'javascript':['javascript','js','node','node.js','mustache.js','handlebar.js','express','angular',
             'angular.js','react.js','angularjs','jquery','backbone.js','d3'],
'sql':['sql','mysql','sqlite','t-sql','postgre','postgresql','db','etl']}

def term_search(doc,tech):
    for term in technologies[tech]:
        if term in doc:
            return 1
        else:
            return 0

for tech in technologies:
    data[tech] = data.apply(term_search(data['jp_description'],tech))

I get the following error, which I don't understand:

TypeError: ("'int' object is not callable", 'occurred at index jp_ref')

1 Answer:

Answer 0: (score: 1)

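Your logic is wrong: you loop over the list, but the function returns 0 or 1 on the very first iteration, so jp_description is never compared against the complete list of terms.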
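To make the early return concrete, here is a minimal sketch. The helper names `first_term_only` and `any_term` and the shortened term list are made up for illustration; they are not from the question.

```python
# Shortened, illustrative term list (not the full one from the question).
terms = ['java', 'jdbc', 'sql']

def first_term_only(doc, terms):
    # Mirrors the question's loop: the else branch returns on the very
    # first iteration, so only terms[0] is ever tested.
    for term in terms:
        if term in doc:
            return 1
        else:
            return 0

def any_term(doc, terms):
    # Checks every term and returns 1 if at least one occurs in doc.
    return int(any(term in doc for term in terms))

doc = 'experience with jdbc drivers'
print(first_term_only(doc, terms))  # 0, because only 'java' was checked
print(any_term(doc, terms))         # 1, because 'jdbc' is found
```

Note that `term in doc` on a string is a substring test, so 'java' would also match inside 'javascript'. Comparing whole words avoids that.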

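Instead, split jp_description into words and check for elements in common with the list in the technologies dictionary. If there is a common element, a term was found, so return 1; otherwise return 0.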

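def term_search(doc, tech):
    # Split the description into words and compare them with the terms for this technology
    words = doc.split(" ")
    common_elem = set(words).intersection(technologies[tech])
    # 1 if at least one word matches a term, otherwise 0
    if len(common_elem) > 0:
        return 1
    return 0

for tech in technologies:
    # Pass a function to apply; it is called once per description
    data[tech] = data['jp_description'].apply(lambda x: term_search(x, tech))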

             jp_title jp_ref         jp_description  java  javascript  sql
0  software developer    j01  software developer...     1           0    1
1  operations analyst    j02  operations analyst...     0           0    1
2  it project manager    j03  it project manager...     0           1    0