Need to use Python

Time: 2019-07-11 06:54:56

Tags: python-3.x

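I am uploading a resume and storing its content as plain text. I am trying to extract the experience/professional summary section of that resume and return it in JSON format.

When I use the code below, it only gives me the most recent position, i.e. the one at the top. How do I get the complete data? Can you tell me what I am missing?

For example, the experience section of my resume looks like this: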

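Selected Experience

Data Scientist Data Innovation Lab @ Capital One (San Francisco, Ca.) 06/15 - Now Lead research team modeling for fraud problem space with class imbalance & adversaries; H2O.ai, GraphLab, SKLearn Deployed sub-millisecond real time model; Apache Apex Recommended distributed machine learning frameworks for general adoption at Capital One; H2O.ai, GraphLab, Apache Spark

Various Technical Positions Lawfty (San Francisco, Ca.), RevUp Software (Redwood City, Ca.), Perkins + Will Architecture (San Francisco, Ca.) & Lawrence Berkeley National Lab. (Berkeley, Ca.)

Front End Supervisor The Home Depot Pro (Colma, Ca.) 05/11 - 03/13 Positions held: Cashier, Special Services Assoc., Tool Rental Assoc. Supervised and trained a staff of 10-30 team members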

The code below returns only the candidate's topmost experience entry:

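# Requires the NLTK data packages for tokenization, stop words, WordNet and POS tagging
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def extract_experience(resume_text):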
    '''
    Helper function to extract experience from resume text
    :param resume_text: Plain resume text
    :return: list of experience
    '''
    wordnet_lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))
    # word tokenization 
    word_tokens = nltk.word_tokenize(resume_text)
    # remove stop words and lemmatize  
    filtered_sentence = [w for w in word_tokens if w not in stop_words and wordnet_lemmatizer.lemmatize(w) not in stop_words]
    sent = nltk.pos_tag(filtered_sentence)
    #print(filtered_sentence)
    # parse regex
    cp = nltk.RegexpParser('P: {<NNP|CD|NNPS|VBG|NN|JJ|CC>+}')
    cs = cp.parse(sent)
    #print(cs)
    test = []
    for vp in cs.subtrees(filter=lambda t: t.label() == 'P'):
        leaves = vp.leaves()
        # keep the words of chunks with at least two tokens; shorter chunks become ''
        test.append(" ".join(word for word, _ in leaves) if len(leaves) >= 2 else "")
    # Search for the word 'experience' in each chunk and keep the text after it
    x = [chunk[chunk.lower().index('experience') + 11:] for chunk in test if chunk and 'experience' in chunk.lower()]
    return x

Current output:

"experience": [
      "Data Scientist Data Innovation Lab @ Capital One"
    ], 
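
A possible explanation, sketched under assumptions: the final list comprehension keeps only the chunks that contain the word 'experience' and returns the text after that word, so every chunk without it is dropped. The chunk list below is hypothetical (the real chunks depend on how NLTK tags the resume), and text_after_keyword is an illustrative helper that mirrors that last step, not part of the original code.

```python
# Hypothetical chunks, standing in for what the RegexpParser might produce.
chunks = [
    "Selected Experience Data Scientist Data Innovation Lab @ Capital One",
    "San Francisco",
    "Lead research team modeling",
    "Front End Supervisor",
]


def text_after_keyword(chunks, keyword="experience"):
    """Mirror the last step of extract_experience on a list of chunk strings."""
    offset = len(keyword) + 1  # skip the keyword and the space after it
    return [
        chunk[chunk.lower().index(keyword) + offset:]
        for chunk in chunks
        if chunk and keyword in chunk.lower()
    ]


result = text_after_keyword(chunks)
print(result)
```

With these made-up chunks, only the first one contains 'experience', so it is the only one that survives the filter. Later positions land in separate chunks and never reach the result.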

Expected output:

"experience": [
      "Data Scientist Data Innovation Lab @ Capital One (San Francisco, Ca.)   06/15 - Now Lead research team modeling for fraud problem space with class imbalance & adversaries; H2O.ai, GraphLab, SKLearn Deployed sub-millisecond real time model; Apache Apex Recommended distributed machine learning frameworks for general adoption at Capital One; H2O.ai, GraphLab, Apache Spark ",
      "Various Technical Positions Lawfty (San Francisco, Ca.), RevUp Software (Redwood City, Ca.), Perkins + Will Architecture (San Francisco, Ca.) & Lawrence Berkeley National Lab. (Berkeley, Ca.)",
      "Front End Supervisor The Home Depot Pro (Colma, Ca.)                                        05/11 - 03/13 Positions held: Cashier, Special Services Assoc., Tool Rental Assoc. Supervised and trained a staff of 10-30 team members"
    ],
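
One alternative, as a rough sketch and not a definitive fix: skip the POS chunking and slice the raw text by section instead. This assumes the stored resume text keeps its line breaks, that entries are separated by blank lines, and that the section ends at another heading. NEXT_SECTION_HEADINGS, extract_experience_section, and the shortened sample text are all hypothetical.

```python
import json
import re

# Hypothetical headings that mark the end of the experience section.
NEXT_SECTION_HEADINGS = ("education", "skills", "projects", "publications")


def extract_experience_section(resume_text):
    """Return the entries under the experience heading, one string per entry."""
    lines = resume_text.splitlines()

    # Find a short heading line that mentions 'experience'.
    start = None
    for i, line in enumerate(lines):
        if "experience" in line.lower() and len(line.split()) <= 3:
            start = i + 1
            break
    if start is None:
        return []

    entries, current = [], []
    for line in lines[start:]:
        stripped = line.strip()
        if stripped.lower() in NEXT_SECTION_HEADINGS:
            break  # reached the next section
        if stripped:
            # Collapse runs of whitespace inside the line.
            current.append(re.sub(r"\s+", " ", stripped))
        elif current:
            # A blank line closes the current entry.
            entries.append(" ".join(current))
            current = []
    if current:
        entries.append(" ".join(current))
    return entries


# Shortened, made-up sample in the same shape as the resume above.
sample = """Selected Experience

Data Scientist Data Innovation Lab @ Capital One (San Francisco, Ca.)   06/15 - Now
Lead research team modeling for fraud problem space

Front End Supervisor The Home Depot Pro (Colma, Ca.)   05/11 - 03/13
Supervised and trained a staff of 10-30 team members

Education
"""

entries = extract_experience_section(sample)
print(json.dumps({"experience": entries}, indent=2))
```

If the stored text has lost its line breaks, this approach would not apply as written, and the entries would need another delimiter, such as the date ranges.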

0 answers:

No answers