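I'm trying to process a lot of resumes with Python. An example resume might look like the one below. Unfortunately, not every resume uses the same format. Apart from using regular expressions, is there a way to extract certain fields from a resume with Python (assuming I've converted everything to plain text)?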
Name: Someone
Tel: xxx-xxxxxxx
Add: 123 Some Street
Email: Someone@gmail.com
Objective/Goal
To obtain a position in...
Education
2004 - 2006: University of XYZ
Work Experience
2006 - 2008: Programmer
Skills
Programming skills: Python, ..
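Suppose I only want to extract a few of those fields. How can I get all the text between a field name and the next field? For example, if I only want the Name and Work Experience fields, it should return the following: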
NameField = 'Someone'
WorkExpField = '2006 - 2008: Programmer...'
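To pin down the behaviour being asked for, here is a minimal sketch that assumes the set of field names is known in advance. The `FIELDS` list and the `extract_fields` function are hypothetical, and real resumes would need many more heading variants. It walks the lines without regular expressions and collects everything between one known heading and the next:

```python
# Hypothetical heading list taken from the sample resume above; other
# resumes would need more variants ("Experience", "Employment", ...).
FIELDS = ["Name", "Tel", "Add", "Email", "Objective/Goal",
          "Education", "Work Experience", "Skills"]

SAMPLE = """Name: Someone
Tel: xxx-xxxxxxx
Add: 123 Some Street
Email: Someone@gmail.com
Objective/Goal
To obtain a position in...
Education
2004 - 2006: University of XYZ
Work Experience
2006 - 2008: Programmer
Skills
Programming skills: Python, .."""


def extract_fields(text, fields=FIELDS):
    """Map each known field to the text between it and the next known field."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # A line starts a new field if it is exactly a heading ("Skills")
        # or a heading followed by a colon ("Name: Someone").
        heading = next(
            (f for f in fields if stripped == f or stripped.startswith(f + ":")),
            None,
        )
        if heading is not None:
            current = heading
            # Keep any value written on the heading line itself.
            rest = stripped[len(heading):].lstrip(":").strip()
            sections[current] = [rest] if rest else []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}


fields = extract_fields(SAMPLE)
print(fields["Name"])             # Someone
print(fields["Work Experience"])  # 2006 - 2008: Programmer
```

Matching here is exact and case-sensitive, so "WORK EXPERIENCE" or a bare "Experience" heading would be missed; that fragility across formats is exactly what the answers below try to work around.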
Answer 0 (score: 3)
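Here is my "I'll give it a try, but I'm too lazy to make it pretty" approach to resumes in different formats. I'd be happy to test it against other resume layouts. Other suggestions and comments are welcome!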
import string


class Resume:
    """Heuristically extract the name and work-experience sections of a plain-text resume."""

    def __init__(self, filename):
        self.filepath = filename
        self.load()
        self.parse()

    def load(self):
        # Read as text (not 'rb') so that .lower() and substring checks work on str.
        with open(self.filepath, 'r') as f:
            self.content = f.read().splitlines()

    def checkLine(self, word, value, content, line):
        # Add one vote for `line` if `word` occurs in it (case-insensitive).
        if word in content.lower():
            value = self.addValue(value, line)
        return value

    def addValue(self, value, line):
        value[line] = value.get(line, 0) + 1
        return value

    def dict_List(self, dict_, content):
        # Keep only the lines with the highest vote count, in file order.
        best = max(dict_.values()) if dict_ else 0
        new = [(key, value) for key, value in dict_.items() if value == best]
        return [(x[0], content[x[0]]) for x in sorted(new)]

    def section_end(self, line_num):
        # First heading after line_num, or end of file if this is the last section.
        later = [h for h in self.headings if h > line_num]
        return later[0] if later else len(self.content)

    def get_name(self):
        names = []
        for each in self.name:
            if each[0] not in self.headings:
                # Inline field such as "Name: Someone": drop the label and separator.
                each = each[1].replace('Name', "")
                if each and each[0] not in string.ascii_letters:
                    each = each[1:]
                names.append(each.strip())
            else:
                # "Name" is a heading: take every line up to the next heading.
                index = self.section_end(each[0])
                names.append("\n".join(self.content[each[0] + 1:index]))
        if len(names) == 1:
            return names[0]
        return names

    def get_work(self):
        experience = []
        for each in self.work:
            index = self.section_end(each[0])
            experience.append("\n".join(self.content[each[0] + 1:index]))
        if len(experience) == 1:
            return experience[0]
        return experience

    def parse(self):
        name = dict()
        work_experience = dict()
        isHeading = dict()
        for line_num in range(len(self.content)):
            # Vote for lines that mention "name" and/or contain a ":".
            for checkName in ["name", ":"]:
                self.checkLine(checkName, name, self.content[line_num], line_num)
            # Vote for lines that mention "work" and/or "experience".
            for checkWork in ["work", "experience"]:
                self.checkLine(checkWork, work_experience, self.content[line_num], line_num)
            # Heading votes: shorter than the next line...
            if line_num != len(self.content) - 1:
                if len(self.content[line_num + 1]) > len(self.content[line_num]):
                    self.addValue(isHeading, line_num)
            # ...preceded by a blank line...
            if line_num > 0:
                if self.content[line_num - 1] == "":
                    self.addValue(isHeading, line_num)
            # ...and not indented. Blank lines lose a vote.
            if len(self.content[line_num]) == len(self.content[line_num].lstrip()):
                self.addValue(isHeading, line_num)
            if self.content[line_num] == "":
                isHeading[line_num] = isHeading.get(line_num, 0) - 1
        self.name = self.dict_List(name, self.content)
        self.work = self.dict_List(work_experience, self.content)
        self.headings = [x[0] for x in self.dict_List(isHeading, self.content)]


if __name__ == "__main__":
    resume = Resume(filename='sampleresume.txt')
    print(resume.get_name())
    print(resume.get_work())
Output:
Someone
2006 - 2008: Programmer
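The core of this answer is the heading vote in `parse`: each line earns a point for being shorter than the next line, a point for following a blank line and a point for having no indentation, blank lines lose a point, and the top-scoring lines are treated as section headings. Below is a standalone sketch of just that step (the names `heading_scores`, `likely_headings` and `SAMPLE_LINES` are hypothetical), run on the sample resume under the assumption that the original file had a blank line before each section:

```python
SAMPLE_LINES = [
    "Name: Someone",
    "Tel: xxx-xxxxxxx",
    "Add: 123 Some Street",
    "Email: Someone@gmail.com",
    "",
    "Objective/Goal",
    "To obtain a position in...",
    "",
    "Education",
    "2004 - 2006: University of XYZ",
    "",
    "Work Experience",
    "2006 - 2008: Programmer",
    "",
    "Skills",
    "Programming skills: Python, ..",
]


def heading_scores(lines):
    """Score each line with the same four rules used in Resume.parse."""
    scores = {}
    for i, line in enumerate(lines):
        score = 0
        # Rule 1: a heading is usually shorter than the line that follows it.
        if i != len(lines) - 1 and len(lines[i + 1]) > len(line):
            score += 1
        # Rule 2: a heading is often preceded by a blank line.
        if i > 0 and lines[i - 1] == "":
            score += 1
        # Rule 3: a heading usually has no leading indentation.
        if len(line) == len(line.lstrip()):
            score += 1
        # Rule 4: blank lines themselves are penalised.
        if line == "":
            score -= 1
        scores[i] = score
    return scores


def likely_headings(lines):
    """Line numbers whose score equals the maximum, in file order."""
    scores = heading_scores(lines)
    if not scores:
        return []
    best = max(scores.values())
    return [i for i, s in sorted(scores.items()) if s == best]


print([SAMPLE_LINES[i] for i in likely_headings(SAMPLE_LINES)])
# ['Objective/Goal', 'Education', 'Work Experience', 'Skills']
```

The "Name: Someone" line is not picked as a heading here, which is why `get_name` falls back to stripping the label from that line. Without the blank lines, short lines such as "Tel: xxx-xxxxxxx" tie with the real headings at the top score, so the heuristic depends heavily on the file's layout.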
Answer 1 (score: 0)
You should look into regular expressions (regexps). They let you parse text.
For example:
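#!/usr/bin/env python3
import re

# Matches a line that starts (after optional whitespace) with "Name", "name" or "nick".
prog = re.compile(r"\s*(Name|name|nick).*")

result = prog.match("Name: Bob Exampleson")
if result:
    print(result.group(0))  # prints the whole matched line

# No name-like label at the start, so nothing is printed.
result = prog.match("University: MIT")
if result:
    print(result.group(0))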
You will have to refine the patterns to cope with the different wording that candidates use.
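In practice, refining the search usually means a case-insensitive pattern anchored at the start of a line, several synonyms for the label, and a capture group so that only the value is returned rather than the whole line that `group(0)` gives above. A minimal sketch; the synonym list, `NAME_PATTERN` and the `find_name` helper are hypothetical:

```python
import re

# Hypothetical synonyms for the "name" label; extend as new wordings turn up.
NAME_PATTERN = re.compile(
    r"^[ \t]*(?:full\s+name|name|nick(?:name)?)[ \t]*:[ \t]*(?P<value>.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_name(text):
    """Return the value after the first name-like label, or None."""
    match = NAME_PATTERN.search(text)
    return match.group("value") if match else None


print(find_name("Name: Bob Exampleson"))      # Bob Exampleson
print(find_name("FULL NAME :  Jane Doe  "))   # Jane Doe
print(find_name("University: MIT"))           # None
```

Because of `re.MULTILINE`, `search` also works on a whole resume, finding the label on whichever line it appears.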