I am trying to convert a text file to CSV in Python. The input text file looks like this:
Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
"ContactNo: 1234567, 9999999"
"Qualification: M.Tech., Ph.D."
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
"Qualification: B.Tech., Ph.D."
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3
Designation: Associate Professor3
Email: johndoe3@google.com
"ContactNo: 333333,4444444"
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
Designation: Associate Professor4
Email: johndoe4@google.com
"ContactNo: 44444444 ,Intercom No.44444"
Qualification: : M.Sc.
Designation: Programmer
Email: johndoe5@google.com
"ContactNo: 5555555555 ,Intercom No.5555"
Qualification: Ph.D |Computer Science
Designation: Computer Operator
Email: johndoe6@google.com
ContactNo: 666666666
"Qualification: D.C.Sc. & E.,"
Designation: Computer Operator
Email: johndoe7@google.com
"ContactNo: 777777777 ,Intercom No.77777<"
"Qualification: D.E & TC.,"
Designation: Instructor4
Email: johndoe8@google.com
"ContactNo: 8888888888 ,Intercom No.8888"
"Qualification: D.C.Sc. & E.,"
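I need a CSV file in the following format (as you can see, only one of a field's multiple values should be kept, and some records have no employee name and need to be excluded from the output CSV file):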
name,designation,email,contact,Qualification,Specialisation
Dr. john doe,Professor,johndoe@google.com,1234567,B.E.,network security
Dr. john doe2,Professor,johndoe2@google.com,222222222,M.S.,network security2
Dr. john doe3,Associate,Professor3,johndoe3@gmail.com,333333,M.Tech.,network security3
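I tried various approaches but could not get it to work (I am very new to programming).
I tried adapting someone else's example, but I think my problem needs a different approach: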
records = """Employee Name: Dr. john doe
Designation: Professor
Email: johndoe@google.com
ContactNo: 1234567, 9999999
Qualification: M.Tech., Ph.D.
Area of Interest / Specialisation: network security"""
for record in records.split('Employee Name'):
    fields = record.split('\n')
    Employee_Name = "NA"
    Designation = "NA"
    ContactNo = "NA"
    Qualification = "NA"
    Specialization = "NA"
    for field in fields:
        field_name, field_value = field.split(':')
        if field_name == "": # This is employee name, since we split on it
            Employee_Name = field_value
        if field_name == "Designation":
            Designation = field_value
        if field_name == "ContactNo":
            ContactNo = field_value
        if field_name == "Qualification":
            Qualification = field_value
        if field_name == "Specialization":
            Specialization = field_value
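This is my first question here, so please excuse any formatting mistakes (if anything is out of place, please point it out rather than voting the question down, and I will update it right away).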
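Answer 0 (score: 1)
If you add print statements at different points in the code, you will find that sometimes record='' and sometimes field=''. Unpacking the result of ''.split(':') into two names is what fails.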
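A quick way to see where the empty strings come from is to split a shortened sample by hand. The sample below is made up for illustration, in the same shape as the question's `records` string, with a trailing newline such as the one left between two records:

```python
# Shortened, made-up sample in the same format as the question's `records`.
records = "Employee Name: Dr. john doe\nDesignation: Professor\n"

# The separator sits at the very start of the string, so the first piece
# produced by split() is empty.
pieces = records.split('Employee Name')
print(pieces)

# The trailing newline produces an empty last field.
fields = pieces[1].split('\n')
print(fields)

# Splitting an empty field gives a one-element list, so unpacking it
# into two names raises ValueError.
try:
    field_name, field_value = ''.split(':')
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```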
Add a few lines:
for record in records.split('Employee Name'):
    if record == '':
        continue
    fields = record.split('\n')
and
    for field in fields:
        if field == '':
            continue
        field_name, field_value = field.split(':')
The code should now run without errors.
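The guards above stop the crash on the short sample, but the full input in the question has further quirks: some lines are wrapped in double quotes, one value contains an extra colon (`Qualification: : M.Sc.`), and some records have no `Employee Name` line. The following is only a sketch of one way to handle these. The names `parse_records` and `sample` are hypothetical, and it assumes that a new record begins at every `Employee Name` line or whenever a field name repeats within the current record.

```python
def parse_records(text):
    """Parse 'Key: value' lines into dicts and keep only named employees."""
    records = []
    current = None
    for raw_line in text.splitlines():
        # Remove surrounding whitespace and the double quotes some lines carry.
        line = raw_line.strip().strip('"')
        # partition() splits on the first colon only, so a value that
        # itself contains a colon cannot break the unpacking.
        key, sep, value = line.partition(':')
        if not sep:
            continue  # blank line or line without a colon
        key = key.strip()
        value = value.strip(' :')
        # Assumption: a repeated key or an 'Employee Name' line starts a record.
        if current is None or key == 'Employee Name' or key in current:
            current = {}
            records.append(current)
        current[key] = value
    # Records without an employee name are excluded, as the question asks.
    return [r for r in records if 'Employee Name' in r]


sample = '''Employee Name: Dr. john doe3
Designation: Associate Professor3
Email: johndoe3@google.com
"ContactNo: 333333,4444444"
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
Designation: Associate Professor4
Email: johndoe4@google.com
"ContactNo: 44444444 ,Intercom No.44444"
Qualification: : M.Sc.
'''

employees = parse_records(sample)
print(employees)
```

With this sample, the second block of fields has no `Employee Name`, so only the first record is returned.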
Answer 1 (score: 0)
Data
Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
"ContactNo: 1234567, 9999999"
"Qualification: M.Tech., Ph.D."
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
"Qualification: B.Tech., Ph.D."
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3
Designation: Associate Professor3
Email: johndoe3@google.com
"ContactNo: 333333,4444444"
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
This is a simple approach when there are many columns (no need to write code for each field).
Solution:
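import pandas as pd

# Read every line of the text file as one row with a single column
tdf = pd.read_csv("D:/emp.txt",sep='\n',doublequote=False, header= None)
# Split each line at ':' into the field name (column 0) and its value (column 1)
tdf = tdf[0].str.split(':', expand=True)
# Collect the values of each field name into a list; every field must
# occur the same number of times for the DataFrame to be built
dd = tdf.groupby(0)[1].apply(lambda g: g.values.tolist()).to_dict()
df = pd.DataFrame.from_dict(dd)
# If you want to re-arrange the columns (Optional)
df = df[['Employee Name','Designation','Email','ContactNo','Qualification','Area of Interest / Specialisation']]
df.to_csv('D:/EMP.csv',index=False) # Save results in CSV format
df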
   Employee Name  Designation           Email                ContactNo         Qualification   Area of Interest / Specialisation
0  Dr.john doe    Professor             johndoe@google.com   1234567, 9999999  M.Tech., Ph.D.  network security
1  Dr. john doe2  Professor2            johndoe2@google.com  222222222         B.Tech., Ph.D.  network security2
2  Dr. john doe3  Associate Professor3  johndoe3@google.com  333333,4444444    Ph.D.           network security3
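The output above keeps every value of a field (for example `1234567, 9999999`), whereas the question asks for only one value per field and for a shorter header. As a rough sketch using only the standard library, the records could be reduced and written like this. The names `COLUMNS`, `first_value` and `to_csv` are hypothetical, and the sketch assumes that multiple values are separated by commas and that the first one is the one wanted:

```python
import csv
import io

# Source field names mapped to the header requested in the question.
COLUMNS = {
    'Employee Name': 'name',
    'Designation': 'designation',
    'Email': 'email',
    'ContactNo': 'contact',
    'Qualification': 'Qualification',
    'Area of Interest / Specialisation': 'Specialisation',
}


def first_value(value):
    """Return the part of `value` before the first comma, stripped."""
    return value.split(',')[0].strip()


def to_csv(records):
    """Render a list of field dicts as CSV text with one value per field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(COLUMNS.values())
    for record in records:
        # Missing fields fall back to "NA", the placeholder the question uses.
        writer.writerow(first_value(record.get(src, 'NA')) for src in COLUMNS)
    return buffer.getvalue()


records = [
    {'Employee Name': 'Dr. john doe2', 'Designation': 'Professor2',
     'Email': 'johndoe2@google.com', 'ContactNo': '222222222',
     'Qualification': 'B.Tech., Ph.D.',
     'Area of Interest / Specialisation': 'network security2'},
    {'Employee Name': 'Dr. john doe3', 'Designation': 'Associate Professor3',
     'Email': 'johndoe3@google.com', 'ContactNo': '333333,4444444',
     'Qualification': 'Ph.D.',
     'Area of Interest / Specialisation': 'network security3'},
]

csv_text = to_csv(records)
print(csv_text)
```

To write a file instead of a string, the same `csv.writer` can be given a file opened with `open(path, 'w', newline='')`.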