I am trying to convert a text file to CSV in Python

Time: 2019-04-18 19:06:23

Tags: python python-3.x export-to-csv

I am trying to convert a text file to CSV in Python. The input text file looks like this:

Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
"ContactNo: 1234567, 9999999"
"Qualification: M.Tech., Ph.D."
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2 
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
"Qualification: B.Tech., Ph.D."
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3 
Designation: Associate Professor3
Email: johndoe3@google.com
"ContactNo: 333333,4444444"
Qualification: Ph.D.
Area of Interest / Specialisation: network security3
Designation: Associate Professor4
Email: johndoe4@google.com
"ContactNo: 44444444 ,Intercom No.44444"
Qualification: : M.Sc. 
Designation: Programmer
Email: johndoe5@google.com
"ContactNo: 5555555555 ,Intercom No.5555"
Qualification: Ph.D |Computer Science
Designation: Computer Operator
Email: johndoe6@google.com
ContactNo: 666666666
"Qualification: D.C.Sc. & E.,"
Designation: Computer Operator
Email: johndoe7@google.com
"ContactNo: 777777777 ,Intercom No.77777<"
"Qualification: D.E & TC.,"
Designation: Instructor4
Email: johndoe8@google.com
"ContactNo: 8888888888 ,Intercom No.8888"
"Qualification: D.C.Sc. & E.,"

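I need a CSV file in the following format (as you can see, only one of a field's multiple values should be kept, and some records have no employee name and need to be excluded from the output CSV file):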

name,designation,email,contact,Qualification,Specialisation 

Dr. john doe,Professor,johndoe@google.com,1234567,B.E.,network security

Dr. john doe2,Professor,johndoe2@google.com,222222222,M.S.,network security2

Dr. john doe3,Associate,Professor3,johndoe3@gmail.com,333333,M.Tech.,network security3

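I have tried various approaches but could not get it to work (I am very new to programming).

I tried adapting someone else's example, but I think my problem needs a different approach: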

records = """Employee Name: Dr. john doe
Designation: Professor
Email: johndoe@google.com
ContactNo: 1234567, 9999999
Qualification: M.Tech., Ph.D.
Area of Interest / Specialisation: network security"""

for record in records.split('Employee Name'):
    fields = record.split('\n')
    Employee_Name = "NA"
    Designation = "NA"
    ContactNo = "NA"
    Qualification = "NA"
    Specialization = "NA"
    for field in fields:
        field_name, field_value = field.split(':')
        if field_name == "": # This is employee name, since we split on it
            Employee_Name = field_value
        if field_name == "Designation":
            Designation = field_value
        if field_name == "ContactNo":
            ContactNo = field_value
        if field_name == "Qualification":
            Qualification = field_value
        if field_name == "Specialization":
            Specialization = field_value

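This is my first question here, so please excuse any formatting mistakes (if anything is inappropriate, please don't flag the question; I will update it right away).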

2 answers:

Answer 0 (score: 1)

If you add print statements at different points in the code, you will find that sometimes record is '' and sometimes field is ''.
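To see where the empty strings come from, here is a small sketch. The two-line sample string is made up for illustration (the real input has more fields per record), and it ends with a newline so that both kinds of empty string show up.

```python
# Hypothetical two-line sample; the real input has more fields per record.
records = "Employee Name: Dr. john doe\nDesignation: Professor\n"

chunks = records.split("Employee Name")
print(chunks)  # the first chunk is '' because the text starts with the separator

fields = chunks[1].split("\n")
print(fields)  # the last item is '' because the text ends with a newline

# An empty string contains no ':', so split(':') returns a single item
# and unpacking it into two names fails.
try:
    field_name, field_value = "".split(":")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The same unpacking error would also appear for a line with more than one colon, since `split(':')` then returns more than two items.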

Add a few lines:

for record in records.split('Employee Name'):
    if record == '':
        continue
    fields = record.split('\n')

for field in fields:
    if field == '':
        continue
    field_name, field_value = field.split(':')

With these checks in place it should run successfully.
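Going beyond the two checks above, here is a minimal sketch of one way to get from the input to the CSV the question asks for. It is not the asker's or answerer's code. The names `parse_records`, `to_csv`, `COLUMNS` and `FIRST_VALUE_ONLY` are made up for this example, and it assumes that a record ends as soon as a field name repeats or a new `Employee Name` line appears. It uses `str.partition` instead of `split(':')` so that lines with extra colons do not break the unpacking.

```python
import csv
import io

# Fields that may hold several comma-separated values; only the first is kept.
FIRST_VALUE_ONLY = {"ContactNo", "Qualification"}

# Output column name -> field name as it appears in the text file.
COLUMNS = {
    "name": "Employee Name",
    "designation": "Designation",
    "email": "Email",
    "contact": "ContactNo",
    "Qualification": "Qualification",
    "Specialisation": "Area of Interest / Specialisation",
}


def parse_records(lines):
    """Return a list of dicts, one per record that has an Employee Name."""
    records = []
    current = {}
    for raw in lines:
        line = raw.strip().strip('"')  # drop surrounding whitespace and quotes
        if not line:
            continue
        # partition splits on the first ':' only, so extra colons stay in the value
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip(" :")  # also drops a stray leading ':' as in ": M.Sc."
        if key in FIRST_VALUE_ONLY:
            value = value.split(",")[0].strip()
        # Assumption: a new name or a repeated field starts the next record.
        if key == "Employee Name" or key in current:
            if "Employee Name" in current:
                records.append(current)
            current = {}
        current[key] = value
    if "Employee Name" in current:  # keep the last record only if it has a name
        records.append(current)
    return records


def to_csv(records):
    """Serialise the parsed records to CSV text with the requested header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(COLUMNS.keys()))
    for rec in records:
        writer.writerow([rec.get(field, "NA") for field in COLUMNS.values()])
    return out.getvalue()


sample = '''Employee Name: Dr. john doe
Designation: Professor
Email: johndoe@google.com
"ContactNo: 1234567, 9999999"
"Qualification: M.Tech., Ph.D."
Area of Interest / Specialisation: network security
Designation: Programmer
Email: johndoe5@google.com
"ContactNo: 5555555555 ,Intercom No.5555"
Qualification: Ph.D |Computer Science'''

parsed = parse_records(sample.splitlines())
csv_text = to_csv(parsed)
print(csv_text)
```

For a real file you could pass the open file object to `parse_records` and write `csv_text` to a file opened with `newline=""`. The second record in the sample has no `Employee Name`, so it is left out of the output.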

Answer 1 (score: 0)

Data

Employee Name: Dr.john doe
Designation: Professor
Email: johndoe@google.com
"ContactNo: 1234567, 9999999"
"Qualification: M.Tech., Ph.D."
Area of Interest / Specialisation: network security
Employee Name: Dr. john doe2 
Designation: Professor2
Email: johndoe2@google.com
ContactNo: 222222222
"Qualification: B.Tech., Ph.D."
Area of Interest / Specialisation: network security2
Employee Name: Dr. john doe3 
Designation: Associate Professor3
Email: johndoe3@google.com
"ContactNo: 333333,4444444"
Qualification: Ph.D.
Area of Interest / Specialisation: network security3

This is a simple approach when there are many columns (no need to write code for each field):

Solution:

import pandas as pd
tdf = pd.read_csv("D:/emp.txt",sep='\n',doublequote=False, header= None)

tdf = tdf[0].str.split(':', expand=True)

dd = tdf.groupby(0)[1].apply(lambda g: g.values.tolist()).to_dict()

df = pd.DataFrame.from_dict(dd)

# If you want to re-arrange the columns (Optional)
df = df[['Employee Name','Designation','Email','ContactNo','Qualification','Area of Interest / Specialisation']]

df.to_csv('D:/EMP.csv',index=False) #Save results in CSV format

df

     Employee Name            Designation                 Email          ContactNo    Qualification Area of Interest / Specialisation
0      Dr.john doe              Professor    johndoe@google.com   1234567, 9999999   M.Tech., Ph.D.                  network security
1   Dr. john doe2              Professor2   johndoe2@google.com          222222222   B.Tech., Ph.D.                 network security2
2   Dr. john doe3    Associate Professor3   johndoe3@google.com     333333,4444444            Ph.D.                 network security3
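The grouping idea in this answer can also be sketched without pandas, which may help if your pandas version rejects `sep='\n'` (I believe recent releases do, but check your own version). The sketch below is an illustration, not the answerer's code, and the helper names `group_by_field` and `columns_to_rows` are made up. It also makes the approach's main assumption explicit: every field must occur the same number of times, otherwise values from different records cannot be lined up by position.

```python
from collections import defaultdict


def group_by_field(lines):
    """Collect the values of each field name into one list (one list per column)."""
    columns = defaultdict(list)
    for raw in lines:
        line = raw.strip().strip('"')  # drop surrounding whitespace and quotes
        if not line:
            continue
        key, _, value = line.partition(":")  # split on the first ':' only
        columns[key.strip()].append(value.strip())
    return dict(columns)


def columns_to_rows(columns):
    """Turn equal-length columns into row tuples; refuse ragged input."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("fields occur a different number of times; "
                         "rows cannot be aligned by position")
    return list(zip(*columns.values()))


sample = [
    "Employee Name: Dr.john doe",
    "Designation: Professor",
    "Employee Name: Dr. john doe2",
    "Designation: Professor2",
]
columns = group_by_field(sample)
rows = columns_to_rows(columns)
print(list(columns))
print(rows)

# A record without a name makes the columns ragged, which the check reports.
ragged = group_by_field(sample + ["Designation: Programmer"])
try:
    columns_to_rows(ragged)
except ValueError as exc:
    ragged_error = str(exc)
    print(ragged_error)
```

With the full input from the question, where several records have no `Employee Name`, the columns would be ragged, so those records would have to be filtered out before grouping. The values also still contain every comma-separated entry, such as `1234567, 9999999`, so keeping only the first one needs an extra step.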