Question

I have a raw data file in the following format, with multiple rows:

NAME: Jack Age : 25   skill : c++ designation : Analyst other comments:this 
is basic info

NAME : Kattie Age: 45 skill: python  designation: director Other Comments: name : Jane Kattie

I would like the output to be:

    name    age  skill   designation  other_Comments      name_2
0   Jack    25   c++     analyst      This is basic Info  NA
1   Kattie  45   python  Director     NA                  Jane Kattie

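I tried the code below, but it cannot handle special cases such as row 2. I am new to Python, so please suggest a better approach if there is one. The keywords come from a well-defined set, but a keyword may appear more than once.

Code: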

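import re
import pandas as pd

# Each row of the sheet holds one raw record in a single column.
file = pd.read_excel('mydata.xlsx', sheet_name="Sheet1", header=None)
file.columns = ['data']

for i in range(len(file)):
    x = file.loc[i, 'data']
    # Capture the text between consecutive keywords, ignoring case.
    name = re.findall(r'name\s*:(.*?)age', x, flags=re.I)
    age = re.findall(r'age\s*:(.*?)skill', x, flags=re.I)
    skills = re.findall(r'skill\s*:(.*?)designation', x, flags=re.I)
    other_comments = re.findall(r'other comments\s*:(.*)', x, flags=re.I)
    # findall returns a list, so join the matches into one string per cell.
    file.loc[i, 'Name'] = ' '.join(m.strip() for m in name)
    file.loc[i, 'Age'] = ' '.join(m.strip() for m in age)
    file.loc[i, 'Skill'] = ' '.join(m.strip() for m in skills)
    file.loc[i, 'Other_Comments'] = ' '.join(m.strip() for m in other_comments)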

Answer 1

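Python has a separate module for handling CSV files:

import csv

For more information on how to use it, I suggest visiting the python.org website, where you will find everything you need to know about it.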
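Since the raw rows in the question are keyword-value text rather than CSV, the csv module is probably more useful for writing the result than for reading the input. Below is a minimal sketch of one possible approach, not a definitive solution: it splits each record on the known keywords with re.split, numbers repeated keywords (name, name_2, ...), and writes the rows with csv.DictWriter. The KEYWORDS list, the helper names parse_record and records_to_csv, and the choice to skip keywords with empty values are all assumptions made for illustration, based only on the two sample rows.

```python
import csv
import io
import re

# Assumed keyword set, taken from the sample rows in the question.
KEYWORDS = ["name", "age", "skill", "designation", "other comments"]

# Any keyword followed by optional spaces and a colon, ignoring case.
KEY_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\s*:",
    re.IGNORECASE,
)


def parse_record(text):
    """Split one raw record into a dict of column name -> value (hypothetical helper)."""
    # Collapse line breaks and runs of spaces so multi-line records become one line.
    normalized = " ".join(text.split())
    # With one capture group, re.split returns [prefix, key1, value1, key2, value2, ...].
    parts = KEY_RE.split(normalized)
    record = {}
    for key, value in zip(parts[1::2], parts[2::2]):
        column = key.lower().replace(" ", "_")
        value = value.strip()
        if not value:
            continue  # keyword directly followed by another keyword: leave it missing
        # Number repeated keywords: name, name_2, name_3, ...
        candidate, n = column, 1
        while candidate in record:
            n += 1
            candidate = f"{column}_{n}"
        record[candidate] = value
    return record


def records_to_csv(raw_records):
    """Parse all records and return them as CSV text, filling gaps with NA."""
    rows = [parse_record(r) for r in raw_records]
    # Collect the union of columns in first-seen order.
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="NA")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    "NAME: Jack Age : 25   skill : c++ designation : Analyst other comments:this \nis basic info",
    "NAME : Kattie Age: 45 skill: python  designation: director Other Comments: name : Jane Kattie",
]
print(records_to_csv(sample))
```

In this sketch the second "name" on row 2 lands in name_2, and the empty "Other Comments" value is written as NA. A value that itself contains a keyword followed by a colon would be split as well, so the keyword list may need adjusting for real data.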

Python: read strings from a file and split them into column names and values
