I am parsing an Outlook message with the following code:
import csv

email_content = str(message.Body)
# Split the body into lines and drop the empty ones
lines_stripped = [line.strip() for line in email_content.split('\r\n') if line.strip() != '']
writer = csv.writer(write_file, delimiter=" ")
for line in lines_stripped:
    writer.writerow(line.split())
The resulting CSV file looks like this:
Car: Mazda
Color: Green
Comment: A very nice Car
Car: Toyota
Color: Black
Comment: Okay car
I would like to change it to this:
Car Color Comment
Mazda Green A very nice Car
Toyota Black Okay car
Answer 0 (score: 4)
I would do most of this in pure Python, using the following split_at pattern:
In [11]: def split_at(lst, f):
    ...:     inds = [i for i, x in enumerate(lst) if f(x)]
    ...:     for i, j in zip(inds, inds[1:]):
    ...:         yield lst[i:j]
    ...:     if inds:
    ...:         yield lst[inds[-1]:]
    ...:
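For running outside IPython, here is a minimal sketch of the same split_at generator as a plain script. The final slice is guarded with `if inds:` so that a list with zero or one matching element does not fail on an unbound `j`; as in the session, any items before the first match are dropped. The sample `pairs` list is illustrative data.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def split_at(lst: List[T], f: Callable[[T], bool]) -> Iterator[List[T]]:
    """Yield consecutive slices of lst, each starting at an element where f is true."""
    inds = [i for i, x in enumerate(lst) if f(x)]
    # Each slice runs from one marker index up to (not including) the next one
    for i, j in zip(inds, inds[1:]):
        yield lst[i:j]
    # The last slice runs to the end; nothing is yielded if nothing matched
    if inds:
        yield lst[inds[-1]:]


pairs = [["Car", "Mazda"], ["Color", "Green"], ["Car", "Toyota"], ["Color", "Black"]]
groups = list(split_at(pairs, lambda p: p[0] == "Car"))
print(groups)
# [[['Car', 'Mazda'], ['Color', 'Green']], [['Car', 'Toyota'], ['Color', 'Black']]]
```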
This lets you split the list of attributes (here `cars` starts out as the raw text from the question):
In [12]: cars = [c.split(": ", 1) for c in cars.splitlines() if c]
In [13]: cars
Out[13]:
[['Car', 'Mazda'],
['Color', 'Green'],
['Comment', 'A very nice Car'],
['Car', 'Toyota'],
['Color', 'Black'],
['Comment', 'Okay car']]
In [14]: pd.DataFrame([dict(c) for c in split_at(cars, lambda x: x[0] == "Car")])
Out[14]:
Car Color Comment
0 Mazda Green A very nice Car
1 Toyota Black Okay car
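If the goal is only the CSV file from the question, the same grouping idea (start a new record at every Car line) can be sketched with the standard library alone. This assumes a comma-delimited file is acceptable: with the question's `delimiter=" "`, the csv module would quote values that contain spaces, such as A very nice Car. `EMAIL_BODY`, `records_from_body` and `write_records` are hypothetical names; in the question's setting you would pass `str(message.Body)` and `write_file` (the csv docs recommend opening the file with `newline=''`).

```python
import csv
import io

# Stand-in for str(message.Body); the question's code splits it on \r\n
EMAIL_BODY = ("Car: Mazda\r\nColor: Green\r\nComment: A very nice Car\r\n\r\n"
              "Car: Toyota\r\nColor: Black\r\nComment: Okay car\r\n")


def records_from_body(body, first_key="Car"):
    """Group 'Key: Value' lines into one dict per record; a record starts at first_key."""
    records = []
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a colon
        key, value = key.strip(), value.strip()
        if key == first_key or not records:
            records.append({})
        records[-1][key] = value
    return records


def write_records(records, out):
    """Write records as CSV with a header row; missing fields become empty cells."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


buf = io.StringIO()
write_records(records_from_body(EMAIL_BODY), buf)
print(buf.getvalue())
```

Collecting the field names from every record (rather than only the first) means a record with an extra field does not make `DictWriter` reject the row.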
Answer 1 (score: 2)
from io import StringIO

import pandas as pd

## data
temp = StringIO("""
Car: Mazda
Color: Green
Comment: A very nice Car
Car: Toyota
Color: Black
Comment: Okay car""")
df = pd.read_csv(temp, sep=':', engine='python', header=None)
df.columns = ['A','B']
##print(df)
A B
0 Car Mazda
1 Color Green
2 Comment A very nice Car
3 Car Toyota
4 Color Black
5 Comment Okay car
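Use `pivot` on column A, then sort each column with `sorted`, using `pd.isnull` as the key, so the non-null values move to the top; `dropna` then removes the leftover empty rows: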
df.pivot(columns='A', values='B').apply(sorted, key=pd.isnull).dropna()
Output:
A Car Color Comment
0 Mazda Green A very nice Car
1 Toyota Black Okay car
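To see why the sorted/dropna step works, here is the same mechanism sketched in plain Python; `pivot_sparse` and `compact` are hypothetical helpers, not pandas APIs. Pivoting on the row position leaves each column mostly empty; a stable sort keyed on "is missing" pushes the values to the top in their original order, and dropping rows that contain a missing value removes the empty tail.

```python
# Long format: one (key, value) pair per input line, like columns A and B of df
ROWS = [("Car", "Mazda"), ("Color", "Green"), ("Comment", "A very nice Car"),
        ("Car", "Toyota"), ("Color", "Black"), ("Comment", "Okay car")]


def pivot_sparse(rows):
    """Mimic pivot with the row position as index: one column per key, None elsewhere."""
    keys = sorted({k for k, _ in rows})
    return {k: [v if rk == k else None for rk, v in rows] for k in keys}


def compact(columns):
    """Stable-sort each column so non-None values come first, then drop rows containing None."""
    packed = {k: sorted(col, key=lambda v: v is None) for k, col in columns.items()}
    n_rows = len(next(iter(packed.values()), []))
    out = []
    for i in range(n_rows):
        row = {k: col[i] for k, col in packed.items()}
        if None not in row.values():
            out.append(row)
    return out


for row in compact(pivot_sparse(ROWS)):
    print(row)
# {'Car': 'Mazda', 'Color': 'Green', 'Comment': 'A very nice Car'}
# {'Car': 'Toyota', 'Color': 'Black', 'Comment': 'Okay car'}
```

The alignment is purely positional: if one car lacked a Comment line, the remaining comment would be paired with the first car, so this trick assumes every record has every field.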
Answer 2 (score: 1)
This should work:
import numpy as np
import pandas as pd
import io
temp = '''
Car: Mazda
Color: Green
Comment: A very nice Car
Car: Toyota
Color: Black
Comment: Okay car
'''
input_csv = io.StringIO(temp)
#input_csv = 'hello.csv'
df = pd.read_csv(input_csv, sep=":", skip_blank_lines=True, header=None)
# Assumes every record has the same 3 fields, in the same order
data = np.array_split(df[1].to_numpy(), len(df) // 3)
df2 = pd.DataFrame(data, columns=df[0].unique())
print(df2)
Car Color Comment
0 Mazda Green A very nice Car
1 Toyota Black Okay car
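`np.array_split` only cuts the value column into equal-length pieces; it never looks at the keys. A plain-Python sketch of the same chunking that also checks that assumption is below; `chunk_records` is a hypothetical helper, and `dict.fromkeys` is used to get the unique keys in first-seen order, analogous to `df[0].unique()`.

```python
PAIRS = [("Car", "Mazda"), ("Color", "Green"), ("Comment", "A very nice Car"),
         ("Car", "Toyota"), ("Color", "Black"), ("Comment", "Okay car")]


def chunk_records(pairs):
    """Cut (key, value) pairs into fixed-size records, like np.array_split on the value column."""
    if not pairs:
        return [], []
    # Unique keys in first-seen order, analogous to df[0].unique()
    keys = list(dict.fromkeys(k for k, _ in pairs))
    n = len(keys)
    if len(pairs) % n:
        raise ValueError("number of lines is not a multiple of the number of fields")
    records = []
    for start in range(0, len(pairs), n):
        chunk = pairs[start:start + n]
        # array_split never checks this; a missing or reordered field fails here instead
        if [k for k, _ in chunk] != keys:
            raise ValueError(f"unexpected field order in record starting at pair {start}")
        records.append([v for _, v in chunk])
    return keys, records


keys, records = chunk_records(PAIRS)
print(keys)     # ['Car', 'Color', 'Comment']
print(records)  # [['Mazda', 'Green', 'A very nice Car'], ['Toyota', 'Black', 'Okay car']]
```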
Using plain Python + pandas:
cars = []
colors = []
comments = []
lines = io.StringIO(temp).readlines()
for line in lines:
    # Split only on the first ':' so values that contain ':' stay intact
    if line.startswith('Car'):
        cars.append(line.split(':', 1)[1].strip())
    elif line.startswith('Color'):
        colors.append(line.split(':', 1)[1].strip())
    elif line.startswith('Comment'):
        comments.append(line.split(':', 1)[1].strip())
df = pd.DataFrame({'car': cars, 'color': colors, 'comment': comments})
df
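The three hard-coded lists can be generalized to any set of field names with `collections.defaultdict`; the resulting dict of equal-length lists is the shape `pd.DataFrame` accepts. This is a sketch assuming every non-blank line has the form `Key: Value`; `columns_from_text` and `SAMPLE` are hypothetical names.

```python
from collections import defaultdict

SAMPLE = """
Car: Mazda
Color: Green
Comment: A very nice Car
Car: Toyota
Color: Black
Comment: Okay car
"""


def columns_from_text(text):
    """Collect one list of values per field name, in input order."""
    columns = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or no 'Key: Value' shape
        columns[key.strip()].append(value.strip())
    # Every field must occur once per record, otherwise the columns would not line up
    counts = {k: len(v) for k, v in columns.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"fields have different counts: {counts}")
    return dict(columns)


print(columns_from_text(SAMPLE))
# {'Car': ['Mazda', 'Toyota'], 'Color': ['Green', 'Black'], 'Comment': ['A very nice Car', 'Okay car']}
```

With pandas, `pd.DataFrame(columns_from_text(temp))` then gives one column per field. pandas would reject lists of different lengths anyway; the explicit check just reports which field is short.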