How to create a DataFrame from a CSV that uses a colon delimiter

Time: 2019-07-18 04:45:48

Tags: python pandas

I am parsing an Outlook message with the following code:

email_content = str(message.Body)
lines_stripped = [line.strip() for line in email_content.split('\r\n') if line.strip() != '']
for line in lines_stripped:
    writer = csv.writer(write_file, delimiter=" ")
    writer.writerow(line.split())

The resulting CSV file looks like this:

Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car

I would like to reshape it like this:

Car     Color       Comment
Mazda   Green       A very nice Car
Toyota  Black       Okay car

3 Answers:

Answer 0 (score: 4)

I would do most of this in pure Python, using the following split_at pattern:

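In [11]: def split_at(lst, f):
    ...:     # indices of the items that start a new group
    ...:     inds = [i for i, x in enumerate(lst) if f(x)]
    ...:     # pair each start with the next start; the last group runs to the end
    ...:     for i, j in zip(inds, inds[1:] + [len(lst)]):
    ...:         yield lst[i:j]
    ...: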

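This lets you split the list of attributes into one group per car (here cars initially holds the raw text shown in the question):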

In [12]: cars = [c.split(": ", 1) for c in cars.splitlines() if c]

In [13]: cars
Out[13]:
[['Car', 'Mazda'],
 ['Color', 'Green'],
 ['Comment', 'A very nice Car'],
 ['Car', 'Toyota'],
 ['Color', 'Black'],
 ['Comment', 'Okay car']]

In [14]: pd.DataFrame([dict(c) for c in split_at(cars, lambda x: x[0] == "Car")])
Out[14]:
      Car  Color          Comment
0   Mazda  Green  A very nice Car
1  Toyota  Black         Okay car
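
For anyone running this outside an IPython session, the same idea can be written as a stand-alone script. This is only a minimal sketch: the variable names raw, pairs and records are not from the answer, and it assumes the text is already in a string and that every record begins with a Car line.

```python
import pandas as pd

# Assumed stand-in for the file contents shown in the question.
raw = """Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car
"""


def split_at(lst, f):
    """Yield slices of lst, each starting at an item for which f is true."""
    inds = [i for i, x in enumerate(lst) if f(x)]
    # Pair each start index with the next one; the last slice runs to the end.
    for i, j in zip(inds, inds[1:] + [len(lst)]):
        yield lst[i:j]


# Split each non-blank line into a [key, value] pair at the first ": ".
pairs = [line.split(": ", 1) for line in raw.splitlines() if line.strip()]

# Each group of pairs becomes one dict, and each dict becomes one row.
records = [dict(group) for group in split_at(pairs, lambda p: p[0] == "Car")]
df = pd.DataFrame(records)
print(df)
```

Because each group is turned into a dict, a record that lacks one of the keys would simply get NaN in that column rather than shifting the other values.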

Answer 1 (score: 2)

from io import StringIO
import pandas as pd

##data

temp = StringIO("""
Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car""")

df = pd.read_csv(temp, sep=':', engine='python', header=None)
df.columns = ['A','B']

##print(df)

         A                 B
0      Car             Mazda
1    Color             Green
2  Comment   A very nice Car
3      Car            Toyota
4    Color             Black
5  Comment          Okay car

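Use pivot, then apply sorted with key=pd.isnull to each column so that the NaN values sink to the bottom, and drop the rows that are left empty:

df.pivot(columns='A', values='B').apply(sorted, key=pd.isnull).dropna()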

Output

A      Car   Color           Comment
0    Mazda   Green   A very nice Car
1   Toyota   Black          Okay car
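
A different way to get the same wide table, swapped in here as an alternative to sorting the NaN values, is to give every row an explicit record number before pivoting. The sketch below is an assumption-laden illustration, not the answer's method: the names raw, record and wide are made up, and it assumes each key appears exactly once per record and that no value contains a colon.

```python
from io import StringIO

import pandas as pd

# Assumed stand-in for the file contents shown in the question.
raw = StringIO("""Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car
""")

# skipinitialspace drops the blank that follows each colon;
# blank lines are skipped by default.
df = pd.read_csv(raw, sep=":", header=None, names=["A", "B"],
                 skipinitialspace=True)

# Number the occurrences of each key: the first Car/Color/Comment get 0,
# the second get 1, and so on. That number identifies the record.
df["record"] = df.groupby("A").cumcount()

# One row per record, one column per key.
wide = df.pivot(index="record", columns="A", values="B")
print(wide)
```

With an explicit record index the pivot produces no NaN padding, so neither the sort nor the dropna step is needed.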

Answer 2 (score: 1)

This should work:

import numpy as np
import pandas as pd
import io

temp = '''
Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car

'''
input_csv = io.StringIO(temp)
#input_csv = 'hello.csv'
df = pd.read_csv(input_csv, sep=":", skip_blank_lines=True,header=None)
# each record has 3 fields, so split the values into len(df) // 3 rows
data = np.array_split(df[1].to_numpy(), len(df) // 3)
df2 = pd.DataFrame(data, columns=df[0].unique())
print(df2)

       Car   Color           Comment
0    Mazda   Green   A very nice Car
1   Toyota   Black          Okay car
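
The approach above relies on every record having the same number of fields in the same order. A hedged variant that checks this first and uses reshape in place of np.array_split might look like the following; keys and n_fields are illustrative names, and skipinitialspace is added here to drop the blank after each colon.

```python
import io

import pandas as pd

# Assumed stand-in for the file contents shown in the question.
temp = """
Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car
"""

df = pd.read_csv(io.StringIO(temp), sep=":", header=None,
                 skipinitialspace=True)

# Field names in order of first appearance.
keys = df[0].unique()
n_fields = len(keys)

# Guard against incomplete records before reshaping.
if len(df) % n_fields != 0:
    raise ValueError("every record must contain all fields")

# Reshape the flat value column into one row per record.
df2 = pd.DataFrame(df[1].to_numpy().reshape(-1, n_fields), columns=keys)
print(df2)
```

The length check only catches a wrong total count; it would not notice two records that each miss a different field, so this remains suitable only for regular input.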

Using pure Python + pandas

cars = []
colors = []
comments = []

lines = io.StringIO(temp).readlines()
for line in lines:
    # split only at the first colon so values may contain colons themselves
    if line.startswith('Car'):
        cars.append(line.split(':', 1)[1].strip())
    if line.startswith('Color'):
        colors.append(line.split(':', 1)[1].strip())
    if line.startswith('Comment'):
        comments.append(line.split(':', 1)[1].strip())

df = pd.DataFrame({'car': cars, 'color': colors, 'comment': comments})
df