I want to read this CSV file into a pandas.DataFrame:
Id,Name,Shape Library,Page Name,Line Connection Start,Line Connection End,Text Area 1,Text Area 2,Text Area 3,Text Area 4
1,Page,,0:Page 1,,,,,,
2,Table,Tables,0:Page 1,,,Openingsuren gemeentehuis,Action,"Is het gemeentehuis open?
Wat zijn de openingsuren van het gemeentehuis
Wanneer is het gemeentehuis open","webhook
De webserver staat niet op denk ik, gelieve ... te contacteren"
3,easy,Tables,0:Page 1,,,Openignsuren andere dag,Action,"En morgen?",
4,easy,Tables,0:Page 1,,,Openingsuren,,,
But some records span multiple lines (see Id 2).
Is there a way to read this correctly into a pandas DataFrame?
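As a point of comparison, pandas.read_csv treats a double-quoted field that contains line breaks as a single value, so data shaped like the sample above may load without any custom reader. A minimal sketch, assuming a small made-up CSV (the TEXT string is hypothetical, not the question's file) read from memory through io.StringIO:

```python
import io

import pandas as pd

# Hypothetical CSV: record 1 has a quoted "Text" field that spans two lines.
TEXT = 'Id,Text\n1,"first line\nsecond line"\n2,plain\n'

# read_csv honours the quotes, so the embedded newline stays inside one field.
df = pd.read_csv(io.StringIO(TEXT))

print(len(df))                  # 2 records from 4 physical lines
print(repr(df.loc[0, "Text"]))  # 'first line\nsecond line'
```

If a real file does not load this way, the custom reader in the answer below is one option.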
Answer 0 (score: 1)
You can write your own parser with the csv module and then build a generator for pandas, like this:
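import csv
import pandas as pd

def read_my_csv(file_handle):
    # build a csv reader; it already joins quoted fields that contain newlines
    reader = csv.reader(file_handle)
    # get and yield the header
    header = next(reader)
    yield header
    # for each row, keep appending following rows until it has as many
    # fields as the header, then yield it
    for row in reader:
        while len(row) < len(header):
            extra = next(reader, None)
            if extra is None:  # end of file: stop instead of raising
                break
            row += extra
        yield row

# newline='' lets the csv module handle line breaks inside quoted fields
# (the old 'rU' mode was removed in Python 3.11)
with open('file1', newline='') as f:
    generator = read_my_csv(f)
    columns = next(generator)
    df = pd.DataFrame(generator, columns=columns)

print(df)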
Output:

   Id   Name  Shape Library  Page Name  Line Connection Start  Line Connection End  \
0   1   Page                  0:Page 1
1   2  Table         Tables   0:Page 1
2   3   easy         Tables   0:Page 1
3   4   easy         Tables   0:Page 1

                 Text Area 1  Text Area 2  \
0
1  Openingsuren gemeentehuis       Action
2    Openignsuren andere dag       Action
3               Openingsuren

                                         Text Area 3  \
0
1  Is het gemeentehuis open?\nWat zijn de opening...
2                                         En morgen?
3

                                         Text Area 4
0
1  webhook\nDe webserver staat niet op denk ik, g...
2
3
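To try the reader without a file on disk, the sketch below feeds the question's sample through the same generator using io.StringIO. The inline SAMPLE string stands in for 'file1' and is an assumption of this sketch. In this sample every record already has ten fields once csv.reader has joined the quoted line breaks, so the while loop only acts as a fallback for rows that are genuinely split.

```python
import csv
import io

import pandas as pd

# Sample copied from the question; stands in for the file 'file1'.
SAMPLE = '''Id,Name,Shape Library,Page Name,Line Connection Start,Line Connection End,Text Area 1,Text Area 2,Text Area 3,Text Area 4
1,Page,,0:Page 1,,,,,,
2,Table,Tables,0:Page 1,,,Openingsuren gemeentehuis,Action,"Is het gemeentehuis open?
Wat zijn de openingsuren van het gemeentehuis
Wanneer is het gemeentehuis open","webhook
De webserver staat niet op denk ik, gelieve ... te contacteren"
3,easy,Tables,0:Page 1,,,Openignsuren andere dag,Action,"En morgen?",
4,easy,Tables,0:Page 1,,,Openingsuren,,,
'''


def read_my_csv(file_handle):
    reader = csv.reader(file_handle)
    header = next(reader)
    yield header
    for row in reader:
        # Append following rows until this one has as many fields as the header.
        while len(row) < len(header):
            extra = next(reader, None)
            if extra is None:  # reached end of input
                break
            row += extra
        yield row


generator = read_my_csv(io.StringIO(SAMPLE))
columns = next(generator)
df = pd.DataFrame(generator, columns=columns)

print(df.shape)  # (4, 10)
print(repr(df.loc[1, "Text Area 3"]))
```

All values come back as strings (empty fields are ''), since csv.reader does no type conversion.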