Below is a CSV snippet with some dummy header lines; the actual frame starts at the line beginning with beerId:
This work is an unpublished, copyrighted work and contains confidential information.
beer consumption
consumptiondate 7/24/2018
consumptionlab H1
numbeerssuccessful 40
numbeersfailed 0
totalnumbeers 40
consumptioncomplete TRUE
beerId Book
341027 Northern Light
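df = pd.read_csv(path_csv, header=8)
This code works, but the problem is that, depending on the time of day, the header is not always on line 8. Is there a way to use a lambda to find the index of the beerId row, similar to the callable form of skiprows?
skiprows : list-like or integer or callable, default None
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].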
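The callable form of skiprows only ever receives the 0-based line number, never the line text, so on its own it cannot search for beerId. It can, however, be combined with a line number found in a quick first pass. Below is a minimal sketch, assuming a comma-separated file; the sample content is inlined via io.StringIO, and the helper name find_header_line is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: metadata lines, then the real table.
SAMPLE = """beer consumption
consumptiondate,7/24/2018
consumptionlab,H1
totalnumbeers,40
beerId,Book
341027,Northern Light
"""


def find_header_line(text, marker="beerId"):
    """Return the 0-based line number of the first line starting with marker."""
    for i, line in enumerate(text.splitlines()):
        if line.startswith(marker):
            return i
    raise ValueError(f"no line starts with {marker!r}")


header_idx = find_header_line(SAMPLE)  # 4 for this sample

# The callable only sees line numbers: skip every line before the header line.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=lambda i: i < header_idx)
print(df)
```

The lambda stays trivial because the search for beerId happens outside read_csv; skiprows just applies the result.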
Answer (score: 2)
I think you need to preprocess the file first:
import pandas as pd

path_csv = 'file.csv'
with open(path_csv) as f:
    lines = f.readlines()
# get the indices of all lines starting with beerId
num = [i for i, l in enumerate(lines) if l.startswith("beerId")]
# if not found, fall back to 0; otherwise use the first match as the header row
# (header= ignores blank lines, so this assumes there are none above beerId)
num = 0 if len(num) == 0 else num[0]
print(num)
8
df = pd.read_csv(path_csv, header=num)
print(df)
beerId Book
0 341027 Northern Light
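A variation on the same preprocessing idea, sketched under the assumption that the whole file fits in memory: instead of converting the line number into a header= value (which pandas counts after skipping blank lines when skip_blank_lines=True), pass read_csv only the text from the beerId line onward. This avoids any off-by-one between physical file lines and header rows. The function name read_from_marker and the temporary file contents are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd


def read_from_marker(path, marker="beerId"):
    """Read the CSV table that starts at the first line beginning with marker."""
    with open(path) as f:
        lines = f.readlines()
    # Index of the header line, or None if no line starts with marker.
    start = next((i for i, line in enumerate(lines) if line.startswith(marker)), None)
    if start is None:
        raise ValueError(f"{path}: no line starts with {marker!r}")
    # Parse only the header line and everything after it.
    return pd.read_csv(io.StringIO("".join(lines[start:])))


# Usage with a throwaway file (hypothetical contents, including a blank line).
content = "beer consumption\n\nconsumptiondate,7/24/2018\nbeerId,Book\n341027,Northern Light\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(content)
df = read_from_marker(tmp.name)
os.remove(tmp.name)
print(df)
```

Unlike the fallback to 0 above, this version raises when beerId is missing, so a malformed file fails loudly instead of being parsed with the wrong header.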