Complex CSV format: column names spread over two rows

Date: 2014-08-18 10:43:32

Tags: python pandas

Below is what I believe is a very clumsy header in the file I'm digging into:

,,,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012
"Office","Office(code)","Origin"
"Albania","AL","Total",,,,,,,,,,,,,,,,,,,,,6,49,87,201,390,395,116,420,541,402,349,21,,

That is, the first two lines together form the header. Is there a way to apply read_csv() to this without major hassle?
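To make the structure concrete: the first line supplies the year names and leaves the first three slots blank, while the second line supplies exactly those three names. Below is a minimal sketch of merging the two rows with Python's csv module, assuming that layout. `combine_header_rows` is a hypothetical helper, and the sample is a shortened excerpt, not the real file:

```python
import csv
import io


def combine_header_rows(row1, row2):
    """Hypothetical helper: fill blank slots of row1 with row2's entries, by position."""
    combined = list(row1)
    for i, name in enumerate(row2):
        if i < len(combined) and combined[i] == "":
            combined[i] = name
    return combined


# Shortened, hypothetical excerpt of the two header lines
sample = ',,,1980,1981,1982\n"Office","Office(code)","Origin"\n'
row1, row2 = list(csv.reader(io.StringIO(sample)))  # csv.reader strips the double quotes

print(combine_header_rows(row1, row2))
# ['Office', 'Office(code)', 'Origin', '1980', '1981', '1982']
```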

1 answer:

Answer 0 (score: 3)

You can parse the first two lines by hand and then pass the rest of the file to read_csv, for example:

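import csv
import pandas as pd

with open('data.csv') as f:
    headers = next(csv.reader([f.readline()]))      # first line: '', '', '', '1980', ..., '2012'
    headers[:3] = next(csv.reader([f.readline()]))  # second line: 'Office', 'Office(code)', 'Origin' (quotes stripped)
    data = pd.read_csv(f, names=headers)            # f now points at the third line, i.e. the data rows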
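A self-contained version of the same approach that runs without data.csv. The file contents here are a hypothetical, shortened stand-in with only a few year columns, wrapped in io.StringIO, which read_csv accepts just like an open file handle:

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened stand-in for data.csv
raw = (
    ',,,1980,1981,1982\n'
    '"Office","Office(code)","Origin"\n'
    '"Albania","AL","Total",,6,49\n'
)

f = io.StringIO(raw)
headers = next(csv.reader([f.readline()]))      # ['', '', '', '1980', '1981', '1982']
headers[:3] = next(csv.reader([f.readline()]))  # replace the three blanks with the second line's names
data = pd.read_csv(f, names=headers)            # parses only the lines after the two header lines

print(data)
```

The empty field under 1980 comes back as NaN, and the year column names stay strings ('1980', not 1980), since that is how they were read from the header line.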

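Note that this passes the open file handle f to read_csv with its read position already at the start of the third line, so read_csv only sees the data rows.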
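If relying on the handle's read position feels fragile, one alternative (not taken from the answer) is to parse the two header lines separately and let read_csv skip them itself with skiprows=2. This is a sketch assuming the same two-line header; io.StringIO again stands in for the real file, which you would simply open twice:

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened stand-in for data.csv
raw = (
    ',,,1980,1981\n'
    '"Office","Office(code)","Origin"\n'
    '"Albania","AL","Total",6,49\n'
)

# First pass: read only the two header lines with the csv module
rows = csv.reader(io.StringIO(raw))
headers = next(rows)       # ['', '', '', '1980', '1981']
headers[:3] = next(rows)   # ['Office', 'Office(code)', 'Origin']

# Second pass: read_csv skips the header lines on its own
data = pd.read_csv(io.StringIO(raw), skiprows=2, header=None, names=headers)
print(data)
```

With a real file, replace each io.StringIO(raw) with open('data.csv'). The cost is opening the file twice; the benefit is that read_csv no longer depends on where a previous reader left off.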