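How do I read 10 records at a time from a CSV file using pandas?

Time: 2016-07-25 06:19:52

Tags: python csv pandas dictionary dataframe

I want to read a CSV file that has 1000 rows, so I decided to read it in chunks. However, I am running into a problem while reading it.

On the first iteration I want to read the first 10 records and convert specific columns into a Python dictionary. On the second iteration I want to skip those first 10 records and read the next 10 records, and so on.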

Input.csv -

time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
1468332421672000,206,60079,60079,60079,2,"[60000,2]"
1468332421818000,206,44664,44664,44664,2,"[40000,2]"
1468332422164000,206,48500,48500,48500,2,"[40000,2]"
1468332423490000,206,39469,37894,38206,12,"[30000,12]"
1468332422538000,206,44023,44023,44023,2,"[40000,2]"
1468332423491000,206,38813,38813,38813,2,"[30000,2]"
1468332423528000,206,75970,75970,75970,2,"[70000,2]"
1468332423533000,206,42546,42470,42508,4,"[40000,4]"
1468332423536000,206,41065,40888,40976,4,"[40000,4]"
1468332423566000,206,66401,62453,64549,6,"[60000,6]"

Code -

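import pandas

if __name__ == '__main__':
    s = 0  # number of lines to skip before reading
    while True:
        n = 10  # number of rows per block
        df = pandas.read_csv('Input.csv', skiprows=s, nrows=n)
        # fails from the second iteration on: 'time' is no longer a column
        d = dict(zip(df.time, df.split_counts))
        print(d)
        s += n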

I am getting this error -

AttributeError: 'DataFrame' object has no attribute 'time'

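I understand that on the second iteration it cannot recognize the time and split_counts attributes, but is there a way to do what I want?

2 Answers:

Answer 0 (score: 1)

The first iteration should work fine, but every later iteration is problematic.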

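read_csv has a header kwarg whose default value is 'infer' (basically 0). This means the first line of the parsed CSV is used as the column names of the DataFrame. Once skiprows has removed the real header line, the first remaining data row is used as the column names instead, which is why the time column cannot be found.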
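
A small sketch of that effect, assuming a shortened in-memory copy of the CSV (the `csv_text` variable and its three sample rows are illustrative stand-ins for Input.csv, not part of the question's program):

```python
import io
import pandas as pd

# Shortened, in-memory stand-in for Input.csv (illustrative only).
csv_text = u'''time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
'''

# First block: the real header line is still the first parsed line.
first = pd.read_csv(io.StringIO(csv_text), skiprows=0, nrows=1)
print(list(first.columns))

# Later block: skiprows drops the header line, so the first data row
# is inferred as the header and there is no 'time' column any more.
second = pd.read_csv(io.StringIO(csv_text), skiprows=1, nrows=1)
print(list(second.columns))
print('time' in second.columns)
```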

read_csv also has another kwarg, names. As stated in the documentation:

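header : int or list of ints, default 'infer'. Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names are passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names : array-like, default None. List of column names to use. If the file contains no header row, then you should explicitly pass header=None.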

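You should pass header=None and names=['time', 'line_id', 'high', 'low', 'avg', 'total', 'split_counts'] to read_csv. With header=None the file's own header line is no longer consumed as a header, so it has to be skipped as well (for example by starting skiprows at 1).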
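
A minimal sketch of that approach, under a few assumptions: the helper `read_blocks`, its `source_factory` argument, the `COLUMNS` constant and the shortened `csv_text` sample are all hypothetical names introduced for illustration. Because it is not certain here whether pandas raises `EmptyDataError` or returns an empty frame once `skiprows` runs past the end of the file, the loop guards against both.

```python
import io
import pandas as pd

COLUMNS = ['time', 'line_id', 'high', 'low', 'avg', 'total', 'split_counts']

# Shortened, in-memory stand-in for Input.csv (illustrative only).
csv_text = u'''time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
'''


def read_blocks(source_factory, block_size):
    """Yield one {time: split_counts} dict per block of rows (hypothetical helper)."""
    skipped = 1  # start past the file's own header line
    while True:
        try:
            df = pd.read_csv(source_factory(), header=None, names=COLUMNS,
                             skiprows=skipped, nrows=block_size)
        except pd.errors.EmptyDataError:
            break  # nothing left to parse
        if df.empty:
            break  # nothing left to parse
        yield dict(zip(df['time'], df['split_counts']))
        skipped += block_size


# For a real file, pass something like: lambda: 'Input.csv'
blocks = list(read_blocks(lambda: io.StringIO(csv_text), 2))
for block in blocks:
    print(block)
```

Each call re-reads the file from the start and skips the rows already handled, so this stays close to the question's original loop, at the cost of rescanning the file for every block.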

Answer 1 (score: 1)

You can use chunksize in read_csv:

import pandas as pd
import io

temp=u'''time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
1468332421672000,206,60079,60079,60079,2,"[60000,2]"
1468332421818000,206,44664,44664,44664,2,"[40000,2]"
1468332422164000,206,48500,48500,48500,2,"[40000,2]"
1468332423490000,206,39469,37894,38206,12,"[30000,12]"
1468332422538000,206,44023,44023,44023,2,"[40000,2]"
1468332423491000,206,38813,38813,38813,2,"[30000,2]"
1468332423528000,206,75970,75970,75970,2,"[70000,2]"
1468332423533000,206,42546,42470,42508,4,"[40000,4]"
1468332423536000,206,41065,40888,40976,4,"[40000,4]"
1468332423566000,206,66401,62453,64549,6,"[60000,6]"'''
# after testing, replace io.StringIO(temp) with the filename

# chunksize=2 is used here only for testing
reader = pd.read_csv(io.StringIO(temp), chunksize=2)
print(reader)
<pandas.io.parsers.TextFileReader object at 0x000000000AD1CD68>

See the pandas documentation.
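
The reader returned with chunksize can be iterated, yielding one DataFrame per chunk with the header applied to each. A short sketch of building the dictionary the question asks for, assuming a shortened in-memory sample (`csv_text` and `dicts` are illustrative names; for the real file, pass 'Input.csv' and chunksize=10):

```python
import io
import pandas as pd

# Shortened, in-memory stand-in for Input.csv (illustrative only).
csv_text = u'''time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
'''

# chunksize=2 keeps the demo small.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

dicts = []
for chunk in reader:
    # Every chunk keeps the original column names, so 'time' is always present.
    dicts.append(dict(zip(chunk['time'], chunk['split_counts'])))

for d in dicts:
    print(d)
```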