Writing specific CSV rows to a DataFrame

Time: 2018-11-13 06:36:20

Tags: python pandas csv dataframe

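I am using the csv library to read specific rows from several files. The problem is saving those rows into a DataFrame: I keep hitting an indexing error that I cannot resolve.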

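The current version of the code finds the column names (on the third line) and then starts reading the data I need (from the sixth line until it hits a blank line). Finding the column names works fine, but when I try to append the data to the frame I get the error: "InvalidIndexError: Reindexing only valid with uniquely valued Index objects"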

The code I currently have is:

    i=0
    import csv
    import pandas as pd
    df = pd.DataFrame()
    with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        for row in csvreader:
           if csvreader.line_num == 3:  #this is for the column names
               print(row)
               df = pd.DataFrame(columns = row)
               df.columns = row
           if csvreader.line_num >= 6:  #this is for the data
               if row: #checks for blank row
                   if i<10: #just printing the top ten rows for debugging purposes, theres thousands I need
                       print(i)
                       i+=1
                       df.append(row)  #this is where I get the indexing error
               else: # breaks out of loop if
                   break
    print(df) #for double checking if it worked

Edit: here is a sample of the data:

Devices

1680

Column Name 1,Column Name 2,Column Name 3,Column Name 4,Column Name 5,Column Name 6,Column Name 7,Column Name 8,Column Name 9,Column Name 10,Column Name 11,Column Name 12,Column Name 13,Column Name 14,Column Name 15,Column Name 16,Column Name 17,Column Name 18,Column Name 19,Column Name 20,Column Name 21

Frame,Sub Frame,Sync,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,FS,FS

,,,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V
1,0,0,1.28178e-005,-5.21866e-005,8.24e-006,1.19022e-005,1.00711e-005,3.02133e-005,2.83822e-005,0,6.40889e-006,-6.1037e-007,2.83822e-005,-6.40889e-006,2.65511e-005,1.46489e-005,1.73956e-005,1.09867e-005,0,0

1,1,0,9.82043e-006,-4.40121e-005,8.78497e-006,1.02673e-005,1.1706e-005,3.15758e-005,2.62023e-005,5.44972e-006,8.0438e-006,-1.06924e-005,2.91997e-005,-8.0438e-006,2.73686e-005,1.51939e-005,1.73956e-005,1.04417e-005,0,0

1,2,0,1.40167e-005,-3.27202e-005,1.00493e-005,1.22292e-005,1.33409e-005,3.55758e-005,2.57009e-005,6.58328e-006,9.67872e-006,-1.5499e-005,2.95376e-005,-8.47978e-006,2.98645e-005,1.47797e-005,1.42783e-005,9.89672e-006,0,0

1,3,0,1.83656e-005,-2.59735e-005,1.01692e-005,1.46816e-005,1.45617e-005,3.74506e-005,2.56355e-005,3.19357e-006,4.47972e-006,-1.95863e-005,2.93959e-005,-7.92392e-006,3.13469e-005,1.46489e-005,1.38423e-005,9.14466e-006,0,0

1,4,0,1.84419e-005,-2.20169e-005,8.5016e-006,1.52157e-005,1.46053e-005,3.87149e-005,2.44148e-005,6.53978e-007,-4.27252e-006,-1.96627e-005,2.87746e-005,-8.1528e-006,3.05185e-005,1.39513e-005,1.59568e-005,9.37354e-006,0,0

1,5,0,1.5837e-005,-1.80387e-005,7.46613e-006,1.39622e-005,1.40603e-005,4.07858e-005,2.10905e-005,0,-8.4253e-006,-1.45073e-005,2.88073e-005,-9.25364e-006,2.83277e-005,1.21529e-005,1.69705e-005,9.48254e-006,0,0

1,6,0,1.39295e-005,-1.44963e-005,7.52064e-006,1.24908e-005,1.42783e-005,4.23117e-005,1.63493e-005,0,-4.77405e-006,-9.22096e-006,2.98427e-005,-1.00711e-005,2.60933e-005,1.02455e-005,1.5935e-005,7.84765e-006,0,0

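I want the output to be a DataFrame that uses line 3 as the column names and fills the columns with the data from line 6 up to the blank line.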

For example:

    In[1]: csv file above
    Out[1]: [column Name 1]   [Column Name 2] ...
            [Data 1 in Row 6] [Data 2 in Row 6] ...
            [Data 1 in Row 7] [Data 2 in Row 7] ...
            [Data 1 in Row 8] [Data 2 in Row 8] ...

1 answer:

Answer 0 (score: 0)

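I love that I got downvoted with no reason given for why my question deserved it. I was able to figure it out on my own. Hopefully this will answer someone else's question down the line.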

    import csv
    import pandas as pd

    temp = []  # collects the header row first, then the data rows
    with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r', newline='') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        for row in csvreader:
            if csvreader.line_num == 3:
                temp.append(row)  # column names
            elif csvreader.line_num >= 6:
                if row:
                    temp.append(row)  # data values
                else:  # stop at the first blank row
                    break
    # The first collected row holds the column names and the rest is data.
    # Passing it as columns keeps the header out of the data rows.
    df = pd.DataFrame(temp[1:], columns=temp[0])
    print(df)
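
The same idea can be wrapped in a small reusable function. The following is only a sketch under stated assumptions: `read_block`, its parameters `header_line` and `first_data_line`, and the miniature in-memory file are hypothetical names and data made up for illustration, not part of the original answer. It also converts the values to numbers, because `csv.reader` returns every field as a string.

```python
import csv
import io

import pandas as pd


def read_block(handle, header_line=3, first_data_line=6):
    """Build a DataFrame from one header line and the data block below it.

    Line numbers are 1-based, matching csv.reader.line_num.
    (Hypothetical helper, written for illustration only.)
    """
    header, data = None, []
    reader = csv.reader(handle)
    for row in reader:
        if reader.line_num == header_line:
            header = row
        elif reader.line_num >= first_data_line:
            if not row:  # a blank line ends the data block
                break
            data.append(row)
    frame = pd.DataFrame(data, columns=header)
    # csv.reader yields strings, so convert each column to a numeric dtype
    return frame.apply(pd.to_numeric)


# Made-up miniature file with the same layout as the sample in the question
sample = io.StringIO(
    "Devices\n"
    "1680\n"
    "Name A,Name B,Name C\n"
    "Frame,Sub Frame,v\n"
    ",,V\n"
    "1,0,1.28178e-005\n"
    "1,1,9.82043e-006\n"
    "\n"
    "Trailing section,ignored\n"
)

df = read_block(sample)
print(df)
```

Collecting the rows in a plain list and building the frame once also sidesteps `DataFrame.append`, which returned a new frame instead of modifying `df` in place and was removed in pandas 2.0. For a real file, pass an open file handle (opened with `newline=''`) in place of the `io.StringIO` object.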