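I am using the csv library to read specific rows from several files I have. The problem I am running into is saving those rows into a DataFrame: I get an indexing error that I cannot resolve.
The current version of the code finds the column names (on the third line) and then starts collecting the data I need (starting on the sixth line and continuing until it hits a blank line). Finding the column names works fine, but when I try to append the data I get the error: "InvalidIndexError: Reindexing only valid with uniquely valued Index objects"
The code I currently have is as follows: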
i=0
import csv
import pandas as pd
df = pd.DataFrame()
with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for row in csvreader:
        if csvreader.line_num == 3: #this is for the column names
            print(row)
            df = pd.DataFrame(columns = row)
            df.columns = row
        if csvreader.line_num >= 6: #this is for the data
            if row: #checks for blank row
                if i<10: #just printing the top ten rows for debugging purposes, theres thousands I need
                    print(i)
                    i+=1
                df.append(row) #this is where I get the indexing error
            else: # breaks out of loop if
                break
print(df) #for double checking if it worked
Edit: here is a sample of the data:
Devices
1680
Column Name 1,Column Name 2,Column Name 3,Column Name 4,Column Name 5,Column Name 6,Column Name 7,Column Name 8,Column Name 9,Column Name 10,Column Name 11,Column Name 12,Column Name 13,Column Name 14,Column Name 15,Column Name 16,Column Name 17,Column Name 18,Column Name 19,Column Name 20,Column Name 21
Frame,Sub Frame,Sync,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,FS,FS
,,,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V
1,0,0,1.28178e-005,-5.21866e-005,8.24e-006,1.19022e-005,1.00711e-005,3.02133e-005,2.83822e-005,0,6.40889e-006,-6.1037e-007,2.83822e-005,-6.40889e-006,2.65511e-005,1.46489e-005,1.73956e-005,1.09867e-005,0,0
1,1,0,9.82043e-006,-4.40121e-005,8.78497e-006,1.02673e-005,1.1706e-005,3.15758e-005,2.62023e-005,5.44972e-006,8.0438e-006,-1.06924e-005,2.91997e-005,-8.0438e-006,2.73686e-005,1.51939e-005,1.73956e-005,1.04417e-005,0,0
1,2,0,1.40167e-005,-3.27202e-005,1.00493e-005,1.22292e-005,1.33409e-005,3.55758e-005,2.57009e-005,6.58328e-006,9.67872e-006,-1.5499e-005,2.95376e-005,-8.47978e-006,2.98645e-005,1.47797e-005,1.42783e-005,9.89672e-006,0,0
1,3,0,1.83656e-005,-2.59735e-005,1.01692e-005,1.46816e-005,1.45617e-005,3.74506e-005,2.56355e-005,3.19357e-006,4.47972e-006,-1.95863e-005,2.93959e-005,-7.92392e-006,3.13469e-005,1.46489e-005,1.38423e-005,9.14466e-006,0,0
1,4,0,1.84419e-005,-2.20169e-005,8.5016e-006,1.52157e-005,1.46053e-005,3.87149e-005,2.44148e-005,6.53978e-007,-4.27252e-006,-1.96627e-005,2.87746e-005,-8.1528e-006,3.05185e-005,1.39513e-005,1.59568e-005,9.37354e-006,0,0
1,5,0,1.5837e-005,-1.80387e-005,7.46613e-006,1.39622e-005,1.40603e-005,4.07858e-005,2.10905e-005,0,-8.4253e-006,-1.45073e-005,2.88073e-005,-9.25364e-006,2.83277e-005,1.21529e-005,1.69705e-005,9.48254e-006,0,0
1,6,0,1.39295e-005,-1.44963e-005,7.52064e-006,1.24908e-005,1.42783e-005,4.23117e-005,1.63493e-005,0,-4.77405e-006,-9.22096e-006,2.98427e-005,-1.00711e-005,2.60933e-005,1.02455e-005,1.5935e-005,7.84765e-006,0,0
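The output I want is a DataFrame with line 3 as the column names and lines 6 through the first blank line as the data filling each column.
For example: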
In[1]: csv file above
Out[1]: [column Name 1] [Column Name 2] ...
[Data 1 in Row 6] [Data 2 in Row 6] ...
[Data 1 in Row 7] [Data 2 in Row 7] ...
[Data 1 in Row 8] [Data 2 in Row 8] ...
Answer 0 (score: 0)
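I love being downvoted without being given any reason why my question deserved it. I was able to figure it out myself. Hopefully this can answer someone else's question in the future.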
import csv
import pandas as pd

temp = []  # collects the header row followed by the data rows
with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for row in csvreader:
        if csvreader.line_num == 3:
            temp.append(row)  # gets column names and saves to the list
        if csvreader.line_num >= 6:
            if row:
                temp.append(row)  # gets data values and saves to the list
            else:  # stops at blank row
                break
df = pd.DataFrame(temp)  # creates a dataframe from the list of rows
df.columns = df.iloc[0]  # make top row the column names
# drop the header row (index 0) from the data; the result must be assigned,
# because these methods return a new frame instead of modifying df in place
df = df.iloc[1:].reset_index(drop=True)
print(df)
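As a possible variation on the approach above, the header and the data rows can be kept in separate variables and handed to the DataFrame constructor in one call. This also sidesteps `DataFrame.append`, which returned a new frame rather than changing `df` in place (so the result in the question was discarded) and which was removed in pandas 2.0. The sketch below is only an illustration under stated assumptions: `SAMPLE`, `read_block`, and the `Ch1` column name are hypothetical, and an in-memory string stands in for the real file so that the example runs on its own.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the real file, laid out like the sample in the
# question: two preamble lines, column names on line 3, two unit lines,
# data from line 6, then a blank line followed by content to ignore.
SAMPLE = (
    "Devices\n"
    "1680\n"
    "Frame,Sub Frame,Sync,Ch1\n"
    "Frame,Sub Frame,Sync,v\n"
    ",,,V\n"
    "1,0,0,1.28178e-005\n"
    "1,1,0,9.82043e-006\n"
    "\n"
    "Trajectories\n"
)


def read_block(csvfile, header_line=3, first_data_line=6):
    """Read the header line and the data rows up to the first blank row."""
    header, rows = None, []
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        if reader.line_num == header_line:
            header = row
        elif reader.line_num >= first_data_line:
            if not row:  # csv.reader yields an empty list for a blank line
                break
            rows.append(row)
    # Build the frame once instead of growing it row by row.
    df = pd.DataFrame(rows, columns=header)
    # csv.reader yields strings, so convert every column to float.
    return df.astype(float)


df = read_block(io.StringIO(SAMPLE))
print(df)
```

With a real file, the same function could be called on the object returned by `open(path, 'r', newline='')`. Collecting rows in a plain list and constructing the frame once is also generally cheaper than appending to a DataFrame thousands of times.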