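I have a function that iterates over a pandas DataFrame and drops rows that have consecutive duplicates in a particular column. Afterwards I try to return a running sum of that column as a list, but I seem to be getting a KeyError. I'm not sure what that means.
Minimal code: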
dropRows = [] #stores row indices to drop
#Sanitize the data to get rid of consecutive duplicates
for indx, val in enumerate(df.removeConsecutives): #for all the values
    if(indx == 0): #skip first indx
        continue
    if (val == df.removeConsecutives[indx-1]): #this is the same value as the last one
        dropRows.append(indx)
sanitizedData = df.drop(dropRows)
#Create Timestamps based on RTC
listOfSums= [0] #first sum is zero
sum = 0 #running total of seconds for timestamps
for indx, val in enumerate(sanitizedData.removeConsecutives):
    sum += sanitizedData.removeConsecutives[indx]
    listOfSums.append(sum) #add running sum to list
The error traceback points to this line:
listOfSums.append(sum) #add running sum to list
This is the error:
C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2987)()
C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_value (pandas\index.c:2802)()
C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3528)()
C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7032)()
C:\Users\JohnDoe\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6973)()
KeyError: 150L
I'm using IPython from Anaconda, which installs all the packages (pandas, numpy, SciPy, etc.) from a single installer, which is why Anaconda appears in the path.
Answer 0 (score: 2)
Here:
for indx, val in enumerate(sanitizedData.removeConsecutives):
    sum += sanitizedData.removeConsecutives[indx]
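You are using enumerate, so your indx variable runs from 0 up to one less than the number of rows in sanitizedData. The removeConsecutives Series, however, is not indexed by consecutive numbers. It may have been once, but not after you used drop, which removes rows and leaves the remaining index labels unchanged.
Example: you have a df with 300 rows. You find a duplicate at row 150 and drop it. Now your df has 299 rows, with index labels 0-149 and 151-299. But indx goes from 0 to 298 and tries to access label 150, which no longer exists. It would probably work if you used: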
for indx, val in enumerate(sanitizedData.removeConsecutives):
    sum += sanitizedData.removeConsecutives.iloc[indx]
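To make the label-versus-position point concrete, here is a minimal sketch on a hypothetical 5-row frame (the data is invented for illustration; only the column name comes from the question). With an integer index, a plain `Series[indx]` lookup goes by label, like `.loc`, while `.iloc` goes by position:

```python
import pandas as pd

# Hypothetical 5-row frame standing in for the 300-row example.
df = pd.DataFrame({"removeConsecutives": [10, 20, 20, 30, 40]})

# Drop the row labelled 2 (the consecutive duplicate).
sanitized = df.drop([2])
labels = list(sanitized.index)  # the label 2 is gone, the rest are unchanged

col = sanitized["removeConsecutives"]

# Label-based lookup fails for the dropped label.
try:
    col.loc[2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup works for every position 0..len(col)-1.
running = 0
sums = [0]  # first sum is zero, as in the question
for pos in range(len(col)):
    running += col.iloc[pos]
    sums.append(int(running))

print(labels, label_lookup_failed, sums)
```

Calling `sanitized.reset_index(drop=True)` after the drop would be another way to get consecutive labels back, if the original labels are not needed.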
That answers your question, but I suggest you look into drop_duplicates and sum.
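As a rough sketch of a loop-free version, assuming the goal is to remove only consecutive repeats and then build a running total: note that `drop_duplicates` removes every repeated value, not just adjacent ones, so this example swaps in a comparison against `shift()` for the filtering and `cumsum()` for the running sum. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data: 5 repeats later on, but not consecutively.
df = pd.DataFrame({"removeConsecutives": [5, 5, 7, 7, 7, 5, 2]})
col = df["removeConsecutives"]

# Keep a row only if its value differs from the previous row's value.
# The first row is compared against NaN, so it is always kept.
keep = col != col.shift()
sanitized = df[keep]
kept_values = sanitized["removeConsecutives"].tolist()

# Running total with a leading zero, matching listOfSums in the question.
list_of_sums = [0] + sanitized["removeConsecutives"].cumsum().tolist()

# For comparison: drop_duplicates also removes the later, non-adjacent 5.
deduped_all = df.drop_duplicates(subset="removeConsecutives")["removeConsecutives"].tolist()

print(kept_values, list_of_sums, deduped_all)
```

Because `cumsum()` never looks values up by label, the gaps that filtering leaves in the index do not matter here.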