Question

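I have an Excel file in which each column represents a consecutive time period. I want to find the local maxima within each time period (that is, within each column). Here is my code so far: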

    import pandas as pd  
    df = pd.read_excel('my_sheet.xlsx', sheetname='Sheet1')   

    b = df['time_period_1'] 

    i = 1  
    for i in b:  
        if b[i] > b[i-1] and b[i] > b[i+1]:  
            print(b[i]) 
        i=i+1

which gives the error

    KeyError: 24223

24223 is the first value in the column. Any idea what is going on? Thanks!

Answer 1

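Note that inside the loop you are using "i" as an element of b. "for i in b" puts the first value of b, 24223, into "i", and b[i] then looks up 24223 as an index key. That key is not valid, because your column certainly does not contain 24223 rows, so no row carries that label.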
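The failure can be reproduced in a few lines. This is a minimal sketch: the Series below is a hypothetical stand-in for df['time_period_1'], and only the value 24223 comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for df['time_period_1']; only 24223 is from the question.
b = pd.Series([24223, 25100, 24800, 26050, 25900])

# Iterating over a Series yields its values, not its index labels.
values_seen = [i for i in b]

# The index labels here are 0..4, so using a value as a label fails.
try:
    b[24223]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(values_seen)     # the cell values
print(list(b.index))   # the labels that b[...] actually accepts
print(lookup_failed)
```

The loop variable holds cell values, while b[...] expects one of the index labels, which is why the first iteration already fails.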

Basically, rename one of your indices. For example:

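    k = 1                  # k is the index label of the current value i
    for i in b[1:-1]:      # interior rows only, so b[k-1] and b[k+1] both exist
        if i > b[k-1] and i > b[k+1]:
            print(b[k])
        k += 1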

(I am not sure whether you want print(b[k]) or print(i) here, but you get the idea.)

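Edit 1: You should use "i" as the object in "b", since you do not need it as an index. When you write "for i in b", i is already the content of the cell. Use "b[i]" only when i is an index. So here you have either:

    for i in range(len(b)):
        if b[i] > ...

where i is the index, or:

    for i in b:
        if i > ...

where i is an object.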
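The two loop styles above can be compared side by side. The sketch below uses hypothetical data and also shows enumerate, which is not mentioned in the answer but is a standard way to get the position and the value together. It assumes the Series has the default 0, 1, 2, ... index, as a column read with read_excel normally does.

```python
import pandas as pd

b = pd.Series([3, 7, 5])  # hypothetical data with the default integer index

# Style 1: i is an index, so b[i] fetches the cell.
by_index = [b[i] for i in range(len(b))]

# Style 2: i is already the cell content.
by_value = [i for i in b]

# enumerate gives (position, value) pairs in one pass.
pairs = list(enumerate(b))

print(by_index, by_value, pairs)
```

Either style works as long as the loop variable is used consistently as an index or as a value, never both.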

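Edit 2: To avoid accessing b[k+1] on the last row (which raises an error because k+1 does not exist), use an if condition. For clarity, I also suggest using a single index. Note that this requires a condition for the first row as well, because b[-1] does not refer to a row before the first one: on a Python list it refers to the last element, and on a pandas Series with the default integer index it raises a KeyError. Neither is what you want to compare the first row against. Here is code that should work: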

    for i in range(len(b)):              # range begins implicitly at 0
        if i > 0 and i < len(b) - 1:     # neither the first nor the last row
            if b[i] > b[i-1] and b[i] > b[i+1]:
                print(b[i])
        elif i == 0:                     # first row, compared with the following one only
            if b[i] > b[i+1]:
                print(b[i])
        else:                            # last row, compared with the previous one only
            if b[i] > b[i-1]:
                print(b[i])

There are more concise and elegant ways to do this, but I think this is the clearest.
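One such concise alternative, offered as a sketch rather than as part of the original answer, compares each column with shifted copies of itself using Series.shift. The data, the helper name local_maxima and its include_edges flag are all hypothetical, and the sketch assumes the columns contain no missing values.

```python
import pandas as pd

# Hypothetical data; the column names mirror the question's 'time_period_1'.
df = pd.DataFrame({
    "time_period_1": [1, 3, 2, 5, 4],
    "time_period_2": [9, 7, 8, 6, 10],
})

def local_maxima(col, include_edges=False):
    """Return the values of col that are strictly greater than their neighbours."""
    prev_vals = col.shift(1)    # value of the previous row (NaN on the first row)
    next_vals = col.shift(-1)   # value of the next row (NaN on the last row)
    if include_edges:
        # Treat the missing neighbour as -inf so that the first and last rows
        # are compared with their single neighbour only, like the loop above.
        prev_vals = prev_vals.fillna(float("-inf"))
        next_vals = next_vals.fillna(float("-inf"))
    # Comparisons against NaN are False, so edges drop out by default.
    return col[(col > prev_vals) & (col > next_vals)]

interior = {name: local_maxima(df[name]).tolist() for name in df.columns}
with_edges = {name: local_maxima(df[name], include_edges=True).tolist()
              for name in df.columns}
print(interior)
print(with_edges)
```

By default the first and last rows are never reported, because a comparison against the missing neighbour is False. Passing include_edges=True reproduces the edge rule of the loop above, and iterating over df.columns handles every time period at once.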

Finding local maxima in a dataframe column using comparison operators
