Question

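I want to compare items in a pandas Series. If an item is not equal to the next value, it should be appended to a list; otherwise the loop should continue. I think my code works, but it fails on the last iteration because items[k + 1] is out of range. How can I stop the comparison at the last row?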

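The reason I am doing this is that I have a dataset that is sorted by date but has no timestamp field. I only know the starting month and year and the ending month and year.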

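However, one of the included fields (the euribor3m field) is supposed to be a daily rate, so I am hoping I can separate the days and create a timestamp simply by comparing each row with the next one and noting whether the value of that field has changed. If it has, the new row maps to a new day, and since the rows are ordered by day, I should end up with a total of x days that matches the number of days between the starting month-year and the ending month-year.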

Answer 1

One solution is to catch the KeyError:

for k, i in items.items():
    try:
        if items[k+1] != items[k]:
            unique.append(items[k+1])
    except KeyError:
        pass
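
If an explicit loop is still wanted, another option is to pair each value with its successor so the loop never reads past the end. The following is only a minimal sketch: the sample values are made up for illustration, and it assumes `items` is a pandas Series whose order is what matters rather than its index labels.

```python
import pandas as pd

# Hypothetical sample data standing in for the euribor3m column.
items = pd.Series([1, 1, 2, 3, 4, 5, 5, 6])

unique = []
values = items.tolist()

# zip() stops at the shorter input, so the last value is never
# compared against a missing successor.
for current, following in zip(values, values[1:]):
    if following != current:
        unique.append(following)

print(unique)
# [2, 3, 4, 5, 6]
```

Because the pairing is positional, this also works when the Series index is not a plain 0..n-1 range.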

However, you should not iterate this way in the first place, since pandas specializes in vectorized operations. You can use shift instead:

import pandas as pd

df = pd.DataFrame({'euribor3m': [1, 1, 2, 3, 4, 5, 5, 6]})

res = df.loc[df['euribor3m'].shift(-1) != df['euribor3m']]

print(res)

#    euribor3m
# 1          1
# 2          2
# 3          3
# 4          4
# 6          5
# 7          6
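
To get from "the value changed" to the day numbering described in the question, the same comparison can be turned into a running counter. This is a sketch under a strong assumption that is not confirmed by the data: the rate is taken to differ between any two consecutive days, so two adjacent days with an identical rate would be merged into one. The `day` column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'euribor3m': [1, 1, 2, 3, 4, 5, 5, 6]})

# True wherever the rate differs from the previous row. The first row is
# compared against NaN, so it counts as a change and starts day 1.
changed = df['euribor3m'] != df['euribor3m'].shift()

# A cumulative sum of the change flags gives a per-row day counter.
df['day'] = changed.cumsum()

print(df['day'].tolist())
# [1, 1, 2, 3, 4, 5, 5, 6]
print(df['day'].nunique())
# 6
```

The number of distinct values in `day` can then be checked against the number of days between the known start and end months.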

Answer 2

You can use shift() for this.

import pandas as pd

df = pd.DataFrame({'euribor3m':[5,5,7,7,8,9,11,11,34,45,45]})

df0 = df.shift()

mask = df['euribor3m']==df0['euribor3m']
df_new = df[mask]
print(list(df_new['euribor3m']))

Output:

[5, 7, 11, 45]
