Question

My DataFrame df3 looks like this:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:05.523 125.5   101 
1    2     2018-01-01 00:00:05.757 125.0   101 
2    3     2018-01-02 00:00:09.507 127.0   52  
3    4     2018-01-02 00:00:13.743 126.5   52  
4    5     2018-01-03 00:00:15.407 125.5   50
                    ...

11   11    2018-01-01 00:00:07.523 125.5   120 
12   12    2018-01-01 00:00:08.757 125.0   120 
13   13    2018-01-04 00:00:14.507 127.0   300  
14   14    2018-01-04 00:00:15.743 126.5   300  
15   15    2018-01-05 00:00:19.407 125.5   350

I want to resample it to one-second intervals with forward fill, so that it looks like this:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:06.000 125.00    101 
1    2     2018-01-01 00:00:07.000 125.00    101 
2    3     2018-01-01 00:00:08.000 125.00    101 
3    4     2018-01-02 00:00:09.000 125.00     52 
4    5     2018-01-02 00:00:10.000 127.00     52 

                    ...

My code:

def resample(df):
    indexing = df[['Timestamp','Data']]
    indexing['Timestamp']=pd.to_datetime(indexing['Timestamp'])
    indexing =indexing.set_index('Timestamp')
    indexing1= indexing.resample('1S',fill_method='ffill')
    # indexing1 = indexing1.resample('D')
    return indexing1
indexing = resample(df3)

But it raises an error:

ValueError: cannot reindex a non-unique index with a method or limit

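I don't really understand what this error means. In this similar question, @jezrael suggested using drop_duplicates together with groupby. I'm not sure what effect that would have on the data, since my data doesn't appear to contain any duplicates. Can someone explain? Thanks.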

Answer 1

This error is caused by the following rows:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:05.523 125.5   101 
1    2     2018-01-01 00:00:05.757 125.0   101

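When you resample these two timestamps to the nearest second, they both become 2018-01-01 00:00:06, and pandas doesn't know which Data value to pick because it has two to choose from. What you can do instead is use an aggregation function such as last (though mean, max, or min may also be suitable) to select one of the values. Then you can apply the forward fill.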
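The collision described above can be checked directly. The following is a minimal sketch, assuming only the two rows quoted from the question; the variable names `ts` and `rounded` are illustrative and not part of the original code.

```python
import pandas as pd

# The two rows from the question that fall within the same second.
ts = pd.to_datetime(pd.Series([
    "2018-01-01 00:00:05.523",
    "2018-01-01 00:00:05.757",
]))

# Round each timestamp to the nearest whole second.
rounded = ts.dt.round("s")

# The raw timestamps are unique, but the rounded ones collide.
print(ts.duplicated().any())
print(rounded.duplicated().any())
print(rounded.unique())
```

So the data can look duplicate-free at millisecond precision and still produce a non-unique index once it is reduced to whole seconds.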

Example:

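from io import StringIO
import pandas as pd

# Columns are separated by two or more whitespace characters; a regex
# separator requires the Python parsing engine.
df = pd.read_table(StringIO("""    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:05.523  125.5   101 
1    2     2018-01-01 00:00:05.757  125.0   101 
2    3     2018-01-02 00:00:09.507  127.0   52  
3    4     2018-01-02 00:00:13.743  126.5   52  
4    5     2018-01-03 00:00:15.407  125.5   50"""), sep=r'\s\s+', engine='python')
# Round each timestamp to the nearest second, then use it as the index.
df['Timestamp'] = pd.to_datetime(df['Timestamp']).dt.round('s')
df = df.set_index('Timestamp')
# Keep the last value in each one-second bin, then forward-fill the empty bins.
df = df.resample('1s').last().ffill()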
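To illustrate how the choice of aggregation function affects the result, here is a small sketch that compares last and mean on three of the timestamps above. The reduced DataFrame and the names `last_filled` and `mean_filled` are assumptions made for illustration, and the lowercase frequency alias "1s" is used on the assumption that the installed pandas version accepts it.

```python
import pandas as pd

# Three Data values; the first two timestamps round to the same second.
df = pd.DataFrame(
    {"Data": [125.5, 125.0, 127.0]},
    index=pd.to_datetime([
        "2018-01-01 00:00:05.523",
        "2018-01-01 00:00:05.757",
        "2018-01-01 00:00:09.507",
    ]).round("s"),
)

# Aggregate each one-second bin, then forward-fill the empty bins.
last_filled = df.resample("1s").last().ffill()
mean_filled = df.resample("1s").mean().ffill()

print(last_filled)
print(mean_filled)
```

With last, the 00:00:06 bin keeps the later reading (125.0); with mean, it holds the average of the two readings (125.25). In both cases the value is carried forward until the next observed second.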
