Question

我正在使用Pandas来构建和处理数据。

我这里有一个DataFrame，日期为索引，Id和比特率。我希望按ID分组我的数据并同时重新采样，相对于每个Id的时间，最后保持比特率。

例如，给定：

port_homepage_url = ports.xpath('td[7]/a/@href').extract_first()

给出：

这是我的代码，每次ID和比特率时都会获得一个唯一的日期列：

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:05:35'], 
'end_time' :['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
'bitrate': ['3750000', '3750000'],
'type' : ['vod', 'catchup'],
'unique_id' : ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})

给出：

现在，重新抽样的时间！这是我的代码：

df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)

这就是结果：

这正是我想要做的！我有38279个具有相同列的日志，当我做同样的事情时，我有一条错误消息。第一部分完美运作，并给出了：

部分（df.groupby（＆＃39; Id＆＃39;）。resample（＆＃39; 1S＆＃39;）。ffill（））会出现此错误消息：

print (df.groupby('Id').resample('1S').ffill())

有什么想法吗？ Thnx！

Answer 1

beginning_time和end_time列中的重复似乎存在问题，我尝试模拟它：

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'], 
'end_time' :['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
'bitrate': ['3750000', '3750000', '444'],
'type' : ['vod', 'catchup', 's'],
'unique_id':['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22','w']})

print (df)  
                 Id       beginning_time  bitrate             end_time  \
0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
2                 a  2016-07-08 02:17:45      444  2016-07-08 02:17:42   

      type                             unique_id  
0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22  
2        s                                     w

df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)


print (df)  
                                   Id  bitrate
dates                                         
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:17:42  CODI126622312.ts  3750000
2016-07-08 02:17:45                 a      444
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:17:42  CODI126622312.ts  3750000
2016-07-08 02:17:42                 a      444

print (df.groupby('Id').resample('1S').ffill())

ValueError：无法使用方法或限制
重新索引非唯一索引

一种可能的解决方案是添加drop_duplicates并使用resample groupby旧df = df.drop(['type', 'unique_id'], axis=1) df.beginning_time = pd.to_datetime(df.beginning_time) df.end_time = pd.to_datetime(df.end_time) df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1) print (df.groupby('Id').apply(lambda x : x.drop_duplicates('dates') .set_index('dates') .resample('1S') .ffill())) Id bitrate Id dates CODI126622312.ts 2016-07-08 02:17:42 CODI126622312.ts 3750000 CODI126640013.ts 2016-07-08 02:17:42 CODI126640013.ts 3750000 a 2016-07-08 02:17:41 a 444 2016-07-08 02:17:42 a 444 2016-07-08 02:17:43 a 444 2016-07-08 02:17:44 a 444 2016-07-08 02:17:45 a 444：

print (df[df.beginning_time == df.end_time])
2        s                                     w  
                 Id       beginning_time  bitrate             end_time  \
0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   

      type                             unique_id  
0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22

您还可以按way检查重复项：

Public Function LoadText(ByVal DirIn As String) As String
    Dim FileBuffer As String
    Dim LenghtFile As Long
    Dim ContIN As Long

    ContIN = FreeFile
    Open DirIn For Binary Access Read As #ContIN
    LenghtFile = LOF(ContIN)
    FileBuffer = Space(LenghtFile)
    Get #ContIN, , FileBuffer
    Close #ContIN
    LoadText = FileBuffer
    'following line for test purpose
    debug.print(asc(mid(filebuffer,1,1)))
    debug.print(asc(mid(filebuffer,2,1)))
    debug.print(asc(mid(filebuffer,3,1)))
End Function

SUB Main
    dim testSTring as String
    teststring=loadtext("e:\testme.bin")
end sub

重新取样错误：无法使用方法或限制重新索引非唯一索引

1 个答案: