I have the following sample dataset:
import pandas as pd
data = {'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4', 'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
        'Start_Time': [10, 15, 77, 120, 150, 160, 176, 188],
        'End_Time': [12, 17, 88, 128, 158, 168, 182, 190],
        'cps': [3, 4, 5, 6, 2, 4, 5, 6]}
df = pd.DataFrame(data)
print(df)
Basically: sentences, their start and end times, and their characters per second (cps).
Now, I also have a list:
time_list = [9,80,161,200]
Based on this list, I want to regroup the sentences. The list gives the start and end time of each group.
This is what I have done so far:
As you can see, the result is not what it should be. I feel this is a bit messy at the moment.
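One way to read the list is that every pair of consecutive values marks the start and end of one group. A minimal sketch of that reading follows; the name `intervals` is only illustrative and does not appear in the original post.

```python
time_list = [9, 80, 161, 200]

# Pair each boundary with the one after it: (start, end) of every group.
intervals = list(zip(time_list, time_list[1:]))
print(intervals)  # [(9, 80), (80, 161), (161, 200)]
```

Under this reading, a list of four boundaries describes three groups.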
Answer 0 (score: 2)
Use:
# Midpoint of each sentence, used to decide which interval it belongs to
mean_time = df[['Start_Time', 'End_Time']].mean(axis=1).rename('Interval Time')
# One "start-end" label per pair of consecutive boundaries
labels = ["{0}-{1}".format(time_list[i], time_list[i+1]) for i in range(len(time_list)-1)]
new_df = (df.groupby(pd.cut(mean_time, bins=time_list, labels=labels, include_lowest=True))
            .Sentences
            .agg(','.join)
            .reset_index())
print(new_df)
With time_list = [9,90,161,200]:
   Interval Time                         Sentences
0           9-90     Sentence1,Sentence2,Sentence3
1         90-161             Sentences4,Sentences5
2        161-200  Sentences6,Sentences7,Sentences8
With time_list = [9,80,161,200]:
   Interval Time                         Sentences
0           9-80               Sentence1,Sentence2
1         80-161   Sentence3,Sentences4,Sentences5
2        161-200  Sentences6,Sentences7,Sentences8
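To see why Sentence3 changes group between the two boundary lists, it can help to look at the midpoints that `pd.cut` actually bins. The sketch below is only an illustration: the `cps` column is left out, and `bins_90`, `bins_80` and `check` are names introduced here. It uses `labels=False` so that `pd.cut` returns the bin index instead of a label.

```python
import pandas as pd

df = pd.DataFrame({
    'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                  'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
    'Start_Time': [10, 15, 77, 120, 150, 160, 176, 188],
    'End_Time': [12, 17, 88, 128, 158, 168, 182, 190],
})

# Midpoint of each sentence; this is the value that gets sorted into bins.
mean_time = df[['Start_Time', 'End_Time']].mean(axis=1)

# Bin index (0, 1 or 2) per sentence for the two boundary lists.
bins_90 = pd.cut(mean_time, bins=[9, 90, 161, 200], labels=False, include_lowest=True)
bins_80 = pd.cut(mean_time, bins=[9, 80, 161, 200], labels=False, include_lowest=True)

check = df[['Sentences']].assign(midpoint=mean_time, bin_90=bins_90, bin_80=bins_80)
print(check)
```

Sentence3 has a midpoint of 82.5, which lies below 90 but above 80, so it is the only row whose bin index differs between the two columns.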
If you would rather collect the sentences into lists:
new_df = (df.groupby(pd.cut(mean_time, time_list, right=False, labels=labels, include_lowest=True))
            .Sentences
            .agg(list)
            .reset_index())
print(new_df)
Output:
   Interval Time                             Sentences
0           9-80                [Sentence1, Sentence2]
1         80-161   [Sentence3, Sentences4, Sentences5]
2        161-200  [Sentences6, Sentences7, Sentences8]
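The list version above passes `right=False`, while the first snippet relies on the default `right=True`. For this dataset both give the same groups, but they differ for a midpoint that falls exactly on a boundary. The sketch below uses made-up values (`edge_values` is hypothetical, not taken from the dataset) to show the difference.

```python
import pandas as pd

time_list = [9, 80, 161, 200]
# Hypothetical midpoints chosen to sit exactly on the boundaries.
edge_values = pd.Series([9, 80, 161])

# Default right=True: intervals are (9, 80], (80, 161], (161, 200];
# include_lowest=True additionally lets 9 fall into the first one.
right_closed = pd.cut(edge_values, bins=time_list, labels=False, include_lowest=True)

# right=False: intervals are [9, 80), [80, 161), [161, 200).
left_closed = pd.cut(edge_values, bins=time_list, labels=False, right=False)

print(right_closed.tolist())  # [0, 0, 1]
print(left_closed.tolist())   # [0, 1, 2]
```

Which convention is appropriate depends on whether a sentence centred exactly on a boundary should count towards the earlier or the later group.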
Answer 1 (score: 1)
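time_list = [9,90,161,200]
li = {}    # group number -> range of time values covered by that group
li1 = []   # rows of [group number, start, end]
counter = 0
for i, j in zip(time_list, time_list[1:]):
    li[counter] = range(i, j)
    li1.append([counter, i, j])
    counter += 1
df1 = pd.DataFrame(li1, columns=['Group', 'Start', 'End'])
df1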
   Group  Start  End
0      0      9   90
1      1     90  161
2      2    161  200
The code above creates a dataframe from the time list, and a dictionary that maps each range of values to a group number.
data = {'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4', 'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
        'Start_Time': [10, 15, 77, 120, 150, 160, 176, 188],
        'End_Time': [12, 17, 88, 128, 158, 168, 182, 190],
        'cps': [3, 4, 5, 6, 2, 4, 5, 6]}
df = pd.DataFrame(data)
def f(row):
    # Time values covered by this sentence
    val = range(row['Start_Time'], row['End_Time'])
    len_list = []
    for k, v in li.items():
        # Number of time values the sentence shares with group k
        len_list.append(len([i for i in val if i in v]))
    if max(len_list) == 0:
        return None  # no overlap with any group
    return len_list.index(max(len_list))  # returns first max of the groups when same length
df['Group'] = df.apply(f, axis=1)
df.merge(df1, on='Group').groupby(['Start', 'End'], as_index=False)['Sentences'].sum()
   Start  End                       Sentences
0      9   90     Sentence1Sentence2Sentence3
1     90  161            Sentences4Sentences5
2    161  200  Sentences6Sentences7Sentences8
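The same "largest overlap wins" idea can also be written with plain arithmetic instead of testing membership in `range` objects. The following is a sketch under the assumption that all times are integers, as in the sample data. `best_group`, `intervals` and `result` are names introduced here, and the sentences are joined with commas rather than concatenated with `sum()`.

```python
import pandas as pd

df = pd.DataFrame({
    'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                  'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
    'Start_Time': [10, 15, 77, 120, 150, 160, 176, 188],
    'End_Time': [12, 17, 88, 128, 158, 168, 182, 190],
})
time_list = [9, 90, 161, 200]
intervals = list(zip(time_list, time_list[1:]))

def best_group(row):
    # Overlap length of [Start_Time, End_Time) with each [lo, hi) interval.
    overlaps = [max(0, min(row['End_Time'], hi) - max(row['Start_Time'], lo))
                for lo, hi in intervals]
    if max(overlaps) == 0:
        return None  # sentence lies outside every interval
    return overlaps.index(max(overlaps))  # first interval wins on ties

df['Group'] = df.apply(best_group, axis=1)
result = df.groupby('Group')['Sentences'].agg(','.join)
print(result)
```

Sentences6 (160 to 168) overlaps the 90-161 interval by 1 and the 161-200 interval by 7, so it lands in the last group, as in the output above.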