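I am using the Google Cloud Video Intelligence API and am trying to get the results into a pandas DataFrame. The output class of the API is repeatingcompositecontainer, so my idea was to build the DataFrame inside the for loop that the API function uses.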
With the help of the Stack Overflow post "How to build and fill pandas dataframe from for loop?", I planned to create an empty list, append the results to it, and convert it to a pandas DataFrame afterwards.
This is how the API function processes the results:
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
This is my attempt, which appends a dictionary to the list in each of the two inner loops:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        df.append({'Description': category_entity.description})
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        df.append({'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
When I append only in the last for loop, I get a nicely structured DataFrame, as shown below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence         End  Start
  0.704168  599.682416    0.0
  0.737053  599.682416    0.0
  0.832496  599.682416    0.0
  0.427637  599.682416    0.0
  0.518693  599.682416    0.0
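But when I append in both loops, I get the distorted DataFrame shown below, with the descriptions and the segment values on separate rows. I am hoping there is a way to fix this: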
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
       NaN   technology         NaN    NaN
  0.741133          NaN  599.682416    0.0
       NaN     keyboard         NaN    NaN
  0.328138          NaN  599.682416    0.0
       NaN       person         NaN    NaN
  0.436333          NaN  599.682416    0.0
       NaN       person         NaN    NaN
I would really appreciate any help with this.
Thanks!
Answer 0 (score: 2)
Change the code as follows:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    label_row = {}  # Create a dictionary for the label
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        # Add the description (a later category overwrites an earlier one)
        label_row['Description'] = category_entity.description
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        row_segment_info = {'Confidence': segment.confidence, 'Start': start_time, 'End': end_time}
        # Merge the label and the segment info into a new dict for this row,
        # so that rows for different segments do not share one object
        df.append(dict(label_row, **row_segment_info))  # Now add the row
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
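Summary: you are appending a row to the list in every sub-loop. You only want to append a row once, after the description and the segment values have both been collected for it.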
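The difference between the two approaches can be reproduced without calling the API. The sketch below is only an illustration: the `labels` list is made-up stand-in data, and the two helper functions are hypothetical names, not part of the Video Intelligence client. It shows that dictionaries with disjoint keys become rows padded with NaN, while one dictionary per segment gives a fully populated row.

```python
import pandas as pd

# Made-up stand-in for the API result:
# (description, [(confidence, start, end), ...]) per label.
labels = [
    ("technology", [(0.74, 0.0, 599.68)]),
    ("keyboard", [(0.33, 0.0, 599.68), (0.41, 10.0, 20.0)]),
]


def rows_appended_separately(labels):
    """One dict per description plus one per segment (pattern from the question)."""
    rows = []
    for description, segments in labels:
        rows.append({"Description": description})
        for confidence, start, end in segments:
            rows.append({"Confidence": confidence, "Start": start, "End": end})
    return rows


def rows_one_per_segment(labels):
    """One complete dict per segment (pattern from the answer)."""
    rows = []
    for description, segments in labels:
        for confidence, start, end in segments:
            rows.append({
                "Description": description,
                "Confidence": confidence,
                "Start": start,
                "End": end,
            })
    return rows


distorted = pd.DataFrame(rows_appended_separately(labels))
fixed = pd.DataFrame(rows_one_per_segment(labels))
print(distorted)  # description rows and segment rows alternate, padded with NaN
print(fixed)      # every row has all four columns filled
```

Building a fresh dictionary for each row also avoids appending the same dictionary object several times when a label has more than one segment.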