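I'm trying to write a Lambda function that post-processes the JSON output file from AWS Transcribe. The function strips out all the unnecessary data and joins together all the words for each channel.
I've been following a tutorial by someone called Srce Cde, who built a Lambda like this, but using speaker separation rather than channel separation.
This is for an audio analysis project I'm doing on AWS. Calls are uploaded to an S3 bucket, a Lambda starts the Transcribe job, and the Transcribe output lands in another S3 bucket. That bucket then triggers the final Lambda, which separates the call into its channels.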
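For context, the first step of that pipeline could look roughly like the sketch below. It only builds the request parameters that would be passed to boto3's `start_transcription_job`, so it can be run without AWS credentials. The helper name `build_transcribe_request`, the output bucket name `my-transcribe-output`, and the `en-US` language code are placeholders I made up for illustration, not values from my actual setup.

```python
import re
import urllib.parse


def build_transcribe_request(event, output_bucket="my-transcribe-output"):
    """Build keyword arguments for start_transcription_job from an S3 event.

    Hypothetical helper: the bucket name and language code are assumptions.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = urllib.parse.unquote_plus(record["object"]["key"])
    # Transcribe job names may only contain letters, digits, '.', '_' and '-'.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "_", key)
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # assumption
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
        # Ask Transcribe to transcribe each audio channel separately.
        "Settings": {"ChannelIdentification": True},
    }


# Inside the Lambda this would be used roughly as:
#   boto3.client("transcribe").start_transcription_job(**build_transcribe_request(event))
```

The point of the sketch is the `ChannelIdentification` setting, which is what makes the output contain `channel_labels` instead of `speaker_labels`.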
I tried the following code:
import json
import boto3

def lambda_handler(event, context):
    if event:
        s3 = boto3.client("s3")
        s3_object = event["Records"][0]["s3"]
        bucket_name = s3_object["bucket"]["name"]
        file_name = s3_object["object"]["key"]
        file_obj = s3.get_object(Bucket=bucket_name, Key=file_name)
        transcript_result = json.loads(file_obj["Body"].read())
        segments = transcript_result["results"]["channel_labels"]
        items = transcript_result["results"]["items"]
        speaker_text = []
        flag = False
        speaker_json = {}
        for no_of_speaker in range(segments["channels"]):
            for word in items:
                for seg in segments["items"]:
                    if seg["channel_label"] == "ch_"+str(no_of_speaker):
                        end_time = seg["end_time"]
                        if "start_time" in word:
                            if seg["items"]:
                                for seg_item in seg["items"]:
                                    if word["end_time"] == seg_item["end_time"] and word["start_time"] == seg_item["start_time"]:
                                        speaker_text.append(word["alternatives"][0]["content"])
                                        flag = True
                        elif word["type"] == "punctuation":
                            if flag and speaker_text:
                                temp = speaker_text[-1]
                                temp += word["alternatives"][0]["content"]
                                speaker_text[-1] = temp
                                flag = False
                                break
            speaker_json["ch_"+str(no_of_speaker)] = ' '.join(speaker_text)
            speaker_text = []
        print(speaker_json)
        s3.put_object(Bucket="aws-mrp-speaker-separation", Key=file_name, Body=json.dumps(speaker_json))
        return {
            'statusCode': 200,
            'body': json.dumps('Speaker transcript seperated successfully!')
        }
I expected the output to be the transcript split up by channel. However, I get the following error:
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 19, in lambda_handler
    for no_of_speaker in range(segments["channels"]):
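To make the data shape concrete, here is a minimal sketch of what I am trying to achieve. It runs on a hand-written sample rather than a real transcript. It assumes that `results.channel_labels.channels` is a list of per-channel objects, each with its own `channel_label` and `items`, and that the channel count sits in a separate `number_of_channels` field. That layout is my reading of the channel identification output and may not match every file. If `channels` really is a list, passing it to `range()` cannot work, since `range()` needs an integer. The function name `split_by_channel` and the sample data are made up for illustration.

```python
def split_by_channel(transcript_result):
    """Return {channel_label: text} for a channel-identified transcript.

    Assumes results.channel_labels.channels is a list of objects that each
    carry a channel_label and their own items (an assumption, see above).
    """
    channel_labels = transcript_result["results"]["channel_labels"]
    channel_text = {}
    for channel in channel_labels["channels"]:
        words = []
        for item in channel["items"]:
            content = item["alternatives"][0]["content"]
            if item["type"] == "punctuation" and words:
                # Attach punctuation to the preceding word instead of spacing it out.
                words[-1] += content
            else:
                words.append(content)
        channel_text[channel["channel_label"]] = " ".join(words)
    return channel_text


# Hand-written sample in the assumed shape (not real Transcribe output).
sample = {
    "results": {
        "channel_labels": {
            "channels": [
                {"channel_label": "ch_0", "items": [
                    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
                     "alternatives": [{"content": "Hello"}]},
                    {"type": "punctuation", "alternatives": [{"content": ","}]},
                    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
                     "alternatives": [{"content": "there"}]},
                    {"type": "punctuation", "alternatives": [{"content": "."}]},
                ]},
                {"channel_label": "ch_1", "items": [
                    {"type": "pronunciation", "start_time": "1.0", "end_time": "1.2",
                     "alternatives": [{"content": "Hi"}]},
                ]},
            ],
            "number_of_channels": 2,
        }
    }
}

print(split_by_channel(sample))
```

Under that assumption there is no need to match timestamps against the top-level `items` list, because each channel already carries its own words.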