Converting audio files to text

Time: 2016-06-28 07:40:30

Tags: python hadoop audio mapreduce

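I have a requirement to convert speech to text from .wav audio files using MapReduce. I searched extensively and came across several Java and Python libraries that can convert speech to text. One such Python library is pocketsphinx.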

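With this library I can convert speech to text. Now I am trying to write a Python MapReduce job that does the same thing with this library, but I got lost partway through. I know I have to write a custom record reader to read my audio files. Below is the code I edited and tried.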


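The input is not recognized. This is my first attempt at writing MapReduce code in Python, so I know I am missing many important points. Any help or guidance would be appreciated, as I am stuck. Thanks in advance.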

Here is the mapper script, audio_failure.py:

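#!/usr/bin/env python

import io
import sys

import speech_recognition as sr

# Read the raw WAV bytes from stdin into an in-memory file object that
# sr.AudioFile can open. Python 3 exposes the binary stream as
# sys.stdin.buffer; Python 2 has no such attribute, so fall back to sys.stdin.
data = io.BytesIO(getattr(sys.stdin, 'buffer', sys.stdin).read())

r = sr.Recognizer()
with sr.AudioFile(data) as source:
    audio = r.record(source)  # read the entire audio file

# Emit one tab-separated key/value record, as Hadoop Streaming expects.
print('%s\t%s' % (1, "Sphinx thinks you said " + r.recognize_sphinx(audio)))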
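A note on the approach: Hadoop Streaming's default text input format hands the mapper line-oriented text, so a binary .wav file is unlikely to arrive intact on stdin. One workaround, sketched here as an assumption and not as a confirmed fix for this job, is to make the job input a text file listing one HDFS path per line and have the mapper fetch and transcribe each file itself. The names `fetch_from_hdfs`, `transcribe`, and `run_mapper` are hypothetical, and the sketch assumes the `hdfs` command and the `speech_recognition` package are available on every task node.

```python
import os
import shutil
import subprocess
import tempfile


def fetch_from_hdfs(hdfs_path, local_path):
    """Copy one file out of HDFS with the standard `hdfs dfs -get` command."""
    subprocess.check_call(["hdfs", "dfs", "-get", hdfs_path, local_path])


def transcribe(local_path):
    """Transcribe a local WAV file with pocketsphinx via speech_recognition."""
    # Imported lazily so the rest of the module loads without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(local_path) as source:
        audio = recognizer.record(source)  # read the entire audio file
    return recognizer.recognize_sphinx(audio)


def run_mapper(lines, out, fetch=fetch_from_hdfs, recognize=transcribe):
    """For each input line (an HDFS path), emit '<path>\t<transcript>'."""
    for line in lines:
        hdfs_path = line.strip()
        if not hdfs_path:
            continue  # skip blank lines
        tmp_dir = tempfile.mkdtemp()
        local_path = os.path.join(tmp_dir, os.path.basename(hdfs_path))
        try:
            fetch(hdfs_path, local_path)
            text = recognize(local_path)
        finally:
            # Always clean up the temporary copy, even if recognition fails.
            shutil.rmtree(tmp_dir, ignore_errors=True)
        out.write("%s\t%s\n" % (hdfs_path, text))
```

In the streaming script itself, the entry point would be `run_mapper(sys.stdin, sys.stdout)`. Keeping `fetch` and `recognize` as parameters makes it possible to exercise the mapper locally with stand-in functions before submitting the job.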

Running the job produces the following error log:

16/06/28 08:15:23 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/opt/python_mapreduce/audio_failure.py] [/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.0.0-169.jar] /tmp/streamjob3856155156594950356.jar tmpDir=null
16/06/28 08:15:30 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/06/28 08:15:31 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.56.102:8050
16/06/28 08:15:33 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/06/28 08:15:33 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.56.102:8050
16/06/28 08:15:35 INFO mapred.FileInputFormat: Total input paths to process : 1
16/06/28 08:15:35 INFO mapreduce.JobSubmitter: number of splits:2
16/06/28 08:15:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467091124614_0002
16/06/28 08:15:37 INFO impl.YarnClientImpl: Submitted application application_1467091124614_0002
16/06/28 08:15:37 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1467091124614_0002/
16/06/28 08:15:37 INFO mapreduce.Job: Running job: job_1467091124614_0002
16/06/28 08:16:02 INFO mapreduce.Job: Job job_1467091124614_0002 running in uber mode : false
16/06/28 08:16:02 INFO mapreduce.Job:  map 0% reduce 0%
16/06/28 08:16:25 INFO mapreduce.Job:  map 100% reduce 0%
16/06/28 08:16:25 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/06/28 08:16:25 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/06/28 08:16:26 INFO mapreduce.Job:  map 0% reduce 0%
16/06/28 08:16:42 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

16/06/28 08:16:43 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

16/06/28 08:17:02 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

16/06/28 08:17:04 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

16/06/28 08:17:26 INFO mapreduce.Job:  map 100% reduce 100%
16/06/28 08:17:27 INFO mapreduce.Job: Job job_1467091124614_0002 failed with state FAILED due to: Task failed task_1467091124614_0002_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/06/28 08:17:27 INFO mapreduce.Job: Counters: 17
        Job Counters
                Failed map tasks=7
                Killed map tasks=1
                Killed reduce tasks=1
                Launched map tasks=8
                Other local map tasks=6
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=146997
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=146997
                Total time spent by all reduce tasks (ms)=0
                Total vcore-seconds taken by all map tasks=146997
                Total vcore-seconds taken by all reduce tasks=0
                Total megabyte-seconds taken by all map tasks=36749250
                Total megabyte-seconds taken by all reduce tasks=0
        Map-Reduce Framework
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
16/06/28 08:17:27 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I am running the job with the command below. I also tried it with the number of reducers set to 0.

hadoop jar /usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming.jar -file /opt/python_mapreduce/audio_failure.py    -mapper /opt/python_mapreduce/audio_failure.py -input  /python/audiofile/* -output /python/count/19/
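For reference, the log above warns that `-file` is deprecated in favor of the generic option `-files`, and the reducer count can be set with the generic option `-D mapreduce.job.reduces=0`. Generic options have to come before the streaming-specific ones. The helper below is a minimal sketch that assembles such an argument list; `build_streaming_command` is a hypothetical name, and the paths are simply the ones from the command above.

```python
import os


def build_streaming_command(jar, mapper_path, input_path, output_path, reducers=0):
    """Return the argv list for a map-only Hadoop Streaming job."""
    return [
        "hadoop", "jar", jar,
        # Generic options first: reducer count and files to ship to the tasks.
        "-D", "mapreduce.job.reduces=%d" % reducers,
        "-files", mapper_path,
        # Streaming options: the shipped script is referenced by its base name.
        "-mapper", os.path.basename(mapper_path),
        "-input", input_path,
        "-output", output_path,
    ]


command = build_streaming_command(
    "/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming.jar",
    "/opt/python_mapreduce/audio_failure.py",
    "/python/audiofile/*",
    "/python/count/19/",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.check_call(command)` on a machine where the `hadoop` client is installed.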

0 answers:

No answers