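I have a requirement to convert speech to text from .wav audio files using MapReduce. I searched a lot and came across a few Java and Python libraries that can help with speech-to-text conversion. One such Python library is pocketsphinx.
Using this library I am able to convert speech to text. Now I am trying to write a Python MapReduce job that does the same thing with this library, but I am lost in the middle. I know I have to write a custom record reader to read my audio files. The code I edited and tried is included below.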
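The input is not recognized. This is my first attempt at writing MapReduce code in Python, so I know I am missing many important points. Any help or guidance would be appreciated, as I am stuck. Thanks in advance.
This is the mapper script, audio_failure.py: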
#!/usr/bin/env python
import speech_recognition as sr
import sys
data = read_input(sys.stdin)
r = sr.Recognizer()
with sr.AudioFile(data) as source:
    audio = r.record(source) # read the entire audio file
print '%s\t%s' % (1,"Sphinx thinks you said " + r.recognize_sphinx(audio))
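A note on the script above: read_input is never defined or imported, so once the imports succeed the script would stop with a NameError before any audio is read. Hadoop Streaming also hands the mapper text lines on stdin rather than a whole .wav file. The following is a minimal sketch of one possible arrangement, not a confirmed fix. It assumes (this is not in the original post) that the job input is a text file listing one .wav path per line and that each path is readable from the node running the task. The names transcribe_with_sphinx and run_mapper are hypothetical.

```python
import sys


def transcribe_with_sphinx(wav_path):
    """Transcribe one .wav file with SpeechRecognition's Sphinx backend."""
    # Imported lazily so the rest of the script can be exercised
    # on a machine where the library is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire audio file
    return recognizer.recognize_sphinx(audio)


def run_mapper(lines, transcribe=transcribe_with_sphinx,
               out=sys.stdout, err=sys.stderr):
    """Write one 'path<TAB>text' record per input path.

    Assumption: each input line is a path to a .wav file.
    Failures are reported on stderr and the file is skipped,
    so one bad file does not fail the whole task.
    """
    for line in lines:
        wav_path = line.strip()
        if not wav_path:
            continue  # skip blank lines
        try:
            text = transcribe(wav_path)
        except Exception as exc:
            err.write("failed on %s: %r\n" % (wav_path, exc))
            continue
        out.write("%s\t%s\n" % (wav_path, text))


# When used as a streaming mapper, the script would end with:
#     run_mapper(sys.stdin)
```

Passing the transcriber in as a parameter keeps the stdin handling separate from the speech library, so the record format can be checked with a stand-in function before pocketsphinx is involved.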
Running the job produces the following error log:
16/06/28 08:15:23 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/opt/python_mapreduce/audio_failure.py] [/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.0.0-169.jar] /tmp/streamjob3856155156594950356.jar tmpDir=null
16/06/28 08:15:30 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/06/28 08:15:31 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.56.102:8050
16/06/28 08:15:33 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/06/28 08:15:33 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.56.102:8050
16/06/28 08:15:35 INFO mapred.FileInputFormat: Total input paths to process : 1
16/06/28 08:15:35 INFO mapreduce.JobSubmitter: number of splits:2
16/06/28 08:15:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467091124614_0002
16/06/28 08:15:37 INFO impl.YarnClientImpl: Submitted application application_1467091124614_0002
16/06/28 08:15:37 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1467091124614_0002/
16/06/28 08:15:37 INFO mapreduce.Job: Running job: job_1467091124614_0002
16/06/28 08:16:02 INFO mapreduce.Job: Job job_1467091124614_0002 running in uber mode : false
16/06/28 08:16:02 INFO mapreduce.Job: map 0% reduce 0%
16/06/28 08:16:25 INFO mapreduce.Job: map 100% reduce 0%
16/06/28 08:16:25 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/06/28 08:16:25 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/06/28 08:16:26 INFO mapreduce.Job: map 0% reduce 0%
16/06/28 08:16:42 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
16/06/28 08:16:43 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
16/06/28 08:17:02 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
16/06/28 08:17:04 INFO mapreduce.Job: Task Id : attempt_1467091124614_0002_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
16/06/28 08:17:26 INFO mapreduce.Job: map 100% reduce 100%
16/06/28 08:17:27 INFO mapreduce.Job: Job job_1467091124614_0002 failed with state FAILED due to: Task failed task_1467091124614_0002_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/06/28 08:17:27 INFO mapreduce.Job: Counters: 17
Job Counters
Failed map tasks=7
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=146997
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=146997
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=146997
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=36749250
Total megabyte-seconds taken by all reduce tasks=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
16/06/28 08:17:27 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
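The repeated line "PipeMapRed.waitOutputThreads(): subprocess failed with code 1" only says that the mapper process exited with status 1, which is also the status the Python interpreter uses for an uncaught exception. The traceback itself goes to the task's stderr, not to this console output. One way to see it is to run the mapper locally with a sample of input piped to stdin, which is roughly what Hadoop Streaming does on each node. The sketch below does that; the helper name run_mapper_locally and the sample input are hypothetical.

```python
import subprocess
import sys


def run_mapper_locally(mapper_path, sample_input, interpreter=sys.executable):
    """Pipe sample_input (bytes) into the mapper script.

    Returns (exit_code, stdout_bytes, stderr_bytes) so the
    traceback hidden behind 'failed with code 1' can be read.
    """
    proc = subprocess.Popen(
        [interpreter, mapper_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = proc.communicate(sample_input)
    return proc.returncode, stdout, stderr


# Example use (path taken from the post, input is made up):
#     code, out, err = run_mapper_locally(
#         "/opt/python_mapreduce/audio_failure.py", b"sample line\n")
#     print(code)
#     print(err.decode("utf-8", "replace"))
```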
I even tried this with the number of reducers set to 0. This is the command I use to run the job:
hadoop jar /usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming.jar -file /opt/python_mapreduce/audio_failure.py -mapper /opt/python_mapreduce/audio_failure.py -input /python/audiofile/* -output /python/count/19/
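The first warning in the log says that -file is deprecated in favour of the generic option -files. The sketch below assembles the same command with -files and a map-only setting (-D mapreduce.job.reduces=0), using the jar and HDFS paths from the post. Generic options have to come before the streaming-specific ones. The function build_streaming_command is a hypothetical helper, not part of Hadoop, and referring to the mapper by its base name assumes the shipped script is executable on the task nodes.

```python
import os


def build_streaming_command(streaming_jar, mapper_path, input_path,
                            output_path, reducers=0):
    """Return the argv list for a Hadoop Streaming job."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options must precede the streaming options.
        "-files", mapper_path,
        "-D", "mapreduce.job.reduces=%d" % reducers,
        # Streaming options; the shipped file is referenced by base name.
        "-mapper", os.path.basename(mapper_path),
        "-input", input_path,
        "-output", output_path,
    ]


cmd = build_streaming_command(
    "/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming.jar",
    "/opt/python_mapreduce/audio_failure.py",
    "/python/audiofile/*",
    "/python/count/19/",
)
print(" ".join(cmd))
```

This only changes how the job is submitted. It does not by itself address how the .wav data reaches the mapper.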