I am trying to run MRkmeans in RStudio 0.99.484 with Hadoop 2.3.0 (Windows version). With one input file (755 * 1682 real values, about 21 MB) the job completes successfully, but with a larger file (4832 * 3952 real values, about 317 MB) the map-reduce job fails; the full MR progress output and errors are shown below. Can this be fixed by requesting larger sizes through rmr.options(backend.parameters)? If so, I would appreciate an example.
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/file10f06a465c65': No such file or directory
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/file10f0634072aa': No such file or directory
15/10/19 21:49:56 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
15/10/19 21:49:56 INFO compress.CodecPool: Got brand-new compressor [.deflate]
packageJobJar: [/C:/tmp/hadoop-Koohi/hadoop-unjar740024213403447693/] [] C:\Users\SETUPC~1\AppData\Local\Temp\streamjob2283559356588490466.jar tmpDir=null
15/10/19 21:54:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/19 21:54:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/19 21:54:12 INFO mapred.FileInputFormat: Total input paths to process : 1
15/10/19 21:54:13 INFO mapreduce.JobSubmitter: number of splits:2
15/10/19 21:54:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445275456322_0003
15/10/19 21:54:15 INFO impl.YarnClientImpl: Submitted application application_1445275456322_0003
15/10/19 21:54:15 INFO mapreduce.Job: The url to track the job: http://Hamidreza:8088/proxy/application_1445275456322_0003/
15/10/19 21:54:15 INFO mapreduce.Job: Running job: job_1445275456322_0003
15/10/19 21:54:34 INFO mapreduce.Job: Job job_1445275456322_0003 running in uber mode : false
15/10/19 21:54:34 INFO mapreduce.Job: map 0% reduce 0%
15/10/19 21:55:04 INFO mapreduce.Job: map 1% reduce 0%
15/10/19 21:56:07 INFO mapreduce.Job: map 9% reduce 0%
15/10/19 21:56:31 INFO mapreduce.Job: map 10% reduce 0%
15/10/19 21:56:41 INFO mapreduce.Job: map 11% reduce 0%
15/10/19 21:56:55 INFO mapreduce.Job: map 19% reduce 0%
15/10/19 21:56:58 INFO mapreduce.Job: map 20% reduce 0%
15/10/19 21:57:07 INFO mapreduce.Job: map 21% reduce 0%
15/10/19 21:57:19 INFO mapreduce.Job: map 26% reduce 0%
15/10/19 21:57:25 INFO mapreduce.Job: map 27% reduce 0%
15/10/19 21:57:28 INFO mapreduce.Job: map 31% reduce 0%
15/10/19 21:57:31 INFO mapreduce.Job: map 39% reduce 0%
15/10/19 21:57:34 INFO mapreduce.Job: map 46% reduce 0%
15/10/19 21:57:44 INFO mapreduce.Job: map 47% reduce 0%
15/10/19 21:57:47 INFO mapreduce.Job: map 50% reduce 0%
15/10/19 21:57:49 INFO mapreduce.Job: map 66% reduce 0%
15/10/19 21:57:50 INFO mapreduce.Job: map 67% reduce 0%
15/10/19 21:57:50 INFO mapreduce.Job: Task Id : attempt_1445275456322_0003_m_000000_0, Status : FAILED
Container [pid=container_1445275456322_0003_01_000002,containerID=container_1445275456322_0003_01_000002] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1445275456322_0003_01_000002 :
|- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES)
|- 176 15 716800 2641920
|- 6680 17515 979025920 955031552
|- 5660 0 512000 1769472
|- 6288 31 1675264 2793472
|- 6976 11296 363868160 241926144
|- 2816 0 1736704 2416640
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
15/10/19 21:57:51 INFO mapreduce.Job: map 17% reduce 0%
15/10/19 21:58:12 INFO mapreduce.Job: map 18% reduce 0%
15/10/19 21:58:13 INFO mapreduce.Job: map 22% reduce 0%
15/10/19 21:58:50 INFO mapreduce.Job: map 26% reduce 0%
15/10/19 21:58:55 INFO mapreduce.Job: map 31% reduce 0%
15/10/19 21:59:10 INFO mapreduce.Job: map 47% reduce 0%
15/10/19 21:59:11 INFO mapreduce.Job: map 51% reduce 0%
15/10/19 21:59:13 INFO mapreduce.Job: map 60% reduce 0%
15/10/19 21:59:17 INFO mapreduce.Job: map 63% reduce 0%
15/10/19 21:59:28 INFO mapreduce.Job: Task Id : attempt_1445275456322_0003_m_000000_1, Status : FAILED
Container [pid=container_1445275456322_0003_01_000004,containerID=container_1445275456322_0003_01_000004] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1445275456322_0003_01_000004 :
|- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES)
|- 5420 0 716800 2641920
|- 1420 62 1671168 2785280
|- 5432 13531 375529472 302137344
|- 4016 15 507904 1765376
|- 4204 17125 971837440 951898112
|- 4208 15 1732608 2404352
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
15/10/19 21:59:29 INFO mapreduce.Job: map 30% reduce 0%
15/10/19 21:59:35 INFO mapreduce.Job: map 33% reduce 0%
15/10/19 21:59:53 INFO mapreduce.Job: map 34% reduce 0%
15/10/19 21:59:56 INFO mapreduce.Job: map 50% reduce 0%
15/10/19 22:00:03 INFO mapreduce.Job: map 72% reduce 0%
15/10/19 22:00:06 INFO mapreduce.Job: map 83% reduce 0%
15/10/19 22:00:16 INFO mapreduce.Job: map 100% reduce 0%
15/10/19 22:00:16 INFO mapreduce.Job: Task Id : attempt_1445275456322_0003_m_000000_2, Status : FAILED
Container [pid=container_1445275456322_0003_01_000005,containerID=container_1445275456322_0003_01_000005] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1445275456322_0003_01_000005 :
|- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES)
|- 5904 15 1732608 2412544
|- 6872 0 712704 2629632
|- 4664 14546 971898880 951922688
|- 3632 78 1667072 2785280
|- 6092 0 512000 1769472
|- 6924 13203 371974144 314916864
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
15/10/19 22:00:17 INFO mapreduce.Job: map 50% reduce 0%
15/10/19 22:00:20 INFO mapreduce.Job: map 50% reduce 17%
15/10/19 22:00:27 INFO mapreduce.Job: map 76% reduce 17%
15/10/19 22:00:30 INFO mapreduce.Job: map 83% reduce 17%
15/10/19 22:00:38 INFO mapreduce.Job: map 100% reduce 17%
15/10/19 22:00:39 INFO mapreduce.Job: map 100% reduce 100%
15/10/19 22:00:41 INFO mapreduce.Job: Job job_1445275456322_0003 failed with state FAILED due to: Task failed task_1445275456322_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/10/19 22:00:45 INFO mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=79441152
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=63636256
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=4
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=6
Launched reduce tasks=1
Other local map tasks=4
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=714657
Total time spent by all reduces in occupied slots (ms)=39170
Total time spent by all map tasks (ms)=714657
Total time spent by all reduce tasks (ms)=39170
Total vcore-seconds taken by all map tasks=714657
Total vcore-seconds taken by all reduce tasks=39170
Total megabyte-seconds taken by all map tasks=731808768
Total megabyte-seconds taken by all reduce tasks=40110080
Map-Reduce Framework
Map input records=78
Map output records=56
Map output bytes=79348969
Map output materialized bytes=79349223
Input split bytes=93
Combine input records=0
Spilled Records=56
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=4670
CPU time spent (ms)=161251
Physical memory (bytes) snapshot=373673984
Virtual memory (bytes) snapshot=395513856
Total committed heap usage (bytes)=306708480
File Input Format Counters
Bytes Read=63636163
15/10/19 22:00:45 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1 In addition: Warning message:
running command '/hadoop-2.3.0/bin/hadoop jar /hadoop-2.3.0/share/hadoop/tools/lib/hadoop-streaming-2.3.0.jar -D "stream.map.input=typedbytes" -D "stream.map.output=typedbytes" -D "stream.reduce.input=typedbytes" -D "stream.reduce.output=typedbytes" -D "mapreduce.map.java.opts=-Xmx400M" -D "mapreduce.reduce.java.opts=-Xmx400M" -files "/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-local-env10f0780c2119,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-global-env10f03b794070,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-streaming-map10f06b4f59ee,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-streaming-reduce10f054f5e9e" -input "/tmp/file10f08e55037" -output "/tmp/file10f03d086dcc" -mapper "Rscript --vanilla ./rmr-streaming-map10f06b4f59ee" -reducer "Rscript --vanilla ./rmr-streaming-reduce10f054f5e9e" -inputformat "org.apache.hadoop.streaming.AutoInputFormat" -outputformat "o [... truncated]
Answer 0 (score: 0):
If you are referring to the kmeans code in the package's tests directory, it was never really meant for data this wide, and it is not clear that k-means is the right choice when each row has that many columns. With k centers, D dimensions and P points, you are fitting k*D parameters to P points; if D and P are of similar magnitude, I don't think that is a statistically sound procedure. Even if I am wrong about that, the data is partitioned by row, so there is no scalability in the number of columns, and you would need to look into a different algorithm. It is also unclear how large your target data really is: 300 MB is not really mapreduce size. As for the error itself, this kind of memory problem usually happens because each container assigns all of its memory to the Java process, leaving nothing for the R process. See help("hadoop.settings").
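Since the question also asked for example code: the sketch below is only an illustration of how backend.parameters can carry extra Hadoop -D options, either globally or per job. The property names (mapreduce.map.memory.mb, mapreduce.map.java.opts, etc.) are standard Hadoop 2.x/YARN settings, but the concrete values (2048 MB container, 1024 MB JVM heap) are assumptions I am making to leave headroom for the Rscript child process that the log shows being killed; tune them to what your machine can actually spare, and note that whether rmr.options() accepts backend.parameters depends on your rmr2 version (it can always be passed to mapreduce() directly).

    library(rmr2)

    ## Sketch only: raise the YARN container size for mappers/reducers while
    ## keeping the JVM heap modest, so memory is left for the R process.
    ## The 2048/1024 values are illustrative assumptions, not recommendations.
    mem.params <- list(
      hadoop = list(
        D = "mapreduce.map.memory.mb=2048",
        D = "mapreduce.map.java.opts=-Xmx1024m",
        D = "mapreduce.reduce.memory.mb=2048",
        D = "mapreduce.reduce.java.opts=-Xmx1024m"))

    ## Either set once for the session (recent rmr2 versions) ...
    rmr.options(backend.parameters = mem.params)

    ## ... or pass the same list to an individual job (map.fun/reduce.fun
    ## and the input path are placeholders for your own MRkmeans step):
    ## out <- mapreduce(input = "/tmp/bigmatrix",
    ##                  map = map.fun, reduce = reduce.fun,
    ##                  backend.parameters = mem.params)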