Today I tried running the following command on Linux to test Hadoop's Streaming interface:
cat test.txt | php wc_mapper.php | python Reducer.py
and got this error:
Traceback (most recent call last):
  File "Reducer.py", line 7, in <module>
    word,count = line.split()
ValueError: need more than 0 values to unpack
The contents of test.txt are:
hello world
hello world
hello world
wc_mapper.php, written in PHP, contains:
#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        echo $word, chr(9), "1", PHP_EOL;
    }
}
?>
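For comparison, here is a hypothetical Python port of the mapper (the name wc_mapper.py and the helper map_line are assumptions, not part of the original setup). It spells out what the PHP script does: split each line on non-word characters, drop empty pieces, and emit one word<TAB>1 record per word. It assumes ASCII input, because Python 3's \W is Unicode-aware by default while PHP's /\W/ without the u modifier works on bytes.

```python
#!/usr/bin/env python
# Hypothetical wc_mapper.py: an illustrative Python port of wc_mapper.php.
import re
import sys


def map_line(line):
    """Return 'word<TAB>1' records for one input line, like the PHP mapper."""
    # re.split(r'\W', ...) followed by dropping empty strings mirrors
    # preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY) for ASCII text.
    words = [w for w in re.split(r'\W', line.strip()) if w]
    return ["%s\t1" % word for word in words]


if __name__ == "__main__":
    # Streaming mappers read records on stdin and write records to stdout.
    for line in sys.stdin:
        for record in map_line(line):
            sys.stdout.write(record + "\n")
```

Note that a blank input line produces no records at all here, so this mapper never emits an empty line on its own.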
And Reducer.py, written in Python, contains:
#!/usr/bin/python
from operator import itemgetter
import sys
word2count = {}
for line in sys.stdin:
    line = line.strip()
    word,count = line.split()
    try:
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        pass
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word,count in sorted_word2count:
    print '%s\t%s'%(word,count)
Does anyone know what causes this error and how to fix it? When I run only the first part of the pipeline,
cat test.txt | php wc_mapper.php | sort
I get the following output:
hello 1
hello 1
hello 1
world 1
world 1
world 1
The first line of this output is blank: it contains nothing, but it still occupies a line.
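That blank line is enough to explain the traceback. Once strip() reduces it to an empty string, str.split() returns an empty list, and unpacking an empty list into word, count raises ValueError. Because the split happens before the try block, the except ValueError clause never gets a chance to catch it. The sketch below shows the empty split and a guard that skips blank lines; parse_pair is a hypothetical helper, not part of the original scripts. One possible source of the blank line is stray output from the PHP file itself, such as an extra newline after the closing ?> tag (PHP only swallows a single newline directly after it), but that is a guess. The reducer should tolerate blank lines either way.

```python
def parse_pair(line):
    """Split a 'word<TAB>count' record; return None for blank lines (hypothetical helper)."""
    line = line.strip()
    if not line:
        # "".split() returns [], so unpacking it into two names would
        # raise ValueError; skip the line instead.
        return None
    word, count = line.split()
    return word, int(count)


print("".split())              # []
print(parse_pair("\n"))        # None
print(parse_pair("hello\t1"))  # ('hello', 1)
```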
Answer (score: 0)
Wrap the split() call in a try block:
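try:
    word, count = line.split("\t")
except ValueError:
    # Not a word<TAB>count pair (e.g. the empty line): report it and skip it.
    sys.stderr.write("Error: %r\n" % line)
    continue
I've used a tab as the delimiter, because wc_mapper.php separates the word and the count with chr(9); change it if your mapper uses a different separator. The continue is important: without it, the code after the except block would still run with word and count undefined.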
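Applying the same idea to the whole reducer gives roughly the following sketch. It is an edited version, not the asker's original script. The helper name reduce_lines is hypothetical and was introduced so the logic can be tested without stdin. The sketch skips blank lines, catches malformed records, and writes diagnostics to stderr so that, under Hadoop Streaming, they end up in the task logs rather than in the job output. It uses only constructs that behave the same on Python 2.7 and Python 3.

```python
#!/usr/bin/env python
# Sketch of Reducer.py with the blank-line fix applied (edited, not the original).
import sys
from operator import itemgetter


def reduce_lines(lines):
    """Sum 'word<TAB>count' records and return (word, total) pairs sorted by word."""
    word2count = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the empty line that caused the original ValueError
        try:
            word, count = line.split("\t", 1)
            count = int(count)
        except ValueError:
            # Malformed record: report it on stderr so it stays out of the job output.
            sys.stderr.write("skipping malformed line: %r\n" % line)
            continue
        word2count[word] = word2count.get(word, 0) + count
    return sorted(word2count.items(), key=itemgetter(0))


if __name__ == "__main__":
    for word, count in reduce_lines(sys.stdin):
        sys.stdout.write("%s\t%s\n" % (word, count))
```

Locally it runs the same way as before: cat test.txt | php wc_mapper.php | python Reducer.py.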