ValueError in my reducer class

Time: 2017-04-18 21:49:10

Tags: python mapreduce

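I know similar questions have been asked. I've gone through several of the answers, but none of them worked. So here is the problem: I'm writing a mapper and a reducer for a MapReduce program, and I get the following error: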

  

Traceback (most recent call last):
  File "/usr/local/hadoop/./reducer.py", line 10, in <module>
    desc, count = line.split('\t', 1)
ValueError: need more than 1 value to unpack
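The failure can be reproduced outside Hadoop in a few lines. This is a Python 3 sketch, and the sample input "1990" is hypothetical. When the separator is absent, str.split returns a one-element list, and unpacking that into two names raises ValueError. Python 2 words the message as "need more than 1 value to unpack"; recent Python 3 versions say "not enough values to unpack (expected 2, got 1)".

```python
line = "1990"  # hypothetical reducer input that contains no tab

parts = line.split('\t', 1)
print(parts)  # no separator found, so the list has a single element

error_message = None
try:
    desc, count = parts  # two targets but only one value
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)
```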

I can't debug this because I don't know what is causing it. The code for my Mapper and Reducer classes is below.

Mapper code:

#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip('')
    bYear = line.split(',')
    for birthYear in bYear:
        print '%s\t%s' % (bYear[6],1)
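To see exactly what this mapper writes to stdout, its per-line logic can be restated as a pure function. This is a Python 3 sketch: map_line and the sample record are hypothetical, and the record assumes the birth year is the seventh and last comma-separated field, which the question does not confirm. Two things show up: strip('') removes no characters, so the line's trailing newline stays attached to the last field, and the inner loop emits the same key once per field rather than once per record.

```python
def map_line(line):
    """Mimic the mapper above for one input line; return the text it prints."""
    line = line.strip('')          # strip('') removes no characters at all
    fields = line.split(',')
    out = []
    for _ in fields:               # runs once per field, as in the original
        # Python 2's print statement appends a newline after each record.
        out.append('%s\t%s\n' % (fields[6], 1))
    return ''.join(out)

# Hypothetical CSV record whose 7th (last) field is the birth year.
record = "a,b,c,d,e,f,1990\n"
emitted = map_line(record)
print(emitted.splitlines()[:2])
```

If the year really is the last column, every emitted record is split across two lines ("1990" and "\t1"). After the reducer's line.strip(), neither of those lines contains a tab, which is exactly the input that makes line.split('\t', 1) return a single element.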

Reducer code:

#!/usr/bin/env python
import sys

current_desc = None
current_count = 0
desc = None

for line in sys.stdin:
    line = line.strip()
    desc, count = line.split('\t', 1)  # <--- this is where I'm getting the error
    try:
        count = int(count)
    except ValueError:
        continue

    if current_desc == desc:
        current_count += count
    else:
        if current_desc:
            # write result to STDOUT
            print '%s\t%s' % (current_desc, current_count)
        current_count = count
        current_desc = desc

if current_desc == desc:
    print '%s\t%s' % (current_desc, current_count)
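The reducer relies on Hadoop Streaming delivering its input sorted by key, so equal keys arrive on consecutive lines and can be summed with a running total. The same grouping logic can be sketched as a Python 3 pure function (reduce_lines is a hypothetical name) and tested without a cluster. One small deviation: it checks "is not None" when flushing, so the last key is always written even if the final input line was skipped.

```python
def reduce_lines(lines):
    """Sum the counts of consecutive lines that share a key (input sorted by key)."""
    results = []
    current_desc, current_count = None, 0
    for line in lines:
        desc, count = line.strip().split('\t', 1)  # same unpack as the original
        try:
            count = int(count)
        except ValueError:
            continue  # non-numeric count: skip the line
        if desc == current_desc:
            current_count += count
        else:
            if current_desc is not None:
                results.append((current_desc, current_count))
            current_desc, current_count = desc, count
    if current_desc is not None:
        results.append((current_desc, current_count))  # flush the last key
    return results

print(reduce_lines(["1990\t1", "1990\t1", "1991\t1"]))  # [('1990', 2), ('1991', 1)]
```

Feeding it a line that has no tab, such as "1990", fails on the unpacking line in the same way as the traceback in the question.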

Please help.

1 answer:

Answer 0 (score: 0)

It seems there is no '\t' character in that particular line, so line.split('\t', 1) returns only one element, and a single value cannot be unpacked into the two names desc, count.
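A defensive version of the parsing step checks how many pieces split produced before unpacking, and skips (and reports to stderr) anything that is not a key-tab-count pair. This is a Python 3 sketch, not necessarily what the answer intended; parse_record is a hypothetical helper. Skipping only hides the symptom, so it is still worth finding out which mapper output lines lack a tab.

```python
import sys

def parse_record(line):
    """Return (key, count) for a 'key<TAB>count' line, or None if malformed."""
    parts = line.strip().split('\t', 1)
    if len(parts) != 2:
        return None  # no tab in the line: nothing to unpack into desc, count
    desc, count = parts
    try:
        return desc, int(count)
    except ValueError:
        return None  # the count field is not an integer

# In the reducer's loop, this check replaces the bare unpack.
for line in ["1990\t1", "1990", "\t1", "1991\tx"]:
    record = parse_record(line)
    if record is None:
        sys.stderr.write('skipping malformed line: %r\n' % line)
        continue
    print(record)
```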