Question

I'm trying to write a basic word-count MapReduce in Python. Here is the mapper code:

#!/usr/bin/env python

import sys
# input comes from STDIN (standard input)
for line in sys.stdin:

    try:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # loop over words
        for word in words:
            # write out word and trivial count
            print '%s\t%s' % (word.strip(), 1)
    except:
        pass

I'm running it on Project Gutenberg's Ulysses.

When I run it on a Hadoop cluster, I get the following error message:

    File "<stdin>", line 1
    The Project Gutenberg EBook of Ulysses, by James Joyce
              ^
SyntaxError: invalid syntax

I don't understand what the problem is. Any help?

Answer 1

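It looks like you may be trying to run the book itself as a Python file. The File "<stdin>", line 1 header in the traceback means the Python interpreter is reading the book's first line from stdin as source code. Perhaps you are passing the arguments in the wrong order.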
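A minimal sketch of that mix-up, assuming a local Python 3 interpreter (the exact command used in the question isn't shown, so this only illustrates the likely cause). It writes a Python 3 port of the mapper to a temporary mapper.py (a hypothetical name) and feeds the book's first line on stdin in two ways: to the bare interpreter, which reproduces the SyntaxError, and to the mapper script, which emits word/count pairs. The helpers run_python and demo are illustrative only and are not part of Hadoop.

```python
import os
import subprocess
import sys
import tempfile

# Python 3 port of the question's mapper (print() instead of the Python 2 statement).
MAPPER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    for word in line.split():\n"
    "        print('%s\\t%s' % (word.strip(), 1))\n"
)

# First line of the book, as shown in the question's traceback.
BOOK_LINE = "The Project Gutenberg EBook of Ulysses, by James Joyce\n"


def run_python(args, text):
    """Run the current Python interpreter with `args`, feeding `text` on stdin."""
    return subprocess.run(
        [sys.executable, *args], input=text, capture_output=True, text=True
    )


def demo(text=BOOK_LINE):
    """Return (wrong, right) results for the two ways of wiring up stdin."""
    with tempfile.TemporaryDirectory() as tmp:
        mapper_path = os.path.join(tmp, "mapper.py")
        with open(mapper_path, "w") as f:
            f.write(MAPPER_SOURCE)
        # Wrong: no script argument, so the interpreter parses stdin as source code.
        wrong = run_python([], text)
        # Right: the script is an argument, and the book arrives via sys.stdin.
        right = run_python([mapper_path], text)
    return wrong, right


if __name__ == "__main__":
    wrong, right = demo()
    print("wrong ->", wrong.stderr.strip().splitlines()[-1])
    print("right ->", right.stdout.splitlines()[:3])
```

With Hadoop streaming the same rule applies: the book should reach the mapper script's stdin, never the interpreter's.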

Answer 2

Oh, are you running Python 3?

Python 3 turned print from a statement into a function, so it needs parentheses: print(...)

Also, you could use .format() instead, for example:

print('{word}\t{value}'.format(word=word.strip(), value=1))

which can be simplified to: print('{}\t{}'.format(word.strip(), 1))
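A small Python 3 sketch of these points (the function and constant names are hypothetical): the three formatting styles build the same tab-separated line, and compile() shows that Python 3 rejects the question's print statement while accepting the parenthesized call. The traceback in the question points at the book's first line rather than at the print line, though, so the Python 2/3 difference may be a separate issue from the reported error.

```python
# Equivalent ways to build the mapper's tab-separated output line.
def line_percent(word):
    # %-formatting, as in the question (works in Python 2 and 3)
    return '%s\t%s' % (word.strip(), 1)


def line_named(word):
    # str.format with named fields
    return '{word}\t{value}'.format(word=word.strip(), value=1)


def line_positional(word):
    # str.format with automatic positional fields
    return '{}\t{}'.format(word.strip(), 1)


def compiles_as_python3(source):
    """Return True if `source` is syntactically valid for the running Python 3."""
    try:
        compile(source, "<mapper>", "exec")
    except SyntaxError:
        return False
    return True


# The question's print line, and the same line as a Python 3 function call.
PY2_STYLE = "print '%s\\t%s' % (word.strip(), 1)\n"
PY3_STYLE = "print('%s\\t%s' % (word.strip(), 1))\n"

if __name__ == "__main__":
    print(compiles_as_python3(PY2_STYLE))  # False: print is a function in Python 3
    print(compiles_as_python3(PY3_STYLE))  # True
    print(repr(line_positional(" Joyce ")))  # 'Joyce\t1'
```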

Also, if you have a line like "The Project Gutenberg EBook of Ulysses, by James Joyce", you might want to strip the commas too ;)
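One way to strip the commas, sketched under the assumption that only leading and trailing ASCII punctuation (Python's string.punctuation) should be removed: interior apostrophes, as in "don't", are kept, and non-ASCII punctuation is not handled. clean_words, map_line and main are hypothetical helper names.

```python
import string


def clean_words(line):
    """Split a line on whitespace and strip leading/trailing punctuation (, ; . etc.)."""
    words = []
    for word in line.split():
        cleaned = word.strip(string.punctuation)
        if cleaned:  # drop tokens that were only punctuation, such as "--"
            words.append(cleaned)
    return words


def map_line(line):
    """Return the mapper's output records ("word<TAB>1") for one input line."""
    return ['{}\t{}'.format(word, 1) for word in clean_words(line)]


def main(stream):
    """Mapper entry point: read lines from `stream` and print one record per word."""
    for line in stream:
        for record in map_line(line):
            print(record)
```

In the mapper script itself you would import sys and call main(sys.stdin).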

Error when a Python word-count MapReduce reads stdin
