Question

I downloaded mincemeat.py, along with its example, from https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2

example.py is as follows:

#!/usr/bin/env python
import mincemeat

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
        ]

datasource = dict(enumerate(data))

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print results

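It is a word-count program.

I connected two computers over a LAN. I use one computer as the server and run example.py on it; on the second computer, which acts as the client, I run mincemeat.py with the following command: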

python mincemeat.py -p changeme server-IP

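This works fine.

Now I have connected 3 computers on a LAN through a router. One machine acts as the server, where I want to run example.py, and the remaining two machines act as clients.

I want to distribute the work to my two client machines. What is the procedure for distributing the map and reduce tasks across the two computers? How do I assign the tasks defined in example.py to the two client computers, each of which has its own unique IP?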

Answer 1

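The default example contains fewer than 50 words. So by the time you switch windows to start the second client, the first client has already finished processing the text. Instead, run the same program on a large text file, which gives you time to add the second client. The following should work. In this example, I use the plain-text version of the novel Ulysses (~1.5 MB) from Project Gutenberg.

On my machine (Intel Xeon @ 3.10 GHz), the run took less than 30 seconds with 2 clients. So either use a larger file or a list of files, or start the second client quickly.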

#!/usr/bin/env python
import mincemeat

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

novel_name = 'Ulysses.txt'

# The data source can be any dictionary-like object
datasource = {novel_name:file_contents(novel_name)}

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print results
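
One caveat, offered as an assumption about how mincemeat schedules work rather than something confirmed in this thread: as far as I understand it, the server hands out one map task per key in datasource, so a datasource with a single key gives map work to only one client (reduce tasks would still be handed out per word). A minimal sketch of splitting the novel into several keys follows; `split_into_chunks` and `lines_per_chunk` are hypothetical names, not part of mincemeat.

```python
def split_into_chunks(text, lines_per_chunk=1000):
    """Split text into a dict {chunk_index: chunk_text}.

    Hypothetical helper, not part of mincemeat. Each dict key is meant
    to become a separate map task (assumption about the scheduler).
    """
    lines = text.splitlines()
    chunks = {}
    for start in range(0, len(lines), lines_per_chunk):
        chunk_index = start // lines_per_chunk
        chunks[chunk_index] = "\n".join(lines[start:start + lines_per_chunk])
    return chunks

# Possible use in the script above (untested against a real server):
# datasource = split_into_chunks(file_contents(novel_name))
```

Because mapfn only calls v.split(), splitting on line boundaries should not change the word counts.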

For a directory of files, use the following example. Put all the text files in a folder named textfiles.

#!/usr/bin/env python
import mincemeat
import glob

all_files = glob.glob('textfiles/*.txt')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

# The data source can be any dictionary-like object
datasource = dict((file_name, file_contents(file_name))
                  for file_name in all_files)

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print results
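
Before starting the server and the two clients, it can help to check mapfn and reducefn on a single machine. The sketch below imitates the map, group, and reduce steps in plain Python without importing mincemeat; `run_locally` is a hypothetical helper, and the sketch assumes that each datasource key corresponds to one map task and each distinct word to one reduce task.

```python
from collections import defaultdict

def mapfn(k, v):
    # Emit (word, 1) for every whitespace-separated word in the value.
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    # Sum the counts collected for one word.
    return sum(vs)

def run_locally(datasource, mapfn, reducefn):
    """Hypothetical single-process stand-in for the server and clients."""
    grouped = defaultdict(list)
    # Map phase: one call to mapfn per datasource key.
    for key in datasource:
        for out_key, out_value in mapfn(key, datasource[key]):
            grouped[out_key].append(out_value)
    # Reduce phase: one call to reducefn per intermediate key.
    return dict((out_key, reducefn(out_key, values))
                for out_key, values in grouped.items())

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall"]
results = run_locally(dict(enumerate(data)), mapfn, reducefn)
```

If the counts look right here, the same mapfn and reducefn can be assigned to the mincemeat.Server instance as in the scripts above.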
