Question

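I am developing a distributed kNN classifier with Pyro4 and have set up a client-server arrangement over my local WiFi network. The server holds the dataset and shares it with two clients, which run the classification program. I have two computers available: the server runs on computer A, and the clients run on computer A and computer B.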

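After the server starts, it waits for two clients to connect. Once both clients have registered with the server, the server splits the dataset into two partitions (one per client) and tells both clients to start classifying their own share of the dataset. The server does this by remotely invoking each client's classify() method over the network.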
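The partitioning step can be sketched roughly as follows. This is only an illustration: `split_into_shares` is a hypothetical helper, not the actual `getShares` method used by the server (which is not shown here), and it assumes the dataset is a plain list that is split into contiguous, near-equal shares.

```python
def split_into_shares(dataset, num_clients):
    """Split a list into num_clients contiguous shares of near-equal size.

    Hypothetical helper for illustration; the real server-side logic
    (getShares) is not shown in the question.
    """
    base, extra = divmod(len(dataset), num_clients)
    shares = []
    start = 0
    for i in range(num_clients):
        # The first `extra` shares get one additional item each
        size = base + (1 if i < extra else 0)
        shares.append(dataset[start:start + size])
        start += size
    return shares


# Stand-in dataset of 500 items, split between two clients
dataset = list(range(500))
shares = split_into_shares(dataset, 2)
print([len(share) for share in shares])
```

With 500 items and two clients, each client would receive 250 items under this scheme.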

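When classification is done on a single machine, it takes only 0.362884 s to classify a dataset of size 500. With my distributed setup, I expected the classification time to be cut in half, since two machines work at the same time. To my surprise, however, it took 2.284536 s to classify a dataset of the same size on the computer running the server, and even longer (3.029238 s) on the other computer on my local network.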

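Where is the bottleneck? If my understanding is correct, there should be no remote communication except (1) when the server remotely invokes the classify() method with the client's share (before classification), and (2) when the client sends its predictions back to the server once it has finished (after classification). Surely there is no remote communication between the machines during classification itself that could create a bottleneck?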
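One way to test this assumption would be to time each phase separately instead of timing the whole remote call. The sketch below is a local stand-in under stated assumptions: `timed` and `fake_classify` are hypothetical names, `json` is used in place of Pyro4's own serializer (serpent by default), and the data is synthetic. It only measures local serialization and computation, not network latency, so it can show whether argument marshalling is significant but not how long the WiFi transfer takes.

```python
import json
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fake_classify(train_set, test_set):
    # Placeholder for the real kNN work (assumption): for each test value,
    # pick the nearest training value
    return [min(train_set, key=lambda t: abs(t - x)) for x in test_set]


# Synthetic stand-in for one client's share of the dataset
train_set = [float(i) for i in range(400)]
test_set = [float(i) + 0.25 for i in range(100)]

# Phase 1: marshal the arguments, as happens before a remote call
payload, serialize_s = timed(json.dumps, [train_set, test_set])
# Phase 2: unmarshal them, as happens on the receiving side
_, deserialize_s = timed(json.loads, payload)
# Phase 3: the classification itself
predictions, classify_s = timed(fake_classify, train_set, test_set)

print(f'serialize:   {serialize_s:.6f} s ({len(payload)} characters)')
print(f'deserialize: {deserialize_s:.6f} s')
print(f'classify:    {classify_s:.6f} s')
```

Wrapping the real classify() body and the real remote call with the same kind of timer on both machines would show how much of the 2 to 3 seconds is spent outside the classification itself.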

Server code (showing the remote invocation of classify)

# Tell clients to initiate classification
def initiateClassification(self):
    with Pyro4.locateNS() as ns:
        print('Expected number of clients met; will now initiate classification for all clients.')

        # Compile all client classification threads
        clientThreads = []

        # Invoke each client's classifiers
        for classifier, classifierUri in ns.list(prefix=self._classifierName).items():
            print('Initiating classifier of client ' + classifier + ' with URI ' + classifierUri + '...', end='')

            # Get the specific client's classifier object
            clientClassifier = Pyro4.Proxy(classifierUri)

            # Derive the client id from its classifier name
            # (named clientId to avoid shadowing the built-in id())
            clientId = classifier.replace('client.classifier-', '')

            # Prepare the split of the dataset for the client
            trainSet, testSet = self.getShares(clientId)

            # REMOTE INVOCATION SECTION #
            # clientClassifier.classify is the remote kNN classification method in the client
            # We remotely invoke it, providing their dataset splits
            # It doesn't contain any remote invocations itself, aside from when it submits the predictions to the server when it's finished classifying
            # We run it in a new thread to avoid blocking the server
            clientThread = Thread(
                name='thread-' + clientClassifier.clientName,
                target=clientClassifier.classify,
                args=(trainSet, testSet)
            )
            # END REMOTE INVOCATION SECTION #

            clientThread.start()

            # Take note of this thread
            clientThreads.append(clientThread)

            print('initiated.')

        # Wait until all client threads have finished (which means they're done classifying)
        for clientThread in clientThreads:
            clientThread.join()

        # Stop each client
        for classifier, classifierUri in ns.list(prefix=self._classifierName).items():
            print('Stopping client with classifier URI ' + classifierUri + '...', end='')

            # Get the client's classifier object
            clientClassifier = Pyro4.Proxy(classifierUri)

            # Stop that client
            clientClassifier.stop()

            print('stopped.')

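Addendum: Instead of starting a new thread for each client classifier to run, I could simply mark the client's classify() method as one-way (using @Pyro4.oneway). I tried this, but it made no difference in speed.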

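TL;DR: Because I was aware of the communication overhead, I avoided performing the classification proper over the network. I designed the communication to happen only twice: before and after classification, and nothing more. But for some reason, classification is still much slower when invoked over the network than when I run it locally on my own machine.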

Pyro4 method is very slow when invoked over the network

0 answers: