ERROR: No matching distribution found for pandas==1.0.3 (from modin)

Date: 2020-07-16 12:58:21

Tags: python pandas parallel-processing ray modin

I am trying to speed up my code using the parallel processing provided by the modin library.

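I tried to do this with the dask engine on a Windows 10 machine, but it did not work; I think that is because it is still under development. I read that the ray engine cannot be used on Windows, so I am running a simple example on a free AWS Ubuntu server to check how the library works.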

When I try to install the modin package, after successfully installing the pandas and ray packages, I get the following error:

ERROR: Could not find a version that satisfies the requirement pandas==1.0.3 (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0, 0.23.0rc2, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2)
ERROR: No matching distribution found for pandas==1.0.3

If I run pip3 install -vvv modin in the terminal to get the log, I get:

Exception information:
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/commands/install.py", line 333, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 179, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 362, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 313, in _get_abstract_dist_for
    self._populate_link(req)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 279, in _populate_link
    req.link = self.finder.find_requirement(req, upgrade)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/index/package_finder.py", line 930, in find_requirement
    req)
pip._internal.exceptions.DistributionNotFound: No matching distribution found for pandas==1.0.3 (from modin)
Removed build tracker: '/tmp/pip-req-tracker-oklngevc'

How can I solve this problem?

The script I want to run to check how it works is:

import os
os.environ["MODIN_ENGINE"] = "ray"  # Modin will use Ray
import modin.pandas as pd
import time
import pandas as pn

start_time = time.time()
datos = pd.read_csv('datospruebaAWS.csv', header=None, index_col=0)
end_time = time.time()
print("time read csv parallel=", end_time - start_time)

start_time = time.time()
datos = pn.read_csv('datospruebaAWS.csv', header=None, index_col=0)
end_time = time.time()
print("time read csv=", end_time - start_time)

And one of the scripts I want to speed up, just by changing import pandas as pd to import modin.pandas as pd, is:

import pandas as pd
import glob
import time

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

start_time = time.time()
cookies = []
for file in all_filenames:
    datos = pd.read_csv(file, header=None, index_col=0)
    datos.index.name = 'CookieID'
    print('leido')
    for i in range(len(datos)):
        if datos[2].iloc[i].find('golf') != -1:
            cookies.append(datos.index[i])
    print('cookies')
    print(len(cookies))
    del datos

end_time = time.time()
print("time=", end_time - start_time)

cookies = pd.Series(cookies)
cookies = cookies.unique()
cookies = pd.DataFrame(cookies)
cookies['Owner ID'] = ['Les gusta el golf']*len(cookies)
cookies.to_csv('DMP_golf.txt', header=False, index=False, sep='\t')

I want to speed it up because the folder contains many large CSV files and the script takes hours to produce a result.

Also, is there any other way to speed up this code?

1 Answer:

Answer 0: (score: 1)

It looks like pandas 1.0.3 does not support Python 3.5, which is the version you are using. See the "Python version" column at https://pypi.org/project/pandas/1.0.3/#files.
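
A quick way to confirm this on the server is to compare the running interpreter with the minimum Python version that pandas 1.0.x declares. The sketch below assumes that minimum is 3.6.1 (my reading of the package metadata; check it against the PyPI page above), and the helper name supports_pandas_1_0 is made up for illustration. It avoids f-strings so that it also runs on Python 3.5.

```python
import sys

# Assumption: pandas 1.0.x requires Python >= 3.6.1 (verify on PyPI).
MIN_PYTHON_FOR_PANDAS_1_0 = (3, 6, 1)


def supports_pandas_1_0(version_info=sys.version_info):
    """Return True if the given interpreter version meets the assumed minimum."""
    return tuple(version_info[:3]) >= MIN_PYTHON_FOR_PANDAS_1_0


# Report the interpreter that is actually running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
print("Python {} can install pandas 1.0.x: {}".format(
    current, supports_pandas_1_0()))
```

Run it with the same interpreter that pip3 belongs to. The traceback paths above point to python3.5, for which this check returns False; that would also explain why pip only offers pandas versions up to 0.24.2. If that is the case, installing modin under a newer Python 3 interpreter is the likely way forward.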