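I am trying to use the parallel processing of the modin library to speed up my code.
I tried to do this with the dask engine on a Windows 10 machine, but it did not work, which I think is because that engine is still under development. I read that the ray engine cannot be used on Windows, so I am running a simple example on a free AWS Ubuntu server to check how the library works.
When I try to install the modin package after successfully installing the pandas and ray packages, I get the following error: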
ERROR: Could not find a version that satisfies the requirement pandas==1.0.3 (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0, 0.23.0rc2, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2)
ERROR: No matching distribution found for pandas==1.0.3
If I type pip3 install -vvv modin in the terminal to get the log, I get:
Exception information:
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/commands/install.py", line 333, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 179, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 362, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 313, in _get_abstract_dist_for
    self._populate_link(req)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/resolution/legacy/resolver.py", line 279, in _populate_link
    req.link = self.finder.find_requirement(req, upgrade)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pip/_internal/index/package_finder.py", line 930, in find_requirement
    req)
pip._internal.exceptions.DistributionNotFound: No matching distribution found for pandas==1.0.3 (from modin)
Removed build tracker: '/tmp/pip-req-tracker-oklngevc'
How can I solve this problem?
The script I want to run to check how it works is:
import os
os.environ["MODIN_ENGINE"] = "ray" # Modin will use Ray
import modin.pandas as pd
import time
import pandas as pn
start_time = time.time()
datos = pd.read_csv('datospruebaAWS.csv', header=None, index_col=0)
end_time = time.time()
print("time read csv parallel=", end_time - start_time)
start_time = time.time()
datos = pn.read_csv('datospruebaAWS.csv', header=None, index_col=0)
end_time = time.time()
print("time read csv=", end_time - start_time)
And one of the scripts I want to speed up, just by changing import pandas as pd to import modin.pandas as pd, is:
import pandas as pd
import glob
import time
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
start_time = time.time()
cookies = []
for file in all_filenames:
    datos = pd.read_csv(file, header=None, index_col=0)
    datos.index.name = 'CookieID'
    print('leido')
    for i in range(len(datos)):
        if datos[2].iloc[i].find('golf') != -1:
            cookies.append(datos.index[i])
    print('cookies')
    print(len(cookies))
    del datos
end_time = time.time()
print("time=", end_time - start_time)
cookies = pd.Series(cookies)
cookies = cookies.unique()
cookies = pd.DataFrame(cookies)
cookies['Owner ID'] = ['Les gusta el golf']*len(cookies)
cookies.to_csv('DMP_golf.txt', header=False, index=False, sep='\t')
because the folder contains many large csv files and it takes hours to reach the solution.
Also, is there any other way to speed up this code?
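For reference, here is a sketch of one possible speed-up that does not depend on modin: the row-by-row loop over datos[2] can probably be replaced by a single vectorized substring test. The helper name matching_ids and its keyword parameter are hypothetical, introduced only for illustration, and the sketch assumes column 2 holds text, as the original loop does.

```python
import io

import pandas as pd


def matching_ids(csv_source, keyword="golf"):
    """Return index values of rows whose column 2 contains `keyword`.

    Hypothetical helper: it mirrors the read_csv call from the script above,
    but replaces the per-row .find() loop with one vectorized call.
    """
    datos = pd.read_csv(csv_source, header=None, index_col=0)
    # regex=False does a plain substring match, like str.find() != -1;
    # na=False treats missing values as non-matching instead of failing.
    mask = datos[2].str.contains(keyword, regex=False, na=False)
    return datos.index[mask.to_numpy()].tolist()


# Small in-memory example standing in for one of the csv files.
sample = io.StringIO("a1,x,golf club\na2,y,tennis\na3,z,minigolf\n")
print(matching_ids(sample))  # prints ['a1', 'a3']
```

The lists returned for each file could then be concatenated and deduplicated once at the end, as the original script does with Series.unique().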
Answer 0: (score: 1)
It looks like Pandas 1.0.3 does not support Python 3.5, which you are using. See the "Python version" column at https://pypi.org/project/pandas/1.0.3/#files.
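To confirm this on the server, a minimal sketch like the following compares the running interpreter against the minimum Python version. It assumes pandas 1.0.x requires Python 3.6.1 or newer (please verify this on the PyPI page above); the constant and function names are made up for illustration.

```python
import sys

# Assumption: pandas 1.0.x declares Python >= 3.6.1 as its minimum version.
MIN_PYTHON_FOR_PANDAS_1_0 = (3, 6, 1)


def supports_pandas_1_0(version_info=None):
    """Return True if the given (major, minor, micro) meets the assumed minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= MIN_PYTHON_FOR_PANDAS_1_0


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print("Python", current, "can install pandas 1.0.x:", supports_pandas_1_0())
```

If this reports False, as it would for the python3.5 interpreter shown in the traceback paths, pip cannot find a pandas 1.0.3 build for that interpreter, so the usual options are a newer Python or a modin release that depends on an older pandas.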