How to make setuptools install a wheel containing multiple packages?

Asked: 2018-12-03 12:49:55

Tags: python azure setuptools

Suppose this wheel:

M Filemode      Length  Date         Time      File
- ----------  --------  -----------  --------  -------------------------------------------
  -rw-rw-r--      1358  26-Sep-2018  21:08:40  azure/common/__init__.py
  -rw-rw-r--       327  26-Sep-2018  21:08:40  azure/common/_version.py
  -rw-rw-r--      8737  26-Sep-2018  21:08:40  azure/common/client_factory.py
  -rw-rw-r--       755  26-Sep-2018  21:08:40  azure/common/cloud.py
  -rw-rw-r--      2479  26-Sep-2018  21:08:40  azure/common/credentials.py
  -rw-rw-r--       805  26-Sep-2018  21:08:40  azure/common/exceptions.py
  -rw-rw-r--      6079  26-Sep-2018  21:08:40  azure/profiles/__init__.py
  -rw-rw-r--      3943  26-Sep-2018  21:08:40  azure/profiles/multiapiclient.py
  -rw-rw-r--         6  26-Sep-2018  21:21:54  azure_common-1.1.16.dist-info/top_level.txt
  -rw-rw-r--       110  26-Sep-2018  21:21:54  azure_common-1.1.16.dist-info/WHEEL
  -rw-rw-r--      3805  26-Sep-2018  21:21:54  azure_common-1.1.16.dist-info/METADATA
  -rw-rw-r--       997  26-Sep-2018  21:21:54  azure_common-1.1.16.dist-info/RECORD
- ----------  --------  -----------  --------  -------------------------------------------
                 29401                         12 files

It has three different packages in it:

  • azure.common
  • azure.profiles
  • azure_common

All great names, and great layout. Also, a lot of greatness of mind that unmistakably went into engineering this miracle of modern software engineering.
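For anyone who wants to check the layout of their own copy of the wheel, here is a minimal sketch that groups the archive members by their first path component and reads top_level.txt, the file where setuptools records top-level import names. The function names group_wheel_members and read_top_level are made up for this illustration; only the standard zipfile module is used.

```python
import zipfile
from collections import defaultdict


def group_wheel_members(wheel_path):
    """Group the files in a wheel by their first path component."""
    groups = defaultdict(list)
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top, _, rest = name.partition("/")
            groups[top].append(rest)
    return dict(groups)


def read_top_level(wheel_path):
    """Return the names listed in *.dist-info/top_level.txt, if present."""
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith(".dist-info/top_level.txt"):
                return wheel.read(name).decode("utf-8").split()
    return []
```

Calling group_wheel_members on the downloaded .whl file shows which entries are code directories and which are metadata, without having to unpack the archive by hand.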

This wheel is distributed by the name azure-common. So, when you depend on it in setup.py like this:

setup(
    ...
    install_requires=['azure-common'],
    ...
)

You will only get the azure_common package installed. Maybe. I don't really know; it seems so, but the few times I tried, it seemed to only install azure.common, or maybe I eyeballed it... It's really hard to follow all the manipulations setuptools performs on a package.
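One way to take the guesswork out of "what actually got installed" is to ask the import system directly from inside the target environment. The sketch below is only an illustration: is_importable and report are hypothetical helper names, and the check relies on importlib.util.find_spec, which imports parent packages before looking up a dotted name.

```python
import importlib.util


def is_importable(name):
    """Return True if the import system can locate `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent
        # (or a module without a spec) ends up here.
        return False


def report(names):
    """Map each module name to whether it can be found."""
    return {name: is_importable(name) for name in names}


if __name__ == "__main__":
    candidates = ["azure.common", "azure.profiles", "azure_common"]
    for name, found in report(candidates).items():
        print(f"{name}: {'importable' if found else 'not found'}")
```

Run it with the interpreter of the environment being inspected, since the result depends on that interpreter's sys.path.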

Hence the question: how can I force setuptools to install all packages found in this kind of wheel? Also, the order is important, because this garbage sometimes needs to be installed alongside other packages that also provide azure.something packages, which may overwrite the contents of the azure directory. So, ideally, I'd also like to control the order in which install_requires dependencies are processed.


This is where this started: How to specify bracket dependencies in setup.py?

1 answer:

Answer 0 (score: 0)

It sounds like only the azure.common sub-directory gets installed into your environment when you install the dependencies via setup.py with install_requires=['azure-common']. I tried to reproduce this issue but could not: all files in the package were installed.

Here are the steps I performed on my local Windows machine.

  1. Create a directory with mkdir setuptmp, create a virtual environment with virtualenv setuptmp, then cd setuptmp.
  2. Create a setup.py file with the following content.

    from setuptools import setup, find_packages

    setup(
        name = "setuptmp",
        install_requires = ['azure-common']
    )

  3. Activate the virtual environment via Scripts\activate.bat.
  4. Run python setup.py install to install the dependencies described in my setup.py.
  5. Run python to open the REPL interpreter and test all the packages you mentioned.

    (setuptmp) D:\projects\setuptmp>python
    Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import azure.common
    >>> import azure.profiles
    >>> azure.common.__file__
    'D:\\projects\\setuptmp\\lib\\site-packages\\azure_common-1.1.16-py3.7.egg\\azure\\common\\__init__.py'
    >>> azure.profiles.__file__
    'D:\\projects\\setuptmp\\lib\\site-packages\\azure_common-1.1.16-py3.7.egg\\azure\\profiles\\__init__.py'
    >>> import azure_common
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'azure_common'

Note: azure_common is not a module, just an egg-info directory.

  6. Check the packages installed in my environment via cd Lib\site-packages, dir, and tree azure_common-1.1.16-py3.7.egg /F.

    (setuptmp) D:\projects\setuptmp\Lib\site-packages>dir
     Volume in drive D is Data
     Volume Serial Number is BA4B-64AA

     Directory of D:\projects\setuptmp\Lib\site-packages

    2018/12/26  14:48    <DIR>          .
    2018/12/26  14:48    <DIR>          ..
    2018/12/26  14:48    <DIR>          azure_common-1.1.16-py3.7.egg
    2018/12/26  14:48                61 easy-install.pth
    2018/12/26  14:46               126 easy_install.py
    2018/12/26  14:46    <DIR>          pip
    2018/12/26  14:46    <DIR>          pip-18.1.dist-info
    2018/12/26  14:46    <DIR>          pkg_resources
    2018/12/26  14:48               965 setuptmp-0.0.0-py3.7.egg
    2018/12/26  14:46    <DIR>          setuptools
    2018/12/26  14:46    <DIR>          setuptools-40.6.3.dist-info
    2018/12/26  14:46    <DIR>          wheel
    2018/12/26  14:46    <DIR>          wheel-0.32.3.dist-info
    2018/12/26  14:46    <DIR>          __pycache__
                   3 File(s)          1,152 bytes
                  11 Dir(s)  80,896,319,488 bytes free

    (setuptmp) D:\projects\setuptmp\Lib\site-packages>tree azure_common-1.1.16-py3.7.egg /F
    Folder PATH listing for volume Data
    Volume serial number is BA4B-64AA
    D:\PROJECTS\SETUPTMP\LIB\SITE-PACKAGES\AZURE_COMMON-1.1.16-PY3.7.EGG
    ├─azure
    │  ├─common
    │  │  │  client_factory.py
    │  │  │  cloud.py
    │  │  │  credentials.py
    │  │  │  exceptions.py
    │  │  │  _version.py
    │  │  │  __init__.py
    │  │  │
    │  │  └─__pycache__
    │  │          _version.cpython-37.pyc
    │  │          __init__.cpython-37.pyc
    │  │
    │  └─profiles
    │          multiapiclient.py
    │          __init__.py
    │
    └─EGG-INFO
            PKG-INFO
            RECORD
            requires.txt
            top_level.txt
            WHEEL

  7. Compare the above with the file structure of the azure-common package downloaded from the link on the PyPI website. I extracted the azure_common-1.1.16-py2.py3-none-any.whl file to a temporary directory using 7-Zip and ran tree on it.

    D:\tmp>tree azure_common-1.1.16-py2.py3-none-any /F
    Folder PATH listing for volume Data
    Volume serial number is BA4B-64AA
    D:\tmp\AZURE_COMMON-1.1.16-PY2.PY3-NONE-ANY
    ├─azure
    │  ├─common
    │  │      client_factory.py
    │  │      cloud.py
    │  │      credentials.py
    │  │      exceptions.py
    │  │      _version.py
    │  │      __init__.py
    │  │
    │  └─profiles
    │          multiapiclient.py
    │          __init__.py
    │
    └─azure_common-1.1.16.dist-info
            METADATA
            RECORD
            top_level.txt
            WHEEL

You will then find that the file structures from steps 6 and 7 are almost identical.
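The comparison between the installed egg directory and the unpacked wheel can also be scripted instead of eyeballed. The following is a rough sketch under a few assumptions: the helper names (wheel_files, installed_files, compare) are invented for this example, and metadata directories (names ending in .dist-info or .egg-info, plus EGG-INFO) and __pycache__ are ignored because they legitimately differ between a wheel and an installed egg.

```python
import zipfile
from pathlib import Path

METADATA_SUFFIXES = (".dist-info", ".egg-info")
METADATA_NAMES = ("EGG-INFO",)


def _is_code_path(parts):
    """True for paths outside metadata directories and __pycache__."""
    if "__pycache__" in parts:
        return False
    top = parts[0]
    return not (top.endswith(METADATA_SUFFIXES) or top in METADATA_NAMES)


def wheel_files(wheel_path):
    """Relative paths of the code files stored in a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = [n for n in wheel.namelist() if not n.endswith("/")]
    return {n for n in names if _is_code_path(n.split("/"))}


def installed_files(root):
    """Relative paths of the code files found under an install directory."""
    root = Path(root)
    found = set()
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            if _is_code_path(rel.parts):
                found.add(rel.as_posix())
    return found


def compare(wheel_path, root):
    """Return (missing_from_install, extra_in_install) as sorted lists."""
    expected, actual = wheel_files(wheel_path), installed_files(root)
    return sorted(expected - actual), sorted(actual - expected)
```

Passing the path of the .whl file and the path of the .egg directory to compare should return two empty lists when every code file from the wheel was installed.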

Hope it helps. If you have any questions, please feel free to let me know.


I did the same on Linux and got the same result. Before and after running python setup.py install, I saved the output of tree lib/ > lib_[before|after].txt for the setuptmp environment, then compared the two files.
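The same before-and-after check can be done from Python rather than with saved tree listings. This is a minimal sketch, not the procedure used above: snapshot and added_and_removed are hypothetical names, and the idea is simply to record the set of files under a directory twice and take the set difference.

```python
from pathlib import Path


def snapshot(root):
    """Return the set of relative file paths under `root`."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }


def added_and_removed(before, after):
    """Return (added, removed) file lists between two snapshots."""
    return sorted(after - before), sorted(before - after)
```

A typical use would be to call snapshot("lib") before the install, run python setup.py install, call snapshot("lib") again, and pass both results to added_and_removed to see exactly which files the install created.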