Suppose you have this wheel:
M Filemode Length Date Time File
- ---------- -------- ----------- -------- -------------------------------------------
-rw-rw-r-- 1358 26-Sep-2018 21:08:40 azure/common/__init__.py
-rw-rw-r-- 327 26-Sep-2018 21:08:40 azure/common/_version.py
-rw-rw-r-- 8737 26-Sep-2018 21:08:40 azure/common/client_factory.py
-rw-rw-r-- 755 26-Sep-2018 21:08:40 azure/common/cloud.py
-rw-rw-r-- 2479 26-Sep-2018 21:08:40 azure/common/credentials.py
-rw-rw-r-- 805 26-Sep-2018 21:08:40 azure/common/exceptions.py
-rw-rw-r-- 6079 26-Sep-2018 21:08:40 azure/profiles/__init__.py
-rw-rw-r-- 3943 26-Sep-2018 21:08:40 azure/profiles/multiapiclient.py
-rw-rw-r-- 6 26-Sep-2018 21:21:54 azure_common-1.1.16.dist-info/top_level.txt
-rw-rw-r-- 110 26-Sep-2018 21:21:54 azure_common-1.1.16.dist-info/WHEEL
-rw-rw-r-- 3805 26-Sep-2018 21:21:54 azure_common-1.1.16.dist-info/METADATA
-rw-rw-r-- 997 26-Sep-2018 21:21:54 azure_common-1.1.16.dist-info/RECORD
- ---------- -------- ----------- -------- -------------------------------------------
29401 12 files
It has three different packages in it: azure.common, azure.profiles and azure_common.
All great names, and great layout. Also, a lot of greatness of mind that unmistakably went into engineering this miracle of modern software engineering.
This wheel is distributed under the name azure-common. So, when you depend on it in setup.py like this:
setup(
    ...
    install_requires=['azure-common'],
    ...
)
You will only get the azure_common package installed. Maybe. I don't really know; it seems so, but the few times I tried, it seemed to install only azure.common, or maybe I misread it... It's really hard to follow all the manipulations setuptools performs on a package.
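A quick way to settle which of these names is actually importable after installation is to ask the import system directly instead of reading the site-packages directory by eye. The sketch below is only an illustration: `is_importable` is a hypothetical helper name, and it relies on the standard-library `importlib.util.find_spec`, which imports the parent package when given a dotted name.

```python
import importlib.util


def is_importable(name):
    """Return True if `name` resolves to a module or package in this environment.

    `is_importable` is a hypothetical helper, not part of setuptools or azure-common.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be imported.
        return False


# Names taken from the question; the result depends on what is installed.
for candidate in ("azure.common", "azure.profiles", "azure_common"):
    print(candidate, is_importable(candidate))
```

Running this inside the environment where the dependency was installed shows which names the interpreter can resolve, independent of how the files are laid out on disk.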
Hence the question: how can I force setuptools into installing all packages found in this kind of wheel? Also, the order is important, because this garbage sometimes needs to be installed alongside other packages that also provide azure.something packages, and those may overwrite the contents of the azure directory. So, ideally, I'd also like to control the order in which install_requires dependencies are processed.
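To see which importable packages a wheel like the one listed above carries, one rough approach is to scan the archive for directories that contain an `__init__.py`. This is a sketch under assumptions: `importable_packages` is a hypothetical helper, the archive is an in-memory stand-in built from the file names in the listing (not the real wheel), and namespace packages without an `__init__.py` are not detected by this heuristic.

```python
import io
import zipfile


def importable_packages(archive_names):
    """Return dotted names of directories that contain an __init__.py.

    Hypothetical helper: metadata directories (*.dist-info, *.data) are skipped.
    """
    packages = set()
    for name in archive_names:
        parts = name.split("/")
        if len(parts) < 2 or parts[-1] != "__init__.py":
            continue
        if any(p.endswith((".dist-info", ".data")) for p in parts[:-1]):
            continue
        packages.add(".".join(parts[:-1]))
    return sorted(packages)


# File names copied from the listing in the question; contents are left empty.
listing = [
    "azure/common/__init__.py",
    "azure/common/_version.py",
    "azure/common/client_factory.py",
    "azure/common/cloud.py",
    "azure/common/credentials.py",
    "azure/common/exceptions.py",
    "azure/profiles/__init__.py",
    "azure/profiles/multiapiclient.py",
    "azure_common-1.1.16.dist-info/top_level.txt",
    "azure_common-1.1.16.dist-info/WHEEL",
    "azure_common-1.1.16.dist-info/METADATA",
    "azure_common-1.1.16.dist-info/RECORD",
]

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for entry in listing:
        archive.writestr(entry, "")

with zipfile.ZipFile(buffer) as archive:
    found = importable_packages(archive.namelist())

print(found)  # ['azure.common', 'azure.profiles']
```

Pointing `zipfile.ZipFile` at a real `.whl` file instead of the in-memory buffer would apply the same check to the actual download.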
This is where this started: How to specify bracket dependencies in setup.py?
Answer 0 (score: 0)
It sounds like only the azure.common subdirectory was installed into your environment when you installed the dependencies via install_requires=['azure-common'] in setup.py. I tried to reproduce this issue but could not: all of the files in the package were installed.
Here are the steps I followed on my local Windows machine.
1. Run mkdir setuptmp, create a virtual environment with virtualenv setuptmp, then cd setuptmp.
2. Create a setup.py file with the following content.
from setuptools import setup, find_packages
setup(
    name = "setuptmp",
    install_requires = ['azure-common']
)
3. Activate the virtual environment via Scripts\activate.bat.
4. Run python setup.py install to install the dependencies described in my setup.py.
5. Open the REPL interpreter with python to test all the packages you mentioned.
(setuptmp) D:\projects\setuptmp>python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import azure.common
>>> import azure.profiles
>>> azure.common.__file__
'D:\\projects\\setuptmp\\lib\\site-packages\\azure_common-1.1.16-py3.7.egg\\azure\\common\\__init__.py'
>>> azure.profiles.__file__
'D:\\projects\\setuptmp\\lib\\site-packages\\azure_common-1.1.16-py3.7.egg\\azure\\profiles\\__init__.py'
>>> import azure_common
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'azure_common'
Note: azure_common is not a module, just an egg-info directory.
6. Check the packages installed in my environment via cd Lib\site-packages, dir, and tree azure_common-1.1.16-py3.7.egg /F.
(setuptmp) D:\projects\setuptmp\Lib\site-packages>dir
Volume in drive D is Data
Volume Serial Number is BA4B-64AA
Directory of D:\projects\setuptmp\Lib\site-packages
2018/12/26 14:48 <DIR> .
2018/12/26 14:48 <DIR> ..
2018/12/26 14:48 <DIR> azure_common-1.1.16-py3.7.egg
2018/12/26 14:48 61 easy-install.pth
2018/12/26 14:46 126 easy_install.py
2018/12/26 14:46 <DIR> pip
2018/12/26 14:46 <DIR> pip-18.1.dist-info
2018/12/26 14:46 <DIR> pkg_resources
2018/12/26 14:48 965 setuptmp-0.0.0-py3.7.egg
2018/12/26 14:46 <DIR> setuptools
2018/12/26 14:46 <DIR> setuptools-40.6.3.dist-info
2018/12/26 14:46 <DIR> wheel
2018/12/26 14:46 <DIR> wheel-0.32.3.dist-info
2018/12/26 14:46 <DIR> __pycache__
3 File(s) 1,152 bytes
11 Dir(s) 80,896,319,488 bytes free
(setuptmp) D:\projects\setuptmp\Lib\site-packages>tree azure_common-1.1.16-py3.7.egg /F
Folder PATH listing for volume Data
Volume serial number is BA4B-64AA
D:\PROJECTS\SETUPTMP\LIB\SITE-PACKAGES\AZURE_COMMON-1.1.16-PY3.7.EGG
├─azure
│ ├─common
│ │ │ client_factory.py
│ │ │ cloud.py
│ │ │ credentials.py
│ │ │ exceptions.py
│ │ │ _version.py
│ │ │ __init__.py
│ │ │
│ │ └─__pycache__
│ │ _version.cpython-37.pyc
│ │ __init__.cpython-37.pyc
│ │
│ └─profiles
│ multiapiclient.py
│ __init__.py
│
└─EGG-INFO
PKG-INFO
RECORD
requires.txt
top_level.txt
WHEEL
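7. Compare the above with the file structure of the azure-common package downloaded from the link on the PyPI website. I extracted the azure_common-1.1.16-py2.py3-none-any.whl file into a temporary directory with 7-Zip and ran tree on it.
D:\tmp>tree azure_common-1.1.16-py2.py3-none-any /F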
Folder PATH listing for volume Data
Volume serial number is BA4B-64AA
D:\tmp\AZURE_COMMON-1.1.16-PY2.PY3-NONE-ANY
├─azure
│ ├─common
│ │ client_factory.py
│ │ cloud.py
│ │ credentials.py
│ │ exceptions.py
│ │ _version.py
│ │ __init__.py
│ │
│ └─profiles
│ multiapiclient.py
│ __init__.py
│
└─azure_common-1.1.16.dist-info
METADATA
RECORD
top_level.txt
WHEEL
You will then find that the file structures from steps 6 and 7 are almost the same.
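The comparison in steps 6 and 7 can also be done programmatically rather than by reading two tree listings. The following is a minimal sketch, assuming that `__pycache__`, `EGG-INFO` and `*.dist-info` directories should be ignored as install-time or metadata artifacts; the helper names (`is_payload`, `wheel_payload`, `installed_payload`, `compare`) are hypothetical, and the demo uses an in-memory archive and a temporary directory as stand-ins for the real wheel and egg.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Directory names treated as metadata or caches (an assumption of this sketch).
IGNORED_DIRS = {"__pycache__", "EGG-INFO"}


def is_payload(rel_path):
    """True if a POSIX-style relative path is package code, not metadata."""
    parts = rel_path.split("/")
    return not any(p in IGNORED_DIRS or p.endswith(".dist-info") for p in parts)


def wheel_payload(wheel_file):
    """Collect payload file names from a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_file) as archive:
        return {n for n in archive.namelist()
                if not n.endswith("/") and is_payload(n)}


def installed_payload(root):
    """Collect payload file names below an installed directory."""
    root = Path(root)
    relative = (p.relative_to(root).as_posix()
                for p in root.rglob("*") if p.is_file())
    return {r for r in relative if is_payload(r)}


def compare(wheel_file, root):
    """Return (files missing from the install, files only in the install)."""
    wheel, installed = wheel_payload(wheel_file), installed_payload(root)
    return sorted(wheel - installed), sorted(installed - wheel)


# Stand-in data: a few file names from the listings above, with empty contents.
code_files = ["azure/common/__init__.py", "azure/common/cloud.py",
              "azure/profiles/__init__.py"]

wheel = io.BytesIO()
with zipfile.ZipFile(wheel, "w") as archive:
    for name in code_files + ["azure_common-1.1.16.dist-info/RECORD"]:
        archive.writestr(name, "")

with tempfile.TemporaryDirectory() as tmp:
    egg = Path(tmp) / "azure_common-1.1.16-py3.7.egg"
    extras = ["EGG-INFO/PKG-INFO",
              "azure/common/__pycache__/__init__.cpython-37.pyc"]
    for name in code_files + extras:
        target = egg / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")
    missing, extra = compare(wheel, egg)

print("missing from install:", missing)
print("only in install:", extra)
```

With real paths, `compare` would take the downloaded `.whl` file and the egg directory under `site-packages`; two empty lists would correspond to the "almost the same" conclusion above.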
Hope that helps. If you have any questions, feel free to let me know.
I did the same on Linux and got the same result. I saved the output of tree lib/ > lib_[before|after].txt for setuptmp before and after running python setup.py install, then compared the two files, as shown below.