For example, consider this dataset:
(1) https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data
Or
(2) http://data.worldbank.org/topic
How does one load such external datasets into scikit-learn in order to do anything with them?
The only way of loading a dataset that I have seen in scikit-learn is through a command like:
from sklearn.datasets import load_digits
digits = load_digits()
Answer 0 (score: 1)
You need to learn a little pandas, which is a data frame implementation in Python. Then you can do:
import pandas
my_data_frame = pandas.read_csv("/path/to/my/data")
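As a minimal sketch of how a data frame loaded this way might be handed to scikit-learn: the inline text below is a made-up stand-in for a downloaded file, and the choices of `header=None`, `na_values="?"`, and "last column is the label" are assumptions that have to be checked against the actual file (for example, the anneal.data link in the question).

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a downloaded file. A local path or a URL
# string can be passed to pandas.read_csv in the same way.
raw = io.StringIO(
    "5.1,3.5,a\n"
    "4.9,3.0,a\n"
    "6.2,?,b\n"
    "5.9,3.0,b\n"
)

# Assumptions: no header row, and "?" marks a missing value.
df = pd.read_csv(raw, header=None, na_values="?")

# Assumption: the last column is the label, the rest are features.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Fill missing numeric cells with the column mean before fitting.
X = X.fillna(X.mean())

# scikit-learn estimators accept plain NumPy arrays.
clf = DecisionTreeClassifier(random_state=0).fit(X.to_numpy(), y.to_numpy())
predictions = clf.predict(X.to_numpy())
```

Real files usually need more cleaning than this (categorical columns, for instance, have to be encoded as numbers first), so treat the preprocessing here as illustrative only.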
To create model matrices from your data frame, I recommend the patsy library, which implements a model specification language similar to R formulas:
import patsy
y, X = patsy.dmatrices("my_response ~ my_model_formula", my_data_frame)
Because the formula has a left-hand side, dmatrices is used here: it returns both the response y and the design matrix X, which can then be passed into the various sklearn models.
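A small end-to-end sketch of the patsy route may help. The data frame, its column names (`price`, `area`, `kind`), and the choice of `LinearRegression` are all invented for illustration; the sketch also assumes that patsy is installed alongside pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
import patsy
from sklearn.linear_model import LinearRegression

# Hypothetical data: one numeric predictor and one categorical predictor.
df = pd.DataFrame({
    "price": [10.0, 12.0, 15.0, 11.0, 16.0, 13.0],
    "area": [1.0, 2.0, 3.0, 1.5, 3.5, 2.5],
    "kind": ["a", "b", "a", "b", "a", "b"],
})

# A formula with a left-hand side needs dmatrices, which returns the
# response matrix and the design matrix. C(kind) is dummy-coded by patsy.
y, X = patsy.dmatrices("price ~ area + C(kind)", df)
column_names = X.design_info.column_names

# patsy already adds an intercept column, so the estimator's own
# intercept is switched off to avoid fitting it twice.
model = LinearRegression(fit_intercept=False)
model.fit(np.asarray(X), np.asarray(y).ravel())
```

Here `column_names` records which design-matrix column each fitted coefficient in `model.coef_` belongs to.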
Answer 1 (score: 0)
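Just run the following command, replacing the name "EXTERNALDATASETNAME" with the name of your dataset (this works only for datasets that sklearn.datasets actually provides a fetch_ function for):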
import sklearn.datasets
data = sklearn.datasets.fetch_EXTERNALDATASETNAME()
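Since `fetch_` functions exist only for a fixed set of datasets, it can be worth checking which ones an installed scikit-learn version exposes before relying on this approach. The following is a small sketch that simply inspects the module; it downloads nothing.

```python
import sklearn.datasets

# Collect the names of all public fetch_* functions in sklearn.datasets.
fetchers = sorted(
    name for name in dir(sklearn.datasets) if name.startswith("fetch_")
)

for name in fetchers:
    print(name)
```

If the dataset you want is not covered by one of the listed functions, `fetch_openml` can download datasets hosted on OpenML by name (network access required); otherwise, reading the file yourself, for example with pandas, is the usual route.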