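Given the following line:
plt.scatter(X[:, 0], X[:, 1], s=50);
what does X[:, 0], X[:, 1] mean? In all the examples I have looked at, I have only seen X, y.
I also don't understand the purpose of X, y =.
Below is the output of X, which seems to include both the X and y values. But y has a different output of its own, and I don't know where to use it or why.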
array([[ 1.85219907, 1.10411295],
[-1.27582283, 7.76448722],
[ 1.0060939 , 4.43642592],
[-1.20998253, 7.83203579],
[ 1.92461484, 1.06347673],
[ 2.28565919, 0.79166208],
[-1.57379043, 2.69773813],
[ 1.04917913, 4.31668562],
[-1.07436851, 7.93489945],
[-1.15872975, 7.97295642]
The full script is below:
#import the required libraries
# - matplotlib is a charting library
# - Seaborn builds on top of Matplotlib and introduces additional plot types. It also makes your traditional Matplotlib plots look a bit prettier.
# - Numpy is numerical Python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in newer scikit-learn releases
from sklearn.cluster import KMeans
#Generate sample data, with distinct clusters for testing
#n_samples = the number of datapoints, equally split across each clusters
#centers = The number of centers to generate (number of clusters) - a center is the arithmetic mean of all the points belonging to the cluster.
#cluster_std = the standard deviation of the clusters - a quantity expressing by how much the members of a group differ from the mean value for the group (how tight is the cluster going to be)
#random_state = seeds the random number generator being used. If you don't set random_state, different points are generated every time you run the code. If you use a fixed value (random_state = 0 or any other value), the same points are generated every time.
X, y = make_blobs(n_samples=300, centers=4,
cluster_std=0.50, random_state=0)
#The below statement, will enable us to visualise matplotlib charts, even in ipython
#Using matplotlib backend: MacOSX
#Populating the interactive namespace from numpy and matplotlib
%pylab
#plot the chart
#s = the size of the points.
plt.scatter(X[:, 0], X[:, 1], s=50);
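Answer 0 (score: 1)
make_blobs generates "isotropic Gaussian blobs". X is a NumPy array with two columns containing the (x, y) coordinates of the points, while y contains the class (cluster) label of each point.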
In[1]: X.shape
Out[1]: (300, 2)
X[:, 0] is the NumPy way of selecting every row entry of column 0, that is, a single column from the NumPy array.
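As a minimal sketch of that slicing, the snippet below builds a small stand-in array from the first four rows of the output shown in the question and pulls out each column. The variable names first_col, second_col and first_row are only illustrative.

```python
import numpy as np

# Stand-in for X: the first four rows of the output shown in the question.
X = np.array([[ 1.85219907, 1.10411295],
              [-1.27582283, 7.76448722],
              [ 1.0060939 , 4.43642592],
              [-1.20998253, 7.83203579]])

first_col = X[:, 0]   # every row, column 0 (used as the chart's horizontal axis)
second_col = X[:, 1]  # every row, column 1 (used as the chart's vertical axis)
first_row = X[0, :]   # for comparison: row 0, every column

print(X.shape)        # two-dimensional: (rows, columns)
print(first_col)
print(second_col)
```

The colon means "take everything along this axis", so X[:, 0] is a one-dimensional array with one value per point.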
It is easier to see them if you plot the clusters of coordinates. Your code appears to be missing
plt.show()
which displays the plot. [Figure: make_blobs plot]
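If you plot one of these columns against y, you can see more clearly that the points are classified according to their coordinates, although this is not a particularly useful plot in itself. [Figure: X[:, 0] plotted against y]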
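To see how y relates to X without a plot, here is a rough sketch that uses y as a boolean mask to group the rows of X by label. It assumes a scikit-learn version where make_blobs is imported from sklearn.datasets; the names labels and members are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the question.
X, y = make_blobs(n_samples=300, centers=4,
                  cluster_std=0.50, random_state=0)

labels = np.unique(y)  # the distinct cluster labels found in y
for label in labels:
    members = X[y == label]  # rows of X whose label equals this one
    # Number of points in the cluster and the mean (x, y) coordinate.
    print(label, members.shape, members.mean(axis=0))
```

In a plot, the same labels can be passed to plt.scatter through its c argument, for example plt.scatter(X[:, 0], X[:, 1], c=y, s=50), so that each cluster is drawn in its own color.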
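Answer 1 (score: 1)
X is a 2D NumPy array. X[:,0] accesses everything in the first column, and X[:,1] accesses everything in the second column.
In your plt.scatter statement, both the "x" and the "y" of the chart come from X.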
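X, y = simply means that the output of make_blobs() has two elements, which are assigned to X and y respectively. The association with the "x" and "y" of the scatter plot is confusing only because of the names given to the variables. The chart's "x" and "y" can be any variables or (as in this case) can be indexed separately from a single 2D NumPy array.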
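The unpacking can be sketched with a hypothetical stand-in for make_blobs(). The function make_points_and_labels and its values are invented for illustration and are not part of scikit-learn.

```python
import numpy as np

def make_points_and_labels():
    """Hypothetical stand-in for make_blobs(): returns two arrays as a tuple."""
    points = np.array([[1.0, 2.0],
                       [3.0, 4.0],
                       [5.0, 6.0]])
    labels = np.array([0, 1, 0])
    return points, labels

# The function returns a tuple of two elements...
result = make_points_and_labels()

# ...and "X, y = ..." unpacks that tuple into two separate names.
X, y = make_points_and_labels()

# The chart's horizontal and vertical values both come from X, not from y.
horizontal = X[:, 0]
vertical = X[:, 1]

print(type(result), len(result))
print(horizontal, vertical, y)
```

Renaming the variables, for example points, labels = make_blobs(...), would behave identically and may make the distinction from the chart axes easier to read.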