Question

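A scatter plot of x, y coordinates made with Matplotlib looks different from the one I get with other programs. For example, below are the PCA results for two fit scores. The same plot made with R from the same data gives a different display... I also checked with Excel and LibreOffice: they give the same display as R. Before ranting against Matplotlib or reporting a bug, I would like to get other opinions and check whether I did things correctly. What am I doing wrong?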

I checked that the floats are not the problem, that the order of the coordinates is the same, ... So, with R:

Plotting:

mydata = read.csv("C:/Users/Anon/Desktop/data.txt")  # read csv file
summary(mydata)
attach(mydata) 
plot(mydata)

[Image: scatter plot made by R]

Plotting the same data with Matplotlib:

import matplotlib.pyplot as mpl
import numpy as np
import os
# open the file with PCA results and convert it into float
file_data = os.getcwd() + "\\data.txt"
F = open(file_data, 'r')
DATA=F.readlines()
F.close()
for x in range(len(DATA)) :
    a = DATA[x]
    b = a.split(',')
    DATA[x] = b
for i in xrange(len(DATA)):
    for j in xrange(len(DATA[i])):
        DATA[i][j] = float(DATA[i][j])
print DATA[0]
X_train = np.mat(DATA)
print "X_train\n",X_train

mpl.scatter(X_train[:, 0], X_train[:, 1], c='white')
mpl.show()

[Image: scatter plot made by Matplotlib, with the printed X_train so you can verify that the data are the same]

With Excel:

Data (I cannot post all of it; please tell me how to attach a *.txt file of about 40.5 KB):

0.02753547770433    -0.037999362802379
0.05179194064903    0.0257492713593311
-0.0272928319004863 0.0065143681863637
0.0891355504379135  -0.00801696955147688
0.0946809371499167  -0.00502202338807476
-0.0445799941736001 -0.0435759273767196
-0.333617999778119  -0.204222004815357
-0.127212025425053  -0.110264460064754
-0.0243459270896855 -0.0622273166478512
0.0497080821876597  0.0272080474151131
-0.181221703468915  -0.134945934382777
-0.0699503258694739 -0.0835239795690277

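Edit: I also exported the PCA data (from scipy) to a text file and opened that same text file with both Python/Matplotlib and R, to rule out any PCA-related problems. The plots are made after the processing (and the plot before the PCA looks like a dome).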

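Edit 2: With numpy.loadtxt(), the plot looks like the one from R. But my custom method and numpy.loadtxt() give data with the same shape, size, type and values, so what mechanism is involved?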

X_train numpy.loadtxt()
[[ 0.02753548 -0.03799936]
 [ 0.05179194  0.02574927]
 [-0.02729283  0.00651437]
 ..., 
 [ 0.02670961 -0.00696177]
 [ 0.09011859 -0.00661216]
 [-0.04406559  0.09285291]] 
shape and size
(1039L, 2L) 2078

X_train custom-method
[[ 0.02753548 -0.03799936]
 [ 0.05179194  0.02574927]
 [-0.02729283  0.00651437]
 ..., 
 [ 0.02670961 -0.00696177]
 [ 0.09011859 -0.00661216]
 [-0.04406559  0.09285291]] 
shape and size
(1039L, 2L) 2078

Answer 1

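The problem is that you are representing X_train as a matrix rather than as a two-dimensional array. This means that when you subset it with X_train[:, 0], you do not get a one-dimensional array: you get a matrix with one column (which matplotlib then tries to scatter). You can see this for yourself by printing X_train[:, 0].*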

You can fix the problem simply by changing this line:

X_train = np.mat(DATA)

to

X_train = np.array(DATA)

*For example, for the data you posted, X_train[:, 0] is:

[[ 0.02753548]
 [ 0.05179194]
 [-0.02729283]
 [ 0.08913555]
 [ 0.09468094]
 [-0.04457999]
 [-0.333618  ]
 [-0.12721203]
 [-0.02434593]
 [ 0.04970808]
 [-0.1812217 ]
 [-0.06995033]]
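
To make the difference concrete, here is a small sketch that builds both representations from the first three rows of the posted data and compares the shape of the column slice. The variable names are only illustrative, and it uses np.matrix rather than np.mat, on the assumption that you may be running a recent NumPy where the np.mat alias is no longer available.

```python
import numpy as np

# First three rows of the data posted in the question
rows = [[0.02753548, -0.03799936],
        [0.05179194, 0.02574927],
        [-0.02729283, 0.00651437]]

as_matrix = np.matrix(rows)  # same kind of object np.mat(DATA) produced
as_array = np.array(rows)    # plain 2-D ndarray

# Slicing a matrix always keeps two dimensions; slicing an array drops one
matrix_col = as_matrix[:, 0]
array_col = as_array[:, 0]
print(type(matrix_col).__name__, matrix_col.shape)
print(type(array_col).__name__, array_col.shape)

# If you have to keep the matrix, flatten the column before plotting
flat_col = np.asarray(matrix_col).ravel()
print(flat_col.shape)
```

The two objects print identically when displayed whole, which is why comparing the printed X_train did not reveal anything; only the type and the shape of the slice differ.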

Answer 2

It looks to me like the problem is in the code that reads the array: you are getting the wrong dimensions. Try using numpy.loadtxt instead. http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
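
A minimal sketch of that suggestion, assuming the file is comma-separated as the split(',') in the question implies. An in-memory buffer with three of the posted rows stands in for data.txt here so that the example runs on its own; with the real file you would pass its path instead.

```python
import io

import numpy as np

# Stand-in for data.txt (assumed comma-separated, one x,y pair per line)
sample = io.StringIO(
    "0.02753547770433,-0.037999362802379\n"
    "0.05179194064903,0.0257492713593311\n"
    "-0.0272928319004863,0.0065143681863637\n"
)

# loadtxt parses the floats and returns a plain 2-D ndarray
X_train = np.loadtxt(sample, delimiter=",")
print(type(X_train).__name__, X_train.shape)

# Column slices of an ndarray are one-dimensional, which is what scatter expects
x = X_train[:, 0]
y = X_train[:, 1]
print(x.shape, y.shape)
```

The x and y arrays can then be passed straight to mpl.scatter(x, y, c='white') as in the question.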

Plotting flaw in Matplotlib
