Question

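I have a dataset (in a CSV file) and want to run gradient descent on it in Python, but I don't know how to make this code read the dataset.

I would also like to measure the accuracy on this dataset and visualize the result.

I have tried many things without success. The code below runs correctly, but how do I make it read my own dataset?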

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import numpy as np
import argparse


def sigmoid_activation(x):
    return 1.0 / (1 + np.exp(-x))


ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=int, default=100,
                help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
                help="learning rate")
args = vars(ap.parse_args())

(X, y) = make_blobs(n_samples=250, n_features=2, centers=2, cluster_std=1.05, random_state=20)
X = np.c_[np.ones((X.shape[0])), X]
print("[INFO] starting training...")
W = np.random.uniform(size=(X.shape[1],))
lossHistory = []

for epoch in np.arange(0, args["epochs"]):
    preds = sigmoid_activation(X.dot(W))
    error = preds - y
    loss = np.sum(error ** 2)
    lossHistory.append(loss)
    print("[INFO] epoch #{}, loss={:.7f}".format(epoch + 1, loss))

    # update the weights once per epoch, inside the loop
    gradient = X.T.dot(error) / X.shape[0]
    W += -args["alpha"] * gradient

for i in np.random.choice(250, 10):
    activation = sigmoid_activation(X[i].dot(W))
    label = 0 if activation < 0.5 else 1
    print("activation={:.4f}; predicted_label={}, true_label={}".format(activation, label, y[i]))

# decision boundary: W[0] + W[1]*x1 + W[2]*x2 = 0, solved for x2
Y = (-W[0] - (W[1] * X[:, 1])) / W[2]

plt.figure()
plt.scatter(X[:, 1], X[:, 2], marker="o", c=y)
plt.plot(X[:, 1], Y, "r-")

fig = plt.figure()
plt.plot(np.arange(0, args["epochs"]), lossHistory)
fig.suptitle("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.show()

Answer 1

You can read a CSV file (for example, one named foo.csv) like this:

import numpy as np

data = np.loadtxt('foo.csv', delimiter=",")
X = data[:,:2]
y = data[:, 2]

The code above assumes two features in the first two columns and y in the third column.
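To connect this to the training loop in the question and get an accuracy figure, something along these lines might work. It is only a sketch under assumptions that are not stated in the question: the file has no header row, the label column holds 0/1 values, and the helper names load_xy, train and accuracy are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def load_xy(path):
    # Assumption: headerless, comma-separated file with two feature
    # columns followed by a 0/1 label column.
    data = np.loadtxt(path, delimiter=",")
    X = data[:, :2]
    y = data[:, 2]
    # Prepend a column of ones so the bias is learned as W[0],
    # as in the question's code.
    return np.c_[np.ones(X.shape[0]), X], y


def train(X, y, epochs=100, alpha=0.01, seed=0):
    # Same update rule as the question's loop; the seed only makes
    # the random initial weights reproducible.
    rng = np.random.default_rng(seed)
    W = rng.uniform(size=X.shape[1])
    losses = []
    for _ in range(epochs):
        error = sigmoid(X.dot(W)) - y
        losses.append(np.sum(error ** 2))
        W -= alpha * X.T.dot(error) / X.shape[0]
    return W, losses


def accuracy(X, y, W):
    # Fraction of rows whose thresholded prediction matches the label.
    preds = (sigmoid(X.dot(W)) >= 0.5).astype(int)
    return np.mean(preds == y)
```

With a file named foo.csv, X, y = load_xy("foo.csv") followed by W, losses = train(X, y) and accuracy(X, y, W) gives the fraction of correctly classified rows. Note that this is accuracy on the training data itself; holding out part of the rows for testing would give a more honest estimate. The returned losses list can be plotted the same way as lossHistory in the question.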
