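Details of the data:
The dataset contains 116 columns and 30229 rows and is loaded as a pandas DataFrame. The last column is the dependent variable, and all other columns are independent variables.
X and y have dtypes float64 and int64 respectively. train and test are numpy ndarray objects.
I defined a function for the Euclidean distance = ((x1 - x2)^2 + (y1 - y2)^2 + ...)^(1/2).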
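For reference, the formula above can be sketched with NumPy as follows. This is only an illustration: `euclidean_distance` and the sample points `a` and `b` are hypothetical names, not part of my actual code.

```python
import math
import numpy as np

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return math.sqrt(np.sum((p - q) ** 2))

# Made-up points: the differences are (1, 2, 2), so the distance is sqrt(9).
a = [0.0, 0.0, 0.0]
b = [1.0, 2.0, 2.0]
print(euclidean_distance(a, b))  # 3.0
```

The same value should come out of `np.linalg.norm(np.subtract(a, b))`, which is a quick way to cross-check a hand-written distance function.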
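The problem is on the last line of the code, which raises "IndexError: too many indices for array". The error stays the same even if I only enter train[0] or train[0][0].
Please help me resolve this. Let me know if you need more details.
Code: `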
import numpy as np
import pandas as pd
import math as mt
import matplotlib.pyplot as plt
dataset = pd.read_csv('Quote_Viewed_Doc_Updated.csv')
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,115].values
#splitting into Test and Train
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 1/3, random_state = 0)
# 1) given two data points, calculate the euclidean distance between them
def get_distance(data1, data2):
    points = zip(data1, data2)
    diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
    return mt.sqrt(sum(diffs_squared_distance))
# reformat train/test datasets for convenience
train = np.array(zip(X_train,y_train))
test = np.array(zip(X_test, y_test))
get_distance(train[0][0], train[1][0])
`
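One possible cause, assuming the code runs on Python 3: `zip()` returns a lazy iterator there, so `np.array(zip(...))` wraps the iterator itself in a zero-dimensional object array, and any indexing into it fails. The sketch below reproduces this with small made-up arrays (`X_demo` and `y_demo` are hypothetical stand-ins for `X_train` and `y_train`) and shows one way around it, which is to keep the pairs in a plain list.

```python
import numpy as np

# Hypothetical stand-ins for X_train and y_train (4 rows, 3 features).
X_demo = np.arange(12, dtype=np.float64).reshape(4, 3)
y_demo = np.array([0, 1, 0, 1], dtype=np.int64)

# In Python 3, zip() is a lazy iterator, so NumPy stores it as a single object.
wrapped = np.array(zip(X_demo, y_demo))
print(wrapped.ndim)  # 0: zero-dimensional, so wrapped[0] raises IndexError

# Materializing the pairs as a list keeps each (features, label) pair indexable.
pairs = list(zip(X_demo, y_demo))
print(pairs[0][0])  # [0. 1. 2.]
print(pairs[0][1])  # 0

# Distance between the feature vectors of the first two rows.
distance = np.linalg.norm(pairs[0][0] - pairs[1][0])
print(distance)
```

If this is what is happening, replacing `np.array(zip(X_train, y_train))` with `list(zip(X_train, y_train))` (and likewise for the test set) would let `train[0][0]` return the first feature vector.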