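I'm using Python 3.6 and I'm new to it, so thank you in advance for your patience.
I have a function that sums the differences between 3 points. It should then take those "differences" and concatenate them with another DataFrame called labels. k and length are integers. I expected the resulting DataFrame to have two columns, but it only has one.
Sample code: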
def distance(df1,df2,labels,k,length):
    total_dist = 0
    for i in range(length):
        dist_dif = df1.iloc[:,i] - df2.iloc[:,i]
        sq_dist = dist_dif ** 2
        root_dist = sq_dist ** 0.5
        total_dist = total_dist + root_dist
    return total_dist
    distance_df = pd.concat([total_dist, labels], axis=1)
    distance_df.sort(ascending=False, axis=1, inplace=True)
    top_knn = distance_df[:k]
    return top_knn.value_counts().index.values[0]
Sample data:
d1 = {'Z_Norm_Age': [1.20, 2.58,2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24,0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': [0, 1,1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)
I want the data to look like this:
   total_dist  labels
0    1.715349       0
1    2.872991       1
2    4.344087       1
But it looks like this:
0 1.715349
1 4.344087
2 2.872991
dtype: float64
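The output fails to do two things: 1. return the labels column data; 2. sort the data in descending order.
If someone could point me in the right direction, I'd really appreciate it.
Answer 0 (score: 0)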
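Given two DataFrames, df1-df2 performs element-wise subtraction. Use abs() to take the absolute value of that difference, and finally sum over each row. That explains the first command in the function below. The other lines are similar to your code.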
import numpy as np
import pandas as pd

def calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels):
    diff = np.sum(np.abs(df1-df2), axis=1)  # np.sum(..., axis=1) sums the rows
    diff.name = 'total_abs_distance'  # Not really necessary, but just to refer to it later
    diff = pd.concat([diff, labels], axis=1)
    diff.sort_values(by='total_abs_distance', axis=0, ascending=True, inplace=True)
    return diff
So, for your sample data:
d1 = {'Z_Norm_Age': [1.20, 2.58,2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24,0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': ['a', 'b', 'c']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)
calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels)
we get, hopefully, what you wanted:
   total_abs_distance Survived
0                1.71        a
2                3.87        c
1                4.34        b
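The function in the question also tried to take the k nearest rows and return the most common label among them. Below is a minimal sketch of that last step, built on the sorted result above. The helper name most_common_label_among_k_nearest and its label_col parameter are made up for illustration and are not part of the original answer. The data uses the numeric Survived labels from the question rather than 'a', 'b', 'c'.

```python
import pandas as pd

def calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels):
    # Same logic as the function above, written with pandas methods only
    diff = (df1 - df2).abs().sum(axis=1)
    diff.name = 'total_abs_distance'
    diff = pd.concat([diff, labels], axis=1)
    return diff.sort_values(by='total_abs_distance', ascending=True)

def most_common_label_among_k_nearest(sorted_df, k, label_col='Survived'):
    # Hypothetical helper: keep the k rows with the smallest distance,
    # then return the label that occurs most often among them
    top_k = sorted_df.head(k)
    return top_k[label_col].value_counts().index[0]

d1 = {'Z_Norm_Age': [1.20, 2.58, 2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24, 0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': [0, 1, 1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)

result = calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels)
print(most_common_label_among_k_nearest(result, k=3))  # 1 (two of the three rows are labelled 1)
```

Note that when two labels are equally frequent among the k rows, the one returned depends on how value_counts orders ties, so an odd k is the safer choice for two classes.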
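Some notes:
For the Euclidean distance, replace the first command in the function above with
np.sqrt(np.sum(np.square(df1-df2),axis=1))
Consider using the index of the DataFrame instead; maybe it suits your purpose better?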
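As a rough illustration of these notes, here is a small sketch using the same sample data. It computes the row-wise Euclidean distance and stores the labels in the index of the result rather than in a second column. The function name euclidean_distance_indexed_by_label is invented for this example, and reading the index note as "put the labels in the index" is an assumption, since the original example did not survive.

```python
import numpy as np
import pandas as pd

def euclidean_distance_indexed_by_label(df1, df2, labels):
    # Row-wise Euclidean distance: square the differences,
    # sum each row, then take the square root
    dist = np.sqrt(np.square(df1 - df2).sum(axis=1))
    dist.name = 'euclidean_distance'
    # Assumption: labels has a single column, which becomes the index
    dist.index = pd.Index(labels.iloc[:, 0], name=labels.columns[0])
    return dist.sort_values(ascending=True)

d1 = {'Z_Norm_Age': [1.20, 2.58, 2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24, 0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': ['a', 'b', 'c']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)

result = euclidean_distance_indexed_by_label(df1, df2, labels)
print(result)
```

The result is a Series whose index carries the labels, so a single distance can be looked up by label, for example result['a'].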