Question

I am trying to figure out why I get this error. I have searched a lot but could not find a solution.

When I run the following code, it gives an error:

import numpy as np
import warnings
from collections import Counter
import pandas as pd

def k_nearest_neighbors(data, predict, k=3):
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    distances = []
    for group in data:
        for features in data[group]:
            euclidean_distance = np.linalg.norm(np.array(features) - np.array(predict))
            distances.append([euclidean_distance, group])
    votes = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result

df = pd.read_csv("data.txt")
df.replace('?',-99999, inplace=True)
df.drop(['id'], axis=1, inplace=True)
full_data = df.astype(float).values.tolist()

print(full_data)

Traceback (most recent call last):
  File "E:\Jazab\Machine Learning\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18_Testing_K_Neatest_Nerighbors_Classifier_.py", line 25, in <module>
    full_data = df.astype(float).values.tolist()
  File "C:\Python27\lib\site-packages\pandas\util\_decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 3299, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3224, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3091, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 471, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 521, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Python27\lib\site-packages\pandas\core\dtypes\cast.py", line 636, in astype_nansafe
    return arr.astype(dtype)
ValueError: invalid literal for float(): 3) <-----Reappears in Group 8 as:
Press any key to continue . . .

If I remove the line that fails, the program runs fine. What should I do?

Answer 1

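There is bad data (3)) in the file, so you need to_numeric together with apply, because every column has to be processed.

Non-numeric values are converted to NaN, and fillna then replaces them with some scalar, e.g. 0: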

full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist()

Sample:

df = pd.DataFrame({'A':[1,2,7], 'B':['3)',4,5]})
print (df)
   A   B
0  1  3)
1  2   4
2  7   5

full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist()
print (full_data)
[[1.0, 0.0], [2.0, 4.0], [7.0, 5.0]]
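Before replacing the bad values, it may help to see exactly which cells fail to parse. The following is only a rough sketch on the same sample frame: bad_mask and bad_cells are names made up for illustration, and it assumes a cell counts as bad when it is present in the original but becomes NaN after coercion.

```python
import pandas as pd

# Same illustrative sample as above; not the asker's real data.
df = pd.DataFrame({'A': [1, 2, 7], 'B': ['3)', 4, 5]})

# Coerce every column; cells that cannot be parsed become NaN.
converted = df.apply(pd.to_numeric, errors='coerce')

# A cell is "bad" if it had a value originally but is NaN after coercion.
bad_mask = converted.isna() & df.notna()

# Collect (row label, column name, original value) for each bad cell.
bad_cells = [(row, col, df.at[row, col])
             for col in df.columns
             for row in df.index[bad_mask[col].to_numpy()]]
print(bad_cells)
```

Looking at the original values this way makes it easier to decide whether filling with 0 is acceptable or whether the rows should be fixed at the source.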

Answer 2

Your CSV file appears to contain the entry 3), and pandas complains because the ) prevents it from being converted to a float.
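If the offending entry is a stray annotation rather than a real measurement, one option is to drop the affected rows instead of filling them. This is a minimal sketch under assumptions: the frame below is a made-up stand-in for data.txt, the column names feature and class are hypothetical, and the '?' sentinel is kept the way the question's code handles it.

```python
import pandas as pd

# Hypothetical stand-in for data.txt; 'feature' and 'class' are assumed names.
df = pd.DataFrame({
    'id': [10, 11, 12],
    'feature': ['5', '3) <-----Reappears in Group 8 as:', '?'],
    'class': [2, 4, 2],
})

# Keep the question's sentinel for values marked as missing with '?'.
df = df.replace('?', -99999)

# Coerce to numbers; anything unparseable (like the annotation text) becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Drop rows that contain an unparseable cell, then drop the id column.
clean = numeric.dropna().drop(columns=['id'])
full_data = clean.astype(float).values.tolist()
print(full_data)
```

Whether dropping or filling is the better choice depends on the data. For a distance-based classifier such as k nearest neighbors, a filled-in 0 is treated as a real feature value, so it can shift the distances.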

Invalid literal for float() in k nearest neighbors