Question

# Python code to demonstrate SQL to fetch data.

# importing the module
import sqlite3
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from scipy.stats import chisquare

# connect to the SQLite database file
connection = sqlite3.connect(r"C:\Users\Aidan\Desktop\INA_DB.db")

# cursor object
crsr = connection.cursor()


dog= crsr.execute("Select s, ei, ki FROM INa_VC WHERE s IN ('d') ")
ans= crsr.fetchall() 

#x = [0]*len(ans); y = [0]*len(ans)
x= np.zeros(len(ans)); y= np.zeros(len(ans))

for i in range(0,len(ans)):
    x[i] = float(ans[i][1])
    y[i] = float(ans[i][2])


# Reshaping
x, y = x.reshape(-1,1), y.reshape(-1, 1)

# Linear Regression Object 
lin_regression = LinearRegression()

# Fitting linear model to the data
lin_regression.fit(x,y)

# Get slope of fitted line
m = lin_regression.coef_

# Get y-Intercept of the Line
b = lin_regression.intercept_

# Get Predictions for original x values
# you can also get predictions for new data
predictions = lin_regression.predict(x)
chi = chisquare(y, predictions)  # chisquare(f_obs, f_exp): observed y first, fitted values as expected

# following slope intercept form 
print ("formula: y = {0}x + {1}".format(m, b)) 
print(chi)


plt.scatter(x, y,  color='black')
plt.plot(x, predictions, color='blue',linewidth=3)
plt.show()

Error:

runfile('C:/Users/Aidan/.spyder-py3/temp.py', wdir='C:/Users/Aidan/.spyder-py3')

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/Aidan/.spyder-py3/temp.py', wdir='C:/Users/Aidan/.spyder-py3')

  File "C:\Users\Aidan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\Aidan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Aidan/.spyder-py3/temp.py", line 28, in <module>
    y[i] = float(ans[i][2])

ValueError: could not convert string to float:

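I'm 99% sure the problem is the y values. In my dataset some y values are intentionally missing, and those are what trigger the float conversion error. Given my current script, what is a quick way to filter out the rows with missing (NaN) y values?

The script runs perfectly when every row has a y value.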

Answer 1

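The best answer is probably to store those values in the database as the string "nan", which float parses without complaint. Afterwards you can use, for example, np.isnan to pick out the undefined values.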
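A minimal sketch of that idea, assuming the rows come back from fetchall() as tuples of strings. The sample rows and the names mask, x_clean and y_clean are made up for illustration and are not part of the original script:

```python
import numpy as np

# Hypothetical rows shaped like the (s, ei, ki) tuples returned by fetchall();
# the missing ki value is assumed to be stored as the string "nan".
rows = [("d", "1.0", "2.0"), ("d", "2.0", "nan"), ("d", "3.0", "6.0")]

x = np.array([float(r[1]) for r in rows])
y = np.array([float(r[2]) for r in rows])  # float("nan") parses without error

# Boolean mask that is True wherever y is defined.
mask = ~np.isnan(y)
x_clean, y_clean = x[mask], y[mask]
```

The filtered arrays can then be reshaped and passed to lin_regression.fit in place of x and y.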

Alternatively, leave them as zero:

for i in range(0, len(ans)):
    try:
        x[i] = float(ans[i][1])
    except ValueError:
        pass
    try:
        y[i] = float(ans[i][2])
    except ValueError:
        pass

Alternatively, exclude them entirely:

xy = np.array([tuple(map(float, values[1:])) for values in ans if values[2]])
x = xy[:, 0]
y = xy[:, 1]
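Another option, not from the answer above, is to drop the incomplete rows in the query itself so they never reach float(). The sketch below uses an in-memory stand-in for the INa_VC table with invented rows, and it assumes missing ki values are stored as NULL or as empty strings; the real database may represent them differently:

```python
import sqlite3
import numpy as np

# In-memory stand-in for the INa_VC table; schema and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INa_VC (s TEXT, ei TEXT, ki TEXT)")
conn.executemany(
    "INSERT INTO INa_VC VALUES (?, ?, ?)",
    [("d", "1.0", "2.0"), ("d", "2.0", ""), ("d", "3.0", None), ("d", "4.0", "8.0")],
)

# Skip rows whose ki is NULL or blank before they reach Python.
rows = conn.execute(
    "SELECT ei, ki FROM INa_VC "
    "WHERE s = 'd' AND ki IS NOT NULL AND TRIM(ki) != ''"
).fetchall()
conn.close()

# Every remaining row has both values, so the conversion cannot hit an empty string.
xy = np.array([[float(ei), float(ki)] for ei, ki in rows])
x, y = xy[:, 0], xy[:, 1]
```

Filtering in SQL keeps the Python side simple, at the cost of tying the script to how the database encodes a missing value.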

Eliminating missing "Y" values or NaN
