c1 c2 c3 c4 c5 ID seq
2020 2020 2020 2020 2020 1212 1
2021 2020 2021 2020 2021 1212 2
2022 2020 2022 2020 2022 1212 3
2023 2020 2023 2020 2023 1313 1
2024 2020 2024 2020 2024 1313 2
2025 2020 2025 2020 2025 1313 3
2026 2020 2026 2020 2026 1313 4
2026 2020 2026 2020 2026 1313 5
The script that imports the data:
# Python code to demonstrate SQL to fetch data.
# importing the module
import sqlite3
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from scipy.stats import chisquare
# connect to the SQLite database file
connection = sqlite3.connect(r"C:\Users\Aidan\Desktop\cep_db.db")
# cursor object
crsr = connection.cursor()
crsr.execute("SELECT s, ei, ki FROM cep_db_lite1_vc WHERE s IN ('d')")
ans = crsr.fetchall()
# build a NumPy array from the fetched rows
dogData = np.array(ans)
FdogData = dogData[:, 1:]
FdogData.astype(float)
x, y =FdogData[:,0], FdogData[:,1]
# Reshaping
x, y = x.reshape(-1,1), y.reshape(-1, 1)
# Linear Regression Object
lin_regression = LinearRegression()
# Fitting linear model to the data
lin_regression.fit(x,y)
# Get slope of fitted line
m = lin_regression.coef_
# Get y-Intercept of the Line
b = lin_regression.intercept_
# Get Predictions for original x values
# you can also get predictions for new data
predictions = lin_regression.predict(x)
chi= chisquare(predictions, y)
# following slope intercept form
print ("formula: y = {0}x + {1}".format(m, b))
print(chi)
# Plot the original data (black) and the fitted line (blue)
plt.scatter(x, y, color='black')
plt.plot(x, predictions, color='blue',linewidth=3)
plt.show()
This should be an easy fix, but I'm fairly clueless.
I get an error when running it on this data:
('d', '-72.70', '3.20')
('d', '-74.81', '2.00')
('d', '-87.60', '5.50')
('d', '-91.38', '2.00')
('d', '-71.80', '2.00')
('d', '-73.10', '2.00')
('d', '-81.20', '2.00')
('d', '-81.40', '2.00')
('d', '-75.70', '5.70')
('d', '-83.50', '5.10')
('d', '-73.90', '2.00')
('d', '-82.60', '2.00')
('d', '-77.30', '2.00')
('d', '-85.10', '2.00')
('d', '-79.70', '2.00')
('d', '-78.70', '2.00')
('d', '-77.90', '2.00')
('d', '-76.80', '2.00')
('d', '-83.80', '2.00')
('d', '-83.90', '2.00')
('d', '-82.00', '4.90')
('d', '-80.00', '4.80')
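From what I can tell, the script cannot convert the letter 'd' to a float because it is a letter, not a number.
How do I ignore the first column of the data after importing it? I'm fairly sure I have already sliced it off. I just want to build an array from the 2nd and 3rd columns and use it for data analysis and plotting.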
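One possible cause, sketched below under some assumptions: the slice `dogData[:, 1:]` does drop the letter column, but `astype(float)` returns a new array rather than converting in place, so if its result is not assigned, the values that reach the regression are still strings. The sketch uses four of the rows posted above in place of the database query and assumes every value comes back from `fetchall()` as text. The names `rows`, `data` and `numeric` are made up for illustration.

```python
import numpy as np

# A few of the rows from the post, shaped like the result of fetchall().
# Assumption: the columns are stored as text, so every value is a str.
rows = [
    ("d", "-72.70", "3.20"),
    ("d", "-74.81", "2.00"),
    ("d", "-87.60", "5.50"),
    ("d", "-91.38", "2.00"),
]

# np.array on tuples of strings gives a 2-D array of strings.
data = np.array(rows)

# Drop the first column, then convert the rest to float.
# astype() returns a NEW array, so the result has to be assigned.
numeric = data[:, 1:].astype(float)

# Column vectors, as expected by estimators such as LinearRegression.
x = numeric[:, 0].reshape(-1, 1)  # 2nd column of the query result (ei)
y = numeric[:, 1].reshape(-1, 1)  # 3rd column of the query result (ki)

print(numeric.dtype)
print(x.shape, y.shape)
```

Another option that avoids the slicing step altogether is to select only the numeric columns in the query (`SELECT ei, ki FROM ...`), so the letter column never reaches the array.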