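I'm trying to do some basic analysis on data in a CSV file. The data is timestamped, and each timestamp has a value for "Test A" and "Test B". [csv file data sample]
I have obtained the averages of Test A and Test B, as well as the difference between the two. What I really need, though, is the r^2 value, to see how the two tests relate to each other. I know this is very easy to do in Excel, but I have a lot of data, so it is better done in code. The part of my code that should calculate r^2 returns the error
TypeError: unsupported operand type(s) for ** or pow(): 'LinregressResult' and 'int'
I wonder whether this could be because I'm working with column data in float64 format? [TypeError message]
Ideally, I'm also looking for a way to analyse only sections of the data: I want to analyse it hour by hour (45 data points per hour). Does anyone know how to include only a specific portion of the rows?
Many thanks!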
import pandas as pd
from scipy import stats
# Read the file in csv
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)
# Output the number of rows
print("Total rows: {0}".format(len(data_input)))
# See which headers are available
print(list(data_input))
# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']
# Average the data for Test A
Test_A = data_input['Test A'].mean()
print('Test A Average: ', round(Test_A, 4))
# Average the data for Test B
Test_B = data_input['Test B'].mean()
print('Test B Average: ', round(Test_B, 4))
# Calculate the difference between the tests
Error = Test_A - Test_B
print('Error (difference between averages): ', round(abs(Error), 4))
# Work out the r2 value between the two tests
r_value = stats.linregress(data_A, data_B)
print("r_value: ", r_value)
print("R-squared: ", r_value**2)
print(data_input['Test A'].dtypes)
Answer 0: (score: 0)
r_value is not an int or a float; its type is LinregressResult. To get at the r value, you have to do one of the following two things:
v = stats.linregress(x, y)
v.rvalue
or
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
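As a minimal sketch of the two options above, the following uses made-up x and y values (not the asker's data) and shows that both routes give the same R-squared:

```python
from scipy import stats

# Made-up sample data, purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Option 1: keep the result object and read its rvalue attribute
result = stats.linregress(x, y)
r_squared_attr = result.rvalue ** 2

# Option 2: unpack the five values directly
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
r_squared_unpacked = r_value ** 2

print("R-squared (attribute):", r_squared_attr)
print("R-squared (unpacked): ", r_squared_unpacked)
```

Either way, the value being squared is a plain float, so the ** operator works.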
Answer 1: (score: 0)
According to the documentation, scipy.stats.linregress returns a LinregressResult. If you look at the source code, an example is given there:
import numpy as np
from scipy import stats
np.random.seed(12345678)
x = np.random.random(10)
y = np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("r-squared:", r_value**2)
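Answer 2: (score: 0)
I have copied your code snippet with a correction, and it should now work for you. The problem is that linregress returns several values bundled together, so r_value in your code held the whole result rather than a number. If you unpack the result, you have to capture all five values with a comma-separated list on the left of the equals sign, even the ones you don't use.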
import pandas as pd
from scipy import stats
# Read the file in csv
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)
# Output the number of rows
print("Total rows: {0}".format(len(data_input)))
# See which headers are available
print(list(data_input))
# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']
# Average the data for Test A
Test_A = data_input['Test A'].mean()
print('Test A Average: ', round(Test_A, 4))
# Average the data for Test B
Test_B = data_input['Test B'].mean()
print('Test B Average: ', round(Test_B, 4))
# Calculate the difference between the tests
Error = Test_A - Test_B
print('Error (difference between averages): ', round(abs(Error), 4))
# Work out the r2 value between the two tests
##### This is the correction #####
slope, intercept, r_value, p_value, std_err = stats.linregress(data_A, data_B)
print("r_value: ", r_value)
print("R-squared: ", r_value**2)
print(data_input['Test A'].dtypes)
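The question also asked about analysing one hour at a time. Here is a minimal sketch of one way to do that, assuming the rows are already in time order and that every hour holds exactly 45 consecutive rows, as the question states. The helper name r_squared_per_block and the synthetic DataFrame are made up for illustration; with the real file you would pass in data_input instead.

```python
import numpy as np
import pandas as pd
from scipy import stats

POINTS_PER_HOUR = 45  # from the question: 45 data points per hour


def r_squared_per_block(df, col_a="Test A", col_b="Test B",
                        block_size=POINTS_PER_HOUR):
    """Return one R-squared value per consecutive block of rows.

    Hypothetical helper: assumes rows are sorted by time and that each
    block of `block_size` rows corresponds to one hour.
    """
    # Label each row with the block it falls in: 0, 0, ..., 1, 1, ...
    block_ids = np.arange(len(df)) // block_size
    results = {}
    for block_id, block in df.groupby(block_ids):
        # A line cannot be fitted to fewer than two points
        if len(block) < 2:
            continue
        fit = stats.linregress(block[col_a], block[col_b])
        results[block_id] = fit.rvalue ** 2
    return pd.Series(results, name="r_squared")


# Synthetic stand-in for the CSV: two "hours" of perfectly linear data
rng = np.random.default_rng(0)
test_a = rng.random(2 * POINTS_PER_HOUR)
demo = pd.DataFrame({"Test A": test_a, "Test B": 2.0 * test_a + 1.0})

hourly = r_squared_per_block(demo)
print(hourly)
```

To look at just one section instead of all of them, a slice such as data_input.iloc[0:45] selects the first 45 rows. If the hours do not line up with fixed 45-row blocks, grouping on the timestamp column would be the safer route.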