Problem calculating R^2 in Python

Date: 2018-11-06 11:44:22

Tags: python excel pandas dataframe statistics

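I am trying to do some basic analysis on data in a csv file. The data has timestamps, and at each timestamp there is a value for "Test A" and a value for "Test B". [csv file data sample]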

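I have obtained the averages for Test A and Test B, as well as the difference between the two averages. What I really need, though, is the r^2 value, to see how the two tests relate to each other. I know this is very simple to do in Excel, but I have a lot of data, so it makes more sense to code it. The part of my code that calculates r^2 returns this error: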

  

TypeError: unsupported operand type(s) for ** or pow(): 'LinregressResult' and 'int'

I wonder whether this could be because the column data I am working with is in float64 format? [TypeError message]

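Ideally, I am also looking for a way to analyze only part of the data: I would like to analyze it one hour at a time (45 data points per hour). Is there a way to include only specific sections of the rows?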

Many thanks!

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print 'Test A Average: ', round(Test_A, 4)

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print 'Test B Average: ', round(Test_B, 4)

# Calculate the difference between the two test averages
Error = Test_A - Test_B
print 'Error (difference between averages): ', round(abs(Error), 4)

# Work out the r2 value between the two tests
r_value = stats.linregress(data_A, data_B)
print "r_value: ", r_value
print "R-squared: ", r_value**2

print data_input['Test A'].dtypes

3 Answers:

Answer 0 (score: 0)

r_value is not an int or a float; its type is LinregressResult. To access the r-value, you have to do one of the following two things:

v = stats.linregress(x, y)
v.rvalue

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
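
A minimal sketch of both access patterns, using made-up numbers in place of the 'Test A' and 'Test B' columns (the values are assumptions, not the asker's data). It also suggests that the float64 dtype is not the cause: squaring the result object itself raises the TypeError, while squaring the extracted r-value works.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the 'Test A' and 'Test B' columns (float64).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

result = stats.linregress(x, y)

# Option 1: attribute access on the result object.
r_squared_attr = result.rvalue ** 2

# Option 2: tuple unpacking into five names.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
r_squared_unpacked = r_value ** 2

# Squaring the whole result object fails, whatever the dtype of the inputs.
try:
    result ** 2
    raised = False
except TypeError:
    raised = True

print("R-squared:", r_squared_attr)
print("Squaring the result object raised TypeError:", raised)
```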

Answer 1 (score: 0)

According to the scipy.stats.linregress documentation, the function returns a LinregressResult. If you look at the source code, an example is given there:

import numpy as np
from scipy import stats
np.random.seed(12345678)
x = np.random.random(10)
y = np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("r-squared:", r_value**2)
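
Since the asker's data already lives in a pandas DataFrame, one alternative worth noting is that, for a simple two-variable linear regression, R-squared equals the square of the Pearson correlation, which `Series.corr` computes by default. The sketch below uses invented values; the column names mirror the question, but the numbers are assumptions.

```python
import pandas as pd
from scipy import stats

# Invented sample standing in for the contents of StackOF_r2.csv.
df = pd.DataFrame({
    "Test A": [10.2, 11.5, 9.8, 12.1, 10.9, 11.3],
    "Test B": [10.0, 11.9, 9.5, 12.4, 10.6, 11.0],
})

# Pearson correlation (the default method of Series.corr), squared.
r_squared_pandas = df["Test A"].corr(df["Test B"]) ** 2

# The same quantity via scipy, for comparison.
r_squared_scipy = stats.linregress(df["Test A"], df["Test B"]).rvalue ** 2

print("R-squared (pandas):", round(r_squared_pandas, 4))
print("R-squared (scipy): ", round(r_squared_scipy, 4))
```

The two results should agree up to floating-point rounding; linregress is the better fit when the slope, intercept, or p-value is also needed.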

Answer 2 (score: 0)

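I have copied your code snippet and corrected it, so it should now work for you. The problem is that linregress returns several values at once, so assigning the result to a single name gives you the whole result object rather than the r-value. One fix is to unpack all of the values into a comma-separated list of names on the left of the equals sign, even the ones you don't use.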

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print 'Test A Average: ', round(Test_A, 4)

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print 'Test B Average: ', round(Test_B, 4)

# Calculate the difference between the two test averages
Error = Test_A - Test_B
print 'Error (difference between averages): ', round(abs(Error), 4)

# Work out the r2 value between the two tests
##### This is the correction #####
slope, intercept, r_value, p_value, std_err = stats.linregress(data_A, data_B)
print "r_value: ", r_value
print "R-squared: ", r_value**2

print data_input['Test A'].dtypes
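
The second part of the question (analyzing one hour at a time) could be approached by splitting the rows into consecutive blocks of 45 and running the regression on each block. This is only a sketch under stated assumptions: it uses synthetic data, assumes the rows are already in time order with exactly 45 rows per hour, and the helper name r_squared is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

POINTS_PER_HOUR = 45  # taken from the question; adjust if the sampling rate differs

# Synthetic data: 3 "hours" of 45 rows each (an assumption for illustration).
rng = np.random.default_rng(0)
test_a = rng.normal(loc=10.0, scale=1.0, size=3 * POINTS_PER_HOUR)
test_b = test_a + rng.normal(loc=0.0, scale=0.5, size=test_a.size)
df = pd.DataFrame({"Test A": test_a, "Test B": test_b})

# Label each row with the index of the 45-row block it falls in.
block = np.arange(len(df)) // POINTS_PER_HOUR


def r_squared(group):
    """R-squared of a linear regression of 'Test B' on 'Test A'."""
    return stats.linregress(group["Test A"], group["Test B"]).rvalue ** 2


# One R-squared value per block, indexed by block number.
hourly_r2 = pd.Series(
    {hour: r_squared(group) for hour, group in df.groupby(block)}
)
print(hourly_r2)
```

If some hours have missing rows, fixed-size blocks will drift out of step with the clock; in that case grouping on the parsed timestamp column would be the safer choice.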
