Question

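I'm trying to do some basic analysis on data in a CSV file. The data is timestamped, and for each timestamp there is a value for "Test A" and a value for "Test B". [csv file data sample]

I have the averages of Test A and Test B, and the difference between the test results. But I really need to calculate the r^2 value to see how the two tests are related. I know this is simple to do in Excel, but I have a lot of data, so coding it is the best option. The part of my code that should calculate r^2 returns this error:

TypeError: unsupported operand type(s) for ** or pow(): 'LinregressResult' and 'int'

I wonder whether this might be because I'm working with column data in float64 format? [TypeError message]

Ideally, I'm also looking for a way to analyse only sections of the data: I'd like to analyse the data hour by hour (45 data points per hour). Is there a way to include only a specific range of rows?

Many thanks!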

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print 'Test A Average: ', round(Test_A, 4)

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print 'Test B Average: ', round(Test_B, 4)

# Calculate the difference between the tests
Error = Test_A - Test_B
print 'Error (difference between averages): ', round(abs(Error), 4)

# Work out the r2 value between the two tests
r_value = stats.linregress(data_A, data_B)
print "r_value: ", r_value
print "R-squared: ", r_value**2

print data_input['Test A'].dtypes

Answer 1

r_value is not an int or a float; its type is LinregressResult. To access the r-value, you have to do one of the following two things:

v = stats.linregress(x, y)
v.rvalue

or

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
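
As a minimal sketch of the two options above, the following compares them on a small made-up dataset (the arrays `x` and `y` are hypothetical stand-ins for the 'Test A' and 'Test B' columns, not data from the question). Both arrays are float64, like the columns in the question, which suggests the dtype is not what triggers the TypeError.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data standing in for the 'Test A' and 'Test B' columns
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Option 1: keep the result object and read its rvalue attribute
result = stats.linregress(x, y)
r_squared_attr = result.rvalue ** 2

# Option 2: unpack the five fields and square r_value
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
r_squared_unpacked = r_value ** 2

print("R-squared (attribute):", r_squared_attr)
print("R-squared (unpacked): ", r_squared_unpacked)
```

Either form yields the same number; the attribute form avoids creating names you never use.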

Answer 2

According to the documentation, scipy.stats.linregress returns a LinregressResult. If you look at the source code, an example is given there:

import numpy as np
from scipy import stats

# Seed the random generator so the example is reproducible
np.random.seed(12345678)
x = np.random.random(10)
y = np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("r-squared:", r_value**2)
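
As a quick sanity check on what `r_value` means, the sketch below compares it with the Pearson correlation coefficient computed by `np.corrcoef`. For a simple linear regression the two should agree, so squaring either gives r^2. The seed and array sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Arbitrary seed and sample size, chosen only for illustration
rng = np.random.default_rng(0)
x = rng.random(10)
y = rng.random(10)

# r from the regression result
result = stats.linregress(x, y)

# Pearson correlation taken from the 2x2 correlation matrix
r_from_corrcoef = np.corrcoef(x, y)[0, 1]

print("r (linregress):", result.rvalue)
print("r (corrcoef):  ", r_from_corrcoef)
print("r-squared:     ", result.rvalue ** 2)
```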

Answer 3

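I've copied your snippet with the fix applied, and it should now work for you. The problem you ran into is that linregress returns multiple values, so one fix is to capture them all with a comma-separated list on the left of the equals sign, even if you don't use all of them.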

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print 'Test A Average: ', round(Test_A, 4)

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print 'Test B Average: ', round(Test_B, 4)

# Calculate the difference between the tests
Error = Test_A - Test_B
print 'Error (difference between averages): ', round(abs(Error), 4)

# Work out the r2 value between the two tests
##### This is the correction #####
slope, intercept, r_value, p_value, std_err = stats.linregress(data_A, data_B)
print "r_value: ", r_value
print "R-squared: ", r_value**2

print data_input['Test A'].dtypes
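
The question also asks about analysing the data hour by hour. A minimal sketch of one way to do that follows, assuming the rows are already in time order and that every hour holds exactly 45 consecutive rows, as the question states. The helper `r_squared_per_block` and the synthetic DataFrame are hypothetical; only the column names 'Test A' and 'Test B' come from the question. If the row counts per hour vary, grouping on the parsed timestamp column would be the more robust route.

```python
import numpy as np
import pandas as pd
from scipy import stats

ROWS_PER_HOUR = 45  # from the question: 45 data points per hour


def r_squared_per_block(df, col_a="Test A", col_b="Test B",
                        block_size=ROWS_PER_HOUR):
    """Return one r-squared value per consecutive block of rows."""
    results = {}
    for start in range(0, len(df), block_size):
        # Slice out one block of rows by position
        block = df.iloc[start:start + block_size]
        if len(block) < 2:
            # A regression needs at least two points; skip a trailing stub
            continue
        fit = stats.linregress(block[col_a], block[col_b])
        results[start // block_size] = fit.rvalue ** 2
    return pd.Series(results, name="r_squared")


# Synthetic stand-in for the CSV: two hours of data (90 rows)
rng = np.random.default_rng(1)
test_a = rng.random(90)
demo = pd.DataFrame({
    "Test A": test_a,
    "Test B": test_a + rng.normal(scale=0.1, size=90),
})

hourly_r2 = r_squared_per_block(demo)
print(hourly_r2)
```

With the real file, `demo` would be replaced by the DataFrame returned from `pd.read_csv`.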

Python R^2 calculation going wrong