Question

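I have two datasets, each a list of numbers, and I want to use scipy.stats in Python to compute the correlation and the p-value. Both lists contain the same number of values. My code: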

from scipy.stats.stats import pearsonr

a=open("a.txt")
b=open("b.txt")
print pearsonr(a,b)

However, it returns: TypeError: len() of unsized object

What is wrong here?

Each of the two txt files contains a list of numbers, like "[12,13,5,7]".

Answer 1

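The problem is that a = open("a.txt") assigns a file object to a, not the array that the file's contents represent. You have to build the array yourself. I'm not familiar with scipy, but I imagine the code would look something like this: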

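from scipy.stats import pearsonr

def create_array(file):
    # Parse lines such as "[12,13,5,7]" into one flat list of ints.
    ret = []
    for line in file:
        # str.replace returns a new string, so the result must be kept.
        line = line.strip().replace('[', '').replace(']', '')
        if line:
            ret.extend(int(x) for x in line.split(','))
    return ret

# The function is defined before it is called, and both files are closed
# automatically when the with block ends.
with open("a.txt") as a, open("b.txt") as b:
    a_array = create_array(a)
    b_array = create_array(b)

print(pearsonr(a_array, b_array))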
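
To make the point about file objects concrete, here is a minimal sketch using only the standard library. It assumes each file holds a single JSON-compatible list such as "[12,13,5,7]"; the helper name load_numbers and the temporary file standing in for a.txt are hypothetical and only there for illustration.

```python
import json
import os
import tempfile


def load_numbers(path):
    """Read a file holding one list such as "[12,13,5,7]" and return it as a list."""
    with open(path) as handle:
        # handle is a file object; only handle.read() yields the text inside it.
        text = handle.read()
    # Assumption: the text is a JSON-compatible list of numbers.
    return json.loads(text)


# Throwaway file standing in for a.txt (hypothetical demo data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[12,13,5,7]")
    demo_path = tmp.name

with open(demo_path) as handle:
    # A file object has no length, which is why it cannot be used as an array.
    print(hasattr(handle, "__len__"))  # False

numbers = load_numbers(demo_path)
print(numbers)  # [12, 13, 5, 7]
os.remove(demo_path)
```

The returned list can then be passed to pearsonr in place of the file object.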

Answer 2

You can use ast to build the list objects:

import ast

from numpy import array
from scipy.stats import pearsonr

a = open("a.txt")
b = open("b.txt")

# literal_eval safely parses text such as "[12,13,5,7]" into a Python list.
c = ast.literal_eval(a.read())
d = ast.literal_eval(b.read())

a.close()
b.close()

print(type(c))
print(type(d))
print(pearsonr(array(c, dtype='int'), array(d, dtype='int')))

Output (Python 2):

<type 'list'>
<type 'list'>
(1.0, 0.0)

For multiple lists, one per line in each file:

# Uses the same imports as above.
a = open("a.txt")
b = open("b.txt")

# Each line of each file holds one list literal.
a_lists = [ast.literal_eval(x) for x in a.readlines()]
b_lists = [ast.literal_eval(x) for x in b.readlines()]

a.close()
b.close()

# Pair up the lists line by line and correlate each pair.
for list_a, list_b in zip(a_lists, b_lists):
    arr_a = array(list_a, dtype='int')
    arr_b = array(list_b, dtype='int')
    print('{0}, {1}, {2}'.format(arr_a, arr_b, pearsonr(arr_a, arr_b)))

Output (Python 2):

[1 2 3 4], [5 6 7 8], (1.0, 0.0)
[ 4  5 10 13], [ 4  5  6 15], (0.86845192970523943, 0.13154807029476057)
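
As a sanity check on the correlation values above, the coefficient can be recomputed by hand. The following is a minimal sketch in plain Python, not scipy's implementation: pearson_r is a hypothetical helper that applies the textbook formula and returns only the coefficient, without the p-value.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("both sequences must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / float(len(xs))
    mean_y = sum(ys) / float(len(ys))
    # Deviations of each value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross products divided by the product of the two spreads.
    cross = sum(p * q for p, q in zip(dx, dy))
    spread = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / spread


print(round(pearson_r([1, 2, 3, 4], [5, 6, 7, 8]), 6))     # 1.0
print(round(pearson_r([4, 5, 10, 13], [4, 5, 6, 15]), 6))  # 0.868452
```

Both results agree with the first element of the tuples printed by pearsonr.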

len() of unsized object in Python when using scipy's Pearson correlation
