Question

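First of all, sorry for my English; I am not a native speaker. In general, I am trying to compute an index for samples of varying size from each input file, and afterwards compare the values between the two input files. So I have two input files as arguments. The first function randomly selects values at a range of sample sizes. The second function takes the output of the first function and computes an index for each sample.

To do this, I first pass the two input files to the script as arguments.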

file_name1 = sys.argv[1]
file_name2 = sys.argv[2]

I extracted the values from each input file and saved them in lists, like this:

data1 = [2, 6, 4, 8, 9, 8, 6, 6, 6, 7, 7, 4, 2, 2, 2, ......] #sample size 835
data2 = [7, 7, 5, 3,4, 2, 8, 6, 5, 1, 1, 9, 7 ......] #sample size 2010

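I wrote a function that randomly selects numbers from the list at a range of sample sizes (0, 200, 400, ... n). After that, I group identical values and store each value together with its count as key-value pairs in a dictionary.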

def subsamples(list_object):

    val = np.array(list_object)
    n = len(val)
    count = 0
    while (count < n):
        count += 200
        if (count > n):
            break
        subsample = np.random.choice(val, count, replace=False)
        unique, counts = np.unique(subsample, return_counts=True)
        group_cat = dict(zip(unique, counts))
        pois_group.append(group_cat)

    return pois_group

I also have a second function that computes an index for each sample size.

def list_sample_size(object):
    data = subsamples(object)

    def p(n, N):
        if n == 0:
            return 0
        else:
            return (float(n)/N) * ln(float(n)/N)

    for i in data:
        N = sum(i.values())
        #calculate the Index
        sh = -sum(p(n,N) for n in i.values() if n != 0)
        index = round(math.exp(sh),2)
        print("Index: %f, sample size: %s" % (index, N))
        y.append(index)
        x.append(N)
    return x,y

x_1, y_1 = list_sample_size(data1)
print("--------------------")
x_2, y_2 = list_sample_size(data2)

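But when I call the function for each input file, I get the following. It prints the output for the first input correctly, but for the second input it first prints the results of input 1 and then its own results. Does anyone know what I am doing wrong?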

Input1 has 835 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
---------------------------
Input1 has 2010 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
Index: 22.610000 , sample size: 200
Index: 21.110000 , sample size: 400
Index: 25.920000 , sample size: 600
Index: 27.670000 , sample size: 800
Index: 28.630000 , sample size: 1000
Index: 28.110000 , sample size: 1200
Index: 28.380000 , sample size: 1400
Index: 28.610000 , sample size: 1600
Index: 28.910000 , sample size: 1800
Index: 29.120000 , sample size: 2000

Does anyone know what I am doing wrong?

Answer 1

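You don't show the whole file here, but I'm pretty sure you have pois_group defined as a list at file level. Each time you call subsamples, it extends that same list, so the second call still contains the results of the first. You should add pois_group = [] to the setup of your subsamples function.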
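
To make that concrete, here is a minimal sketch of how both functions might look with the lists created inside the functions. It assumes Python 3 and NumPy's Generator API; the step and rng parameters and the use of math.log in place of ln are my own additions, not something from your script. Since x and y are not created inside list_sample_size either, I am guessing they are file-level lists too, so the sketch makes them local as well.

```python
import math

import numpy as np


def subsamples(values, step=200, rng=None):
    """Draw samples of size step, 2*step, ... (without replacement) and
    return one {value: count} dict per sample."""
    rng = np.random.default_rng() if rng is None else rng
    val = np.asarray(values)
    pois_group = []  # created on every call, so nothing carries over
    for size in range(step, len(val) + 1, step):
        subsample = rng.choice(val, size, replace=False)
        unique, counts = np.unique(subsample, return_counts=True)
        pois_group.append(dict(zip(unique.tolist(), counts.tolist())))
    return pois_group


def list_sample_size(values, step=200, rng=None):
    """Return (sample sizes, index values), one entry per sample."""
    x, y = [], []  # local result lists (assumed to be file-level in the question)
    for group in subsamples(values, step, rng):
        N = sum(group.values())
        # counts from np.unique are always >= 1, so log(n / N) is defined
        sh = -sum((n / N) * math.log(n / N) for n in group.values())
        x.append(N)
        y.append(round(math.exp(sh), 2))
    return x, y


if __name__ == "__main__":
    # Hypothetical stand-in data with the same sizes as in the question.
    rng = np.random.default_rng(0)
    data1 = rng.integers(1, 10, size=835).tolist()
    data2 = rng.integers(1, 10, size=2010).tolist()
    x_1, y_1 = list_sample_size(data1, rng=rng)
    x_2, y_2 = list_sample_size(data2, rng=rng)
    print(x_1)
    print(x_2)
```

With the lists created inside the functions, the second call should only report the sample sizes for its own input (200 up to 2000 for 2010 values) instead of repeating the four results from the first input.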

Passing two input files to a function leads to undesired results