Problem with variable referenced before assignment when using os.path.walk

Asked: 2010-11-15 16:51:46

Tags: python function global-variables variable-assignment

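OK. I have some background in Matlab and am now switching to Python. I have the following code, running under Python 2.6.5 on 64-bit Linux, which walks through directories, finds files named 'GeneralData.dat', retrieves some data from them and stitches it into a new data set: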

import pylab as p
import os, re
import linecache as ln

def LoadGenomeMeanSize(arg, dirname, files):
        for file in files:
            filepath = os.path.join(dirname, file)
            if filepath == os.path.join(dirname,'GeneralData.dat'):
                data = p.genfromtxt(filepath)
                if data[-1,4] != 0.0: # checking if data set is OK 
                    data_chopped = data[1000:-1,:] # removing some of data
                    Grand_mean = data_chopped[:,2].mean()
                    Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
                else:
                    break
            if filepath == os.path.join(dirname,'ModelParams.dat'):
                l = re.split(" ", ln.getline(filepath, 6))
                turb_param = float(l[2])                
                arg.append((Grand_mean, Grand_STD, turb_param))

GrandMeansData = []
os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
GrandMeansData = sorted(GrandMeansData, key=lambda data_sort: data_sort[2])

TheMeans = p.zeros((len(GrandMeansData), 3 ))
i = 0
for item in GrandMeansData:
    TheMeans[i,0] = item[0]
    TheMeans[i,1] = item[1]
    TheMeans[i,2] = item[2]
    i += 1

print TheMeans # just checking...
# later do some computation on TheMeans in NumPy

It gives me this (though I would swear it was working a month ago):

Traceback (most recent call last):
  File "/home/User/01_PyScripts/TESTtest.py", line 29, in <module>
    os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
  File "/usr/lib/python2.6/posixpath.py", line 233, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.6/posixpath.py", line 225, in walk
    func(arg, top, names)
  File "/home/User/01_PyScripts/TESTtest.py", line 26, in LoadGenomeMeanSize
    arg.append((Grand_mean, Grand_STD, turb_param))
UnboundLocalError: local variable 'Grand_mean' referenced before assignment

All right... so I did some reading and came up with this version, which uses global variables:

import pylab as p
import os, re
import linecache as ln

Grand_mean = p.nan
Grand_STD = p.nan
def LoadGenomeMeanSize(arg, dirname, files):
        for file in files:
            global Grand_mean
            global Grand_STD
            filepath = os.path.join(dirname, file)
            if filepath == os.path.join(dirname,'GeneralData.dat'):
                data = p.genfromtxt(filepath)
                if data[-1,4] != 0.0: # checking if data set is OK 
                    data_chopped = data[1000:-1,:]  # removing some of data
                    Grand_mean = data_chopped[:,2].mean()
                    Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))
                else:
                    break
            if filepath == os.path.join(dirname,'ModelParams.dat'):
                l = re.split(" ", ln.getline(filepath, 6))
                turb_param = float(l[2])                
                arg.append((Grand_mean, Grand_STD, turb_param))

GrandMeansData = []
os.path.walk(os.getcwd(), LoadGenomeMeanSize, GrandMeansData)
GrandMeansData = sorted(GrandMeansData, key=lambda data_sort: data_sort[2])

TheMeans = p.zeros((len(GrandMeansData), 3 ))
i = 0
for item in GrandMeansData:
    TheMeans[i,0] = item[0]
    TheMeans[i,1] = item[1]
    TheMeans[i,2] = item[2]
    i += 1

print TheMeans # just checking...
# later do some computation on TheMeans in NumPy

It gives no error message and even produces output with data... but the data are wrong! I checked some of them manually by running these commands:

import pylab as p
data = p.genfromtxt(filepath)
data_chopped = data[1000:-1,:]
Grand_mean = data_chopped[:,2].mean()
Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) \
+ sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))

on selected files. The results are different :-(

1) Can anyone explain what is wrong?

2) Does anyone know a solution?

I would be grateful for any help :-)

Cheers, PTR

3 answers:

Answer 0 (score: 0)

I would say this condition is not passing: if filepath == os.path.join(dirname,'GeneralData.dat'):

This means you are not getting GeneralData.dat before ModelParams.dat. Maybe you need to sort the names alphabetically, or the file does not exist.
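The list of names that os.path.walk passes to the callback comes from os.listdir, whose order is arbitrary. A minimal sketch of sorting the names first, so that GeneralData.dat is seen before ModelParams.dat; the helper name `ordered_names` and the sample names are hypothetical:

```python
def ordered_names(files):
    """Return the file names in alphabetical order (hypothetical helper)."""
    return sorted(files)

# os.listdir may return the names in any order, for example:
names = ['ModelParams.dat', 'notes.txt', 'GeneralData.dat']
for name in ordered_names(names):
    print(name)
```

Inside the callback this would amount to `for file in sorted(files):`. Sorting only fixes the order; it does not help when GeneralData.dat is missing from a directory.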

Answer 1 (score: 0)

I see one problem with both the code and the solution you provided.

  

Never hide a "variable referenced before assignment" problem by simply making the variable visible. Try to understand why it happens.

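Before you created the global variable "Grand_mean", the problem was that Grand_mean was being accessed before any value had been assigned to it. Initializing the variable outside the function and marking it as global only serves to hide that problem.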

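You see wrong results because making the variable global only made it visible; the underlying problem still exists. Your Grand_mean was never assigned the correct data.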

This means that the code section under "if filepath == os.path.join(dirname,...)" was never executed.
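A minimal sketch of why the global version yields wrong numbers instead of an error: a global keeps its value between calls, so a directory in which the assignment is skipped silently reuses the value left over from an earlier directory. The names `last_mean` and `visit` and the sample values are hypothetical.

```python
last_mean = float('nan')   # module-level default, as in the question

def visit(results, mean_or_none, param):
    """Mimic the callback: assign only when usable data is available."""
    global last_mean
    if mean_or_none is not None:          # stands in for the 'GeneralData.dat' branch
        last_mean = mean_or_none
    results.append((last_mean, param))    # stands in for the 'ModelParams.dat' branch

results = []
visit(results, 10.0, 0.1)   # first directory: data is fine
visit(results, None, 0.2)   # second directory: the assignment is skipped
print(results)              # the second tuple reuses 10.0 from the first call
```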

Answer 2 (score: 0)

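Using global is not the right solution. It only makes sense if you really do want to reference and assign to the global "Grand_mean" name. The need for disambiguation comes from the way the interpreter prescans function definitions for assignment operators.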
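The rule can be seen in a small sketch: a name that is assigned anywhere in a function body is treated as local for the whole function, so reading it on a path where the assignment did not run raises UnboundLocalError. The function `describe` is hypothetical.

```python
def describe(flag):
    if flag:
        value = 1      # this assignment makes 'value' local throughout describe()
    return value       # fails when flag is False: 'value' was never bound

print(describe(True))
try:
    describe(False)
except UnboundLocalError as exc:
    print("UnboundLocalError: %s" % exc)
```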

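You should start by assigning a default value to Grand_mean within the scope of LoadGenomeMeanSize(). Only 1 of your 4 branches actually assigns Grand_mean a value that has the proper semantic meaning within one loop iteration. You are probably running into a case where "if filepath == os.path.join(dirname,'ModelParams.dat'):" is true, but either "if filepath == os.path.join(dirname,'GeneralData.dat'):" or "if data[-1,4] != 0.0:" is not. It is probably the second condition that is failing for you.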

The quick and dirty answer is that you probably need to rearrange your code like this:

...
            if filepath == os.path.join(dirname,'GeneralData.dat'):
                data = p.genfromtxt(filepath)
                if data[-1,4] != 0.0: # checking if data set is OK 
                    data_chopped = data[1000:-1,:]  # removing some of data
                    Grand_mean = data_chopped[:,2].mean()
                    Grand_STD = p.sqrt((sum(data_chopped[:,4]*data_chopped[:,3]**2) + sum((data_chopped[:,2]-Grand_mean)**2))/sum(data_chopped[:,4]))

                    if filepath == os.path.join(dirname,'ModelParams.dat'):
                        l = re.split(" ", ln.getline(filepath, 6))
                        turb_param = float(l[2])                
                        arg.append((Grand_mean, Grand_STD, turb_param))
                else:
                    break

...
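Note that in the rearranged snippet the inner test for 'ModelParams.dat' sits inside the branch where filepath already equals the 'GeneralData.dat' path, so as written it cannot be true. One alternative, sketched here under assumptions, is to treat each directory as a unit: check that both files are present, compute the statistics, and append only when both steps succeed. The names `collect`, `read_stats` and `read_param` are hypothetical, and os.walk is swapped in for os.path.walk (which no longer exists in Python 3).

```python
import os

def collect(root, read_stats, read_param):
    """Walk root and return (mean, std, param) tuples, one per usable directory.

    read_stats(path) returns (mean, std), or None if the data set is not OK.
    read_param(path) returns the parameter as a float.
    Both are hypothetical callables supplied by the caller.
    """
    results = []
    for dirname, subdirs, files in os.walk(root):
        # Skip directories that do not contain both files.
        if 'GeneralData.dat' not in files or 'ModelParams.dat' not in files:
            continue
        stats = read_stats(os.path.join(dirname, 'GeneralData.dat'))
        if stats is None:
            continue  # the data set failed its sanity check
        param = read_param(os.path.join(dirname, 'ModelParams.dat'))
        results.append((stats[0], stats[1], param))
    # Sort by the parameter, as in the question.
    return sorted(results, key=lambda item: item[2])
```

Here `read_stats` would wrap the p.genfromtxt call and the data[-1,4] != 0.0 check from the question, and `read_param` the linecache lookup. Because the values are plain locals computed per directory, nothing can leak from one directory to the next.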