Question

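I'm trying to take data from a text file and calculate the average of every 600 lines of that file. I'm loading the text from the file, putting it into a numpy array, and enumerating it. I can get the average of the first 600 lines, but I'm not sure how to write a loop so that Python calculates the average of every 600 lines and then puts it into a new text file. Here is my code so far: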

import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
shape = np.shape(data)

#creates array for u wind values
for i,d in enumerate(data):
    data[i] = (d[3])
    if i == 600:
        minavg = np.mean(data[i == 600])

#finds total u mean for day
ubar = np.mean(data)

Answer 1

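From my understanding of your question, it sounds like you have a file and want to take the mean of every block of 600 lines, repeating until there is no more data. So at line 600 you average the first 600 lines (indices 0 to 599), and at line 1200 you average the next 600 (indices 600 to 1199).

Modulus division is one way to take the average each time you reach a 600th line, without needing a separate variable to count how many lines you have looped through. In addition, I used NumPy array slicing to create a view of the original data that contains only the 4th column (index 3) of the data set.

This example should do what you want, but it is completely untested... I'm also not terribly familiar with numpy, so there are better ways to do this, as noted in the other answers: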

import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
shape = np.shape(data)
data_you_want = data[:,3]
daily_averages = list()


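#averages each complete block of 600 values
#i counts the values seen so far (starting at 1), so the first average is
#taken once 600 values have been read, not at i == 0 on an empty slice
for i,d in enumerate(data_you_want, start=1):
    if (i % 600) == 0:
        avg_for_day = np.mean(data_you_want[i - 600:i])
        daily_averages.append(avg_for_day)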

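You can modify the example above to write each mean to a new file instead of appending it to a list as I did, or simply write the daily_averages list out to whatever file you want.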
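
As one way to do the file-writing step mentioned above, here is a minimal sketch using `np.savetxt`. It rests on assumptions that are not part of the original post: the input is a small synthetic array rather than the real file, the block size is 4 instead of 600 so the numbers are easy to check, and the helper name `block_means` and the output file name `block_averages.txt` are made up for illustration.

```python
import os
import tempfile

import numpy as np


def block_means(values, block_size):
    """Return the mean of each complete block of `block_size` values."""
    values = np.asarray(values, dtype=float)
    means = []
    # i counts how many values have been seen so far (1-based)
    for i in range(1, len(values) + 1):
        if i % block_size == 0:
            means.append(values[i - block_size:i].mean())
    return means


# Synthetic stand-in for data[:, 3]; real data would come from np.loadtxt
column = np.arange(12, dtype=float)
averages = block_means(column, block_size=4)

# Hypothetical output path; np.savetxt writes one average per line
out_path = os.path.join(tempfile.gettempdir(), "block_averages.txt")
np.savetxt(out_path, averages, fmt="%.6f")
```

With the real data you would pass `data[:, 3]` and `block_size=600`, and choose your own output path.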

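As a bonus, here is a pure Python solution that uses only the csv library. It hasn't been tested much, but it should work in theory, and it may be fairly easy to understand for someone new to Python.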

import csv 

data = list()
daily_average = list()
num_lines = 600

with open('testme.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter="\t")

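    # i counts the rows read so far, starting at 1
    for i,row in enumerate(reader, start=1):
        # float() rather than int(), since the values may not be whole numbers
        data.append(float(row[3]))

        # every num_lines rows, average the block that was just completed
        if (i % num_lines) == 0:
            average = sum(data[i - num_lines:i]) / num_lines
            daily_average.append(average)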
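
A variation on the pure Python approach above keeps only a running sum instead of storing every row, which may matter for large files. This is a sketch under assumptions, not part of the original answer: the function name `streaming_block_means` and its parameters are hypothetical, the input is a small in-memory sample instead of a real file, and a trailing incomplete block is dropped.

```python
import csv
import io


def streaming_block_means(lines, block_size, column=3, delimiter="\t"):
    """Yield the mean of `column` for each complete block of rows."""
    total = 0.0
    count = 0
    for row in csv.reader(lines, delimiter=delimiter):
        total += float(row[column])
        count += 1
        if count == block_size:
            yield total / block_size
            # start accumulating the next block
            total, count = 0.0, 0


# Synthetic tab-separated input standing in for the real file (5 rows)
sample = io.StringIO(
    "a\tb\tc\t1\n"
    "a\tb\tc\t3\n"
    "a\tb\tc\t5\n"
    "a\tb\tc\t7\n"
    "a\tb\tc\t9\n"
)
result = list(streaming_block_means(sample, block_size=2))
```

Any open file object can be passed in place of `sample`, since `csv.reader` accepts any iterable of lines.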

Hope this helps!

Answer 2

A simple solution is:

import numpy as np
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
mydata=[]; averages=[]; counter=0
for i,d in enumerate(data):
    mydata.append((d[3]))
    counter+=1

    # Find the average of the previous 600 lines and keep it
    if counter == 600:
        averages.append(np.mean(np.asarray(mydata)))

        # reset the counter and start counting from 0
        counter=0; mydata=[]

Answer 3

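The following program uses array slicing to get the column, then a list comprehension that indexes into the column to get the means. It might be simpler to use a for loop for the latter.

Slicing and indexing into the array, rather than creating new objects, also has a speed advantage, since you are only creating new views into existing data.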

import numpy as np

# test data
nr = 11
nc = 3
a = np.array([np.array(range(nc))+i*10 for i in range(nr)])
print(a)

# slice to get column
col = a[:,1]
print(col)

# comprehension to step through column to get means
# float() keeps the printed list as plain Python floats
numpermean = 2
means = [float(np.mean(col[i:(min(len(col), i+numpermean))]))
         for i in range(0,len(col),numpermean)]

print(means)

This prints:

[[  0   1   2]
 [ 10  11  12]
 [ 20  21  22]
 [ 30  31  32]
 [ 40  41  42]
 [ 50  51  52]
 [ 60  61  62]
 [ 70  71  72]
 [ 80  81  82]
 [ 90  91  92]
 [100 101 102]]
[  1  11  21  31  41  51  61  71  81  91 101]
[6.0, 26.0, 46.0, 66.0, 86.0, 101.0]
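
The point above about slices being views rather than copies can be checked directly. The following is a small sketch on a made-up array; `np.shares_memory` reports whether two arrays overlap in memory, and writing through the slice shows up in the original.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

col = a[:, 1]      # basic slicing: a view onto a's memory, not a copy
chunk = col[0:2]   # a slice of a slice is still a view

shares = np.shares_memory(a, chunk)

# Writing through the view changes the original array
chunk[0] = -1
changed = a[0, 1]
```

Here `shares` is `True` and `a[0, 1]` becomes `-1`, so if you need to modify a block without touching the source data, take an explicit `.copy()` first.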

Answer 4

Something like this works. It may not be that readable, but it should be fairly fast.

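# number of complete 600-row blocks; leftover rows at the end are dropped
n = data.shape[0] // 600
interestingData = data[:,3]
# one row per block, then average along each row
daily_averages =  np.mean(interestingData[:600*n].reshape(-1, 600), axis=1)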
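
To make the reshape step easier to follow, here is the same idea on a tiny synthetic column, with a block size of 4 standing in for 600. The variable names are illustrative only, and the handling of the leftover rows is an addition of this sketch rather than something the answer above does.

```python
import numpy as np

block = 4                             # stands in for 600
values = np.arange(10, dtype=float)   # synthetic column with 10 rows

n = len(values) // block              # number of complete blocks
full = values[:n * block].reshape(-1, block)   # shape (n, block)
block_averages = full.mean(axis=1)    # one mean per block

# Rows left over after the last complete block (not part of the reshape)
leftover = values[n * block:]
leftover_mean = leftover.mean() if leftover.size else None
```

The first 8 values form two rows of 4, giving means 1.5 and 5.5; the last two values (8 and 9) do not fill a block, so they are averaged separately here or can simply be ignored.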

Calculating the average of every X rows
