Calculating new columns by iterating over the rows of other columns

Time: 2014-08-14 23:01:08

Tags: python pandas calculated-columns

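I am trying to do some calculations and put the results into new, named columns. For each group of rows, I take the values from two different columns over the same rows and compute a formula from them. Here is an example of the data and of the calculated columns: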

X   Y   TEMP   Data_1   Data_2   Data_3   Data_4
0   0   30     519      521      521      521
0   0   45     568      569      570      570
0   0   60     617      618      619      619
0   0   85     701      701      703      703
0   1   30     532      533      533      532
0   1   45     580      581      580      580
0   1   60     628      629      629      629
0   1   85     711      710      711      712
0   2   30     512      513      514      512
0   2   45     560      561      562      560
0   2   60     609      610      611      609
0   2   85     692      691      694      691
0   3   60     617      617      619      618
0   3   85     700      699      702      701
0   4   30     520      521      522      521
0   4   45     568      569      570      570
0   4   60     617      617      619      618
0   4   85     700      699      702      701
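To reproduce the problem, the sample above can be loaded into a DataFrame. This is a minimal sketch that assumes the table is whitespace-separated text; the name sample_text is illustrative, and only the first two (X, Y) groups are included for brevity. read_csv gives the default 0..n-1 index, which the loops below rely on when they use df.loc[k, ...] with integer k.

```python
import io

import pandas as pd

# Whitespace-separated copy of the first two groups of the sample table
# (sample_text is an illustrative name, not from the original post)
sample_text = """X Y TEMP Data_1 Data_2 Data_3 Data_4
0 0 30 519 521 521 521
0 0 45 568 569 570 570
0 0 60 617 618 619 619
0 0 85 701 701 703 703
0 1 30 532 533 533 532
0 1 45 580 581 580 580
0 1 60 628 629 629 629
0 1 85 711 710 711 712
"""

# sep=r"\s+" splits on any run of spaces or tabs
df = pd.read_csv(io.StringIO(sample_text), sep=r"\s+")
print(df.shape)
```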

Here is what I am trying to make the output look like:

X   Y   TEMP   Data_1   Data_2   Data_3   Data_4   Calculated_1   Calculated_2   Calculated_3   Calculated_4
0   0   30     519      521      521      521      Col A, Rows (2:5) and Data 1 Rows (2:5)   Col A, Rows (2:5) and Data 2 Rows (2:5)   Col A, Rows (2:5) and Data 3 Rows (2:5)   Col A, Rows (2:5) and Data 4 Rows (2:5)
0   0   45     568      569      570      570
0   0   60     617      618      619      619
0   0   85     701      701      703      703
0   1   30     532      533      533      532      Col A, Rows (6:9) and Data 1 Rows (6:9)   Col A, Rows (6:9) and Data 2 Rows (6:9)   Col A, Rows (6:9) and Data 3 Rows (6:9)   Col A, Rows (6:9) and Data 4 Rows (6:9)
0   1   45     580      581      580      580
0   1   60     628      629      629      629
0   1   85     711      710      711      712
0   2   30     512      513      514      512      Col A, Rows (10:13) and Data 1 Rows (10:13)   Col A, Rows (10:13) and Data 2 Rows (10:13)   Col A, Rows (10:13) and Data 3 Rows (10:13)   Col A, Rows (10:13) and Data 4 Rows (10:13)
0   2   45     560      561      562      560
0   2   60     609      610      611      609
0   2   85     692      691      694      691
0   3   60     617      617      619      618      Col A, Rows (14:15) and Data 1 Rows (14:15)   Col A, Rows (14:15) and Data 2 Rows (14:15)   Col A, Rows (14:15) and Data 3 Rows (14:15)   Col A, Rows (14:15) and Data 4 Rows (14:15)
0   3   85     700      699      702      701
0   4   30     520      521      522      521      Col A, Rows (16:19) and Data 1 Rows (16:19)   Col A, Rows (16:19) and Data 2 Rows (16:19)   Col A, Rows (16:19) and Data 3 Rows (16:19)   Col A, Rows (16:19) and Data 4 Rows (16:19)
0   4   45     568      569      570      570
0   4   60     617      617      619      618
0   4   85     700      699      702      701

Please help me figure out how to do this for the entire DataFrame and then save the result as a CSV file.
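For the saving step, pandas can write the whole frame with DataFrame.to_csv once the Calculated_* columns are filled. A minimal sketch; the two-row frame and the file name output.csv are placeholders. Missing values (NaN) are written as empty fields by default, which produces the blank cells shown in the desired output.

```python
import io

import numpy as np
import pandas as pd

# Placeholder frame: one (X, Y) group whose result sits only in its first row
df = pd.DataFrame({
    "X": [0, 0],
    "Y": [3, 3],
    "TEMP": [60, 85],
    "Data_1": [617, 700],
    "Calculated_1": [3.32, np.nan],
})

buf = io.StringIO()
# index=False drops the 0..n-1 row labels; NaN is written as an empty field
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# To write a real file instead (placeholder name):
# df.to_csv("output.csv", index=False)
```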

Here is my code (but it fills every row of the calculated columns with the last value computed):

j = 0
i = 0
k = 0
df_length = len(df) - 1
for row in df.iterrows():
    if int(df.loc[k, 'X']) == int(df.loc[k+1, 'X']):
        if int(df.loc[k, 'Y']) == int(df.loc[k+1, 'Y']):
            j = j + 1
        else:
            for l in range(1, 5):
                # Rows i..i+j form the group that just ended
                x = df.loc[i:i+j, 'TEMP']
                y = df.loc[i:i+j, 'Data_' + str(l)]
                # This assigns one scalar to the WHOLE column, so every group
                # overwrites the previous one and only the last value remains
                df['Calculated_' + str(l)] = ((j+1)*sum(x*y) - sum(x)*sum(y)) / ((j+1)*sum(x*x) - sum(x)**2)
            i = i + j + 1
            j = 0
    else:
        i = i + j + 1
        j = 0
    k = k + 1
    if k == df_length:
        break
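The long expression in the loop is the ordinary least-squares slope of Data_l against TEMP over the n = j + 1 rows of one group: (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²). The sketch below evaluates it for the first group of the sample (X = 0, Y = 0, Data_1) and cross-checks it against numpy.polyfit with degree 1, whose first coefficient is the slope. The function name ls_slope is illustrative.

```python
import numpy as np


def ls_slope(x, y):
    """Least-squares slope: (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = n * np.sum(x * x) - np.sum(x) ** 2
    return num / den


# First group of the sample: X = 0, Y = 0, column Data_1
temp = [30, 45, 60, 85]
data_1 = [519, 568, 617, 701]

slope = ls_slope(temp, data_1)
check = np.polyfit(temp, data_1, 1)[0]  # degree-1 fit returns [slope, intercept]
print(slope, check)
```

By hand: n = 4, Σx = 220, Σy = 2405, Σxy = 137735 and Σx² = 13750, so the slope is 21840 / 6600 ≈ 3.309.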

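I should point out that the other two columns, X and Y, are what I use to determine how many values go into each calculated value, because sometimes the data for some X and TEMP combinations is missing.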
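Because each group is defined by its (X, Y) pair, pandas groupby can stand in for the manual counters i, j and k. This is a vectorized sketch, not the approach used in either answer below. It assumes the result belongs in the first row of each group with the other rows left empty, as in the desired output, and that the columns are named as in the sample. The helper name add_group_slopes is hypothetical. Groups of any size work, such as the two-row Y = 3 group.

```python
import pandas as pd


def add_group_slopes(df, n_data=4):
    """Hypothetical helper: least-squares slope of each Data_l against TEMP
    per (X, Y) group, stored in the group's first row (other rows NaN)."""
    out = df.copy()
    keys = [df["X"], df["Y"]]
    first_row = ~df.duplicated(["X", "Y"])  # True on the first row of each group
    x = df["TEMP"]
    for l in range(1, n_data + 1):
        y = df["Data_" + str(l)]
        parts = pd.DataFrame({"x": x, "y": y, "xy": x * y, "xx": x * x})
        grouped = parts.groupby(keys, sort=False)
        sums = grouped.transform("sum")        # group sums repeated on every row
        n = grouped["x"].transform("count")    # group size repeated on every row
        slope = (n * sums["xy"] - sums["x"] * sums["y"]) / (n * sums["xx"] - sums["x"] ** 2)
        out["Calculated_" + str(l)] = slope.where(first_row)
    return out


# Usage on two groups of the sample (Y = 0 has four rows, Y = 3 has two)
sample = pd.DataFrame({
    "X": [0, 0, 0, 0, 0, 0],
    "Y": [0, 0, 0, 0, 3, 3],
    "TEMP": [30, 45, 60, 85, 60, 85],
    "Data_1": [519, 568, 617, 701, 617, 700],
    "Data_2": [521, 569, 618, 701, 617, 699],
    "Data_3": [521, 570, 619, 703, 619, 702],
    "Data_4": [521, 570, 619, 703, 618, 701],
})
result = add_group_slopes(sample)
print(result[["Y", "TEMP", "Calculated_1"]])
```

Computing the four sums per group with transform keeps every intermediate aligned with the original rows, so no index bookkeeping is needed.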

2 answers:

Answer 0 (score: 0)

I think I get what you are asking. You can read the CSV file like this:

import csv

def csv_reader(file_object):
    reader = csv.reader(file_object)
    next(reader)  # skip the header row (spreadsheet row 1)

    buffer_values = []

    for row in reader:

        # GETTING NEEDED DATA HERE (columns: X, Y, TEMP, Data_1, Data_2, ...)

        data_1 = row[3]
        data_2 = row[4]

        buffer_values.append(row)

        # Assumes every group has exactly four rows
        if len(buffer_values) == 4:

            # ACCESS THE BUFFER VALUES HERE.
            # THE BUFFER VALUES WILL HAVE DATA OF [2:5] ROWS FOR THE FIRST HIT HERE.
            # FOR THE NEXT HIT IT WILL BE [6:9]
            # IMPLEMENT YOUR FORMULAS HERE WITH data_1, data_2...

            # CLEAR THE BUFFER FOR NEXT RUN
            buffer_values = []

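Now the trick is to write a new CSV file containing all the data. You can do that each time before clearing the buffer, or store all the results in another variable and dump them to a file at the end. Hope this helps :)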
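Following that suggestion, here is a sketch that writes the new CSV while reading the old one, using only the standard library. It makes two assumptions beyond the answer above: the columns are in the order X, Y, TEMP, Data_1..Data_4, and groups are formed from consecutive rows sharing the same (X, Y) with itertools.groupby instead of fixed blocks of four rows, because the Y = 3 group in the sample has only two rows. The names slope and add_slopes_csv are hypothetical.

```python
import csv
import io
from itertools import groupby


def slope(xs, ys):
    # Least-squares slope: (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


def add_slopes_csv(src, dst, n_data=4):
    """Hypothetical helper: copy CSV rows from src to dst, appending
    Calculated_1..n columns filled only on the first row of each (X, Y) run."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow(header + ["Calculated_" + str(l) for l in range(1, n_data + 1)])
    # groupby on the first two fields (X, Y) buffers one run of rows at a time
    for _, run in groupby(reader, key=lambda r: (r[0], r[1])):
        rows = list(run)
        temps = [float(r[2]) for r in rows]
        results = [slope(temps, [float(r[2 + l]) for r in rows])
                   for l in range(1, n_data + 1)]
        writer.writerow(rows[0] + results)      # results go in the first row only
        for r in rows[1:]:
            writer.writerow(r + [""] * n_data)  # blank cells for the other rows


src = io.StringIO("X,Y,TEMP,Data_1,Data_2,Data_3,Data_4\n"
                  "0,3,60,617,617,619,618\n"
                  "0,3,85,700,699,702,701\n")
dst = io.StringIO()
add_slopes_csv(src, dst)
print(dst.getvalue())
```

With real files, open the output with open("output.csv", "w", newline="") so the csv module controls the line endings.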

Answer 1 (score: 0)

I managed to get the code working.

I added the following to create the column names:

for i in range(1, 5):
    df['Calculated_' + str(i)] = ''

With the column names in place, I made a few small changes to the loop:

i = 0
j = 0
k = 0
df_length = len(df) - 1

def fill_group(start, end):
    # Helper added to avoid repeating the formula: least-squares slope of
    # Data_l against TEMP over rows start..end, written only into row start
    n = end - start + 1
    for l in range(1, 5):
        x = df.loc[start:end, 'TEMP']
        y = df.loc[start:end, 'Data_' + str(l)]
        df.loc[start, 'Calculated_' + str(l)] = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x*x) - sum(x)**2)

for row in df.iterrows():
    if int(df.loc[k, 'X']) == int(df.loc[k+1, 'X']):
        if int(df.loc[k, 'Y']) == int(df.loc[k+1, 'Y']):
            j = j + 1
        else:
            fill_group(i, i + j)
            i = i + j + 1
            j = 0
    else:
        fill_group(i, i + j)
        i = i + j + 1
        j = 0
    k = k + 1
    if k == df_length:
        break

# The loop stops before the last group is closed, so fill it here
fill_group(i, i + j)

Note that I now use the variable 'i' to address the row position. I got there mostly by trial and error and by reading a bit about how to use .loc on DataFrames.
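The difference between the question's code and this answer comes down to the assignment target. df['Calculated_1'] = value broadcasts one scalar to every row, so each pass of the loop overwrites the whole column and only the last result survives; df.loc[i, 'Calculated_1'] = value writes a single cell. Note that .loc selects by label, so 'i' behaves like a row position here only because df has the default 0..n-1 index. A minimal demonstration on a throwaway frame:

```python
import pandas as pd

df = pd.DataFrame({"TEMP": [30, 45, 60, 85]})

# Whole-column assignment: the scalar is broadcast to every row
whole = df.copy()
for value in [1.0, 2.0]:
    whole["Calculated_1"] = value  # the second pass overwrites the first

# Label-based single-cell assignment with .loc
cell = df.copy()
cell["Calculated_1"] = float("nan")
cell.loc[0, "Calculated_1"] = 1.0
cell.loc[2, "Calculated_1"] = 2.0

print(whole["Calculated_1"].tolist())
print(cell["Calculated_1"].tolist())
```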