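I'm trying to perform some calculations and put the results into new, named columns. Each result is computed with a formula from the values in a block of rows, using two different columns of those same rows. Here is a sample of the data: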
X Y TEMP Data_1 Data_2 Data_3 Data_4
0 0 30 519 521 521 521
0 0 45 568 569 570 570
0 0 60 617 618 619 619
0 0 85 701 701 703 703
0 1 30 532 533 533 532
0 1 45 580 581 580 580
0 1 60 628 629 629 629
0 1 85 711 710 711 712
0 2 30 512 513 514 512
0 2 45 560 561 562 560
0 2 60 609 610 611 609
0 2 85 692 691 694 691
0 3 60 617 617 619 618
0 3 85 700 699 702 701
0 4 30 520 521 522 521
0 4 45 568 569 570 570
0 4 60 617 617 619 618
0 4 85 700 699 702 701
Here is what I want the output to look like:
X Y TEMP Data_1 Data_2 Data_3 Data_4 Calculated_1 Calculated_2 Calculated_3 Calculated_4
0 0 30 519 521 521 521 Col A, Rows (2:5) and Data 1 Rows (2:5) Col A, Rows (2:5) and Data 2 Rows (2:5) Col A, Rows (2:5) and Data 3 Rows (2:5) Col A, Rows (2:5) and Data 4 Rows (2:5)
0 0 45 568 569 570 570
0 0 60 617 618 619 619
0 0 85 701 701 703 703
0 1 30 532 533 533 532 Col A, Rows (6:9) and Data 1 Rows (6:9) Col A, Rows (6:9) and Data 2 Rows (6:9) Col A, Rows (6:9) and Data 3 Rows (6:9) Col A, Rows (6:9) and Data 4 Rows (6:9)
0 1 45 580 581 580 580
0 1 60 628 629 629 629
0 1 85 711 710 711 712
0 2 30 512 513 514 512 Col A, Rows (10:13) and Data 1 Rows (10:13) Col A, Rows (10:13) and Data 2 Rows (10:13) Col A, Rows (10:13) and Data 3 Rows (10:13) Col A, Rows (10:13) and Data 4 Rows (10:13)
0 2 45 560 561 562 560
0 2 60 609 610 611 609
0 2 85 692 691 694 691
0 3 60 617 617 619 618 Col A, Rows (14:15) and Data 1 Rows (14:15) Col A, Rows (14:15) and Data 2 Rows (14:15) Col A, Rows (14:15) and Data 3 Rows (14:15) Col A, Rows (14:15) and Data 4 Rows (14:15)
0 3 85 700 699 702 701
0 4 30 520 521 522 521 Col A, Rows (16:19) and Data 1 Rows (16:19) Col A, Rows (16:19) and Data 2 Rows (16:19) Col A, Rows (16:19) and Data 3 Rows (16:19) Col A, Rows (16:19) and Data 4 Rows (16:19)
0 4 45 568 569 570 570
0 4 60 617 617 619 618
0 4 85 700 699 702 701
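Please help me do this for the whole DataFrame and then save the result as a CSV file.
Here is my code (but it fills each calculated column with the last value computed):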
j = 0
i = 0
k = 0
df_length = len(df.count(1)) - 1
for row in df.iterrows():
    if (int(df.loc[k, 'X']) == int(df.loc[k+1, 'X'])):
        if (int(df.loc[k, 'Y']) == int(df.loc[k+1, 'Y'])):
            j = j + 1
        else:
            for l in range(1, 5):
                df['Calculated_'+str(l)] = ((j+1)*sum(df.loc[i:i+j,'TEMP'+str(l)]*df.loc[i:i+j,'Data_' + str(l)])-(sum(df.loc[i:i+j,'TEMP'+str(l)])*sum(df.loc[i:i+j,'Data_'+str(l)])))/((j+1)*sum(df.loc[i:i+j,'TEMP' + str(l)]*df.loc[i:i+j,'TEMP' + str(l)]) - (sum(df.loc[i:i+j,'TEMP'+str(l)]))**2)
            i = i + j + 1
            j = 0
    else:
        i = i + j + 1
        j = 0
    k = k + 1
    if k == df_length:
        break
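I'd like to point out that there are two other columns, X and Y, which I use to count how many rows go into each calculated value, because data is sometimes missing for some X and TEMP values (in the sample above, the block with X = 0, Y = 3 has only two rows).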
Answer 0 (score: 0)
I get what you're saying (I think). So read the CSV file like this:
import csv

def csv_reader(file_object):
    reader = csv.reader(file_object)
    next(reader)    # skip the header row (row 1)
    row_count = 2   # sheet row where the current block starts
    temp_count = 1  # sheet row of the line just read
    buffer_values = []
    for row in reader:
        # GETTING NEEDED DATA HERE (columns are X, Y, TEMP, Data_1, Data_2, ...)
        data_1 = row[3]
        data_2 = row[4]
        buffer_values.append(row)
        temp_count += 1
        # NOTE: this assumes every block has exactly 4 rows
        if (temp_count - row_count == 3):
            # ACCESS THE BUFFER VALUES HERE.
            # THE BUFFER VALUES WILL HAVE DATA OF [2:5] ROWS FOR THE FIRST HIT HERE.
            # FOR THE NEXT HIT IT WILL BE [6:9]
            # IMPLEMENT YOUR FORMULAS HERE WITH data_1, data_2...
            row_count += 4
            # CLEAR THE BUFFER FOR NEXT RUN
            buffer_values = []
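Now the trick is to write a new CSV file that contains all the data. You can do that each time just before clearing the buffer, or store all the results in another variable and dump that to a file at the end. Hope this helps :)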
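A minimal sketch of the second option (collect each block, then write it out with its result), assuming fixed-size blocks as in the reader above. The names `write_with_results`, `flush_block`, the `compute` callback and the single `Calculated_1` column are hypothetical placeholders, not part of the original answer; a real formula would go in place of the lambda.

```python
import csv
import io

def flush_block(writer, block, compute):
    # The result goes on the first row of the block; the other rows get an empty cell.
    writer.writerow(block[0] + [compute(block)])
    for row in block[1:]:
        writer.writerow(row + [""])

def write_with_results(in_file, out_file, compute, block_size=4):
    """Copy every row to out_file and append compute(block) to the first row of each block."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    header = next(reader)
    writer.writerow(header + ["Calculated_1"])
    block = []
    for row in reader:
        block.append(row)
        if len(block) == block_size:
            flush_block(writer, block, compute)
            block = []
    if block:  # a trailing block shorter than block_size
        flush_block(writer, block, compute)

# In-memory demonstration; the lambda just sums Data_1 as a stand-in formula.
src = io.StringIO("X,Y,TEMP,Data_1\n0,0,30,519\n0,0,45,568\n0,0,60,617\n0,0,85,701\n")
dst = io.StringIO()
write_with_results(src, dst, compute=lambda block: sum(int(r[3]) for r in block))
```

With real files, open both with `newline=""` as the `csv` module documentation recommends. Fixed-size blocks do not cover the short X = 0, Y = 3 group in the question, so grouping on the X and Y values would be needed there.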
Answer 1 (score: 0)
I managed to get the code working.
I added the following to create the column names:
for i in range(1, 5):
    df['Calculated_'+str(i)] = ''
Now that I have the column names, I made a few small changes to the loop code:
def slope(df, i, j, l):
    # Same formula as before, over rows i..i+j (j+1 rows): least-squares slope of Data_<l> against TEMP<l>.
    # Column names follow the original code; the sample table shows a single 'TEMP' column.
    t = df.loc[i:i+j, 'TEMP'+str(l)]
    d = df.loc[i:i+j, 'Data_'+str(l)]
    return ((j+1)*sum(t*d) - sum(t)*sum(d)) / ((j+1)*sum(t*t) - sum(t)**2)

i = 0  # label of the first row of the current (X, Y) block
j = 0  # number of further rows in that block
k = 0  # row being compared with the row after it
df_length = len(df) - 1
for row in df.iterrows():
    same_block = (k < df_length
                  and int(df.loc[k, 'X']) == int(df.loc[k+1, 'X'])
                  and int(df.loc[k, 'Y']) == int(df.loc[k+1, 'Y']))
    if same_block:
        j = j + 1
    else:
        # The block ended (X or Y changed, or this is the last row):
        # store the result on the block's first row only.
        for l in range(1, 5):
            df.loc[i, 'Calculated_'+str(l)] = slope(df, i, j, l)
        i = i + j + 1
        j = 0
    k = k + 1
Note that I now use the variable 'i' to address the row position. I got there mostly through trial and error, and by reading up on how to use .loc with DataFrames.
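The formula in the loop is the ordinary least-squares slope of each Data column against temperature within one (X, Y) block, so the same calculation can also be sketched with `groupby` instead of manual index bookkeeping. This is an alternative, not the code from the answer above. It assumes a single `TEMP` column as in the sample table, uses a shortened copy of the sample data, and the helper name `ols_slope` is made up for illustration.

```python
import io
import pandas as pd

sample = """X Y TEMP Data_1 Data_2 Data_3 Data_4
0 0 30 519 521 521 521
0 0 45 568 569 570 570
0 0 60 617 618 619 619
0 0 85 701 701 703 703
0 3 60 617 617 619 618
0 3 85 700 699 702 701
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

def ols_slope(t, d):
    # (n*sum(t*d) - sum(t)*sum(d)) / (n*sum(t*t) - sum(t)**2), as in the loop version
    n = len(t)
    return (n * (t * d).sum() - t.sum() * d.sum()) / (n * (t * t).sum() - t.sum() ** 2)

# Start with empty (NaN) result columns so only the first row of each block gets a value.
for l in range(1, 5):
    df["Calculated_" + str(l)] = float("nan")

# One pass per (X, Y) block; blocks may have any number of rows.
for _, block in df.groupby(["X", "Y"], sort=False):
    first = block.index[0]
    for l in range(1, 5):
        df.loc[first, "Calculated_" + str(l)] = ols_slope(block["TEMP"], block["Data_" + str(l)])

# to_csv returns the CSV text when no path is given; pass a filename to write a file instead.
csv_text = df.to_csv(index=False)
```

Because the grouping is done on the X and Y values themselves, the two-row block (X = 0, Y = 3) and the final block are handled the same way as all the others.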