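I am trying to write a script that loops over the rows of a DataFrame and builds a new column by appending the value from column A or column B, depending on a condition in column C. However, something goes wrong when appending, because each row of my new column ends up containing multiple values.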
import pandas as pd
import numpy as np
#Loading in the csv file
filename = '35180_TRA_data.csv'
df1 = pd.read_csv(filename, sep=',', nrows=1300, skiprows=25, index_col=False, header=0)
#Calculating the B concentration using column A and a factor
B_calc = df1['A']*137.818
#The measured B concentration
B_measured = df1['B']
#Looping through the dataset, and append the B_calc values where the C column is 2, while appending the B_measured values where the C column is 1.
calculations = []
for row in df1['C']:
    if row == 2:
        calculations.append(B_calc)
    if row == 1:
        calculations.append(B_measured)
df1['B_new'] = calculations
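The values in my new column (B_new) are all wrong. For example, the first row should contain only 0.00, but it contains many values. So something is going wrong in the append. Can anyone spot the problem?
Answer 0 (score: 0)
B_calc and B_measured are arrays (pandas Series). You therefore have to specify which value you want to append; otherwise the whole array is appended each time. Here is your approach with that fix: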
import pandas as pd
#Small example DataFrame standing in for the csv file
df1 = pd.DataFrame({"A":[1,3,5,7,9], "B" : [9,7,5,3,1], "C":[1,2,1,2,1]})
#Calculating the B concentration using column A and a factor
B_calc = df1['A']*137.818
#The measured B concentration
B_measured = df1['B']
#Looping through the dataset, and append the B_calc values where the C column is 2, while appending the B_measured values where the C column is 1.
calculations = []
for index, row in df1.iterrows():
    if row['C'] == 2:
        calculations.append(B_calc[index])
    if row['C'] == 1:
        calculations.append(B_measured[index])
df1['B_new'] = calculations
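To make the difference concrete, here is a minimal sketch (the three-row DataFrame and the names `whole` and `single` are made up for illustration) comparing what ends up in the list when you append the Series itself versus one indexed value:

```python
import pandas as pd

# Hypothetical toy data, not the original csv
df = pd.DataFrame({"A": [1, 3, 5], "B": [9, 7, 5], "C": [1, 2, 1]})
B_calc = df["A"] * 137.818
B_measured = df["B"]

# Appending the Series itself: every list slot holds a whole column.
whole = [B_calc if c == 2 else B_measured for c in df["C"]]

# Indexing by row label: every list slot holds one scalar.
single = [B_calc[i] if c == 2 else B_measured[i] for i, c in df["C"].items()]

print(type(whole[0]))  # a pandas Series, not a number
print(single)          # one value per row
```

The first list is what the loop in the question builds, which is why each cell of B_new shows many values.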
However, iterating over rows is bad practice because it is slow. A better approach is to use pandas boolean masks, which work like this:
mask_1 = df1['C'] == 1
mask_2 = df1['C'] == 2
#Where C is 2, use the calculated value; where C is 1, use the measured value
df1.loc[mask_2, 'B_new'] = df1.loc[mask_2, 'A']*137.818
df1.loc[mask_1, 'B_new'] = df1.loc[mask_1, 'B']
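As an alternative to the masks, the same selection can probably be written in one line with `numpy.where`. This is only a sketch on the toy DataFrame from above, and it assumes C only ever contains 1 or 2 (any other value would fall through to column B):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3, 5, 7, 9], "B": [9, 7, 5, 3, 1], "C": [1, 2, 1, 2, 1]})

# C == 2 -> calculated value (A * 137.818); otherwise -> measured value (B).
# Assumption: C is always 1 or 2, so "otherwise" means C == 1.
df1["B_new"] = np.where(df1["C"] == 2, df1["A"] * 137.818, df1["B"])

print(df1)
```

If C can hold other values, `np.select` with an explicit default may be the safer choice.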