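I have two columns to iterate over: "Volume_hedge" and "Unit_hedge". For each row, if the value in "Unit_hedge" is "Thousands of Barrels per Day", I want to divide the number in "Volume_hedge" (in that same row) by 1000.
I tried looping over the two columns with enumerate, followed by an if statement. As I said, it works for the first two rows but not for the rest.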
df2 = DataFrame(x)
columns_to_select = ['Volume_hedge', 'Unit_hedge']
for i, row in enumerate(columns_to_select):
    if df2['Unit_hedge'].loc[i] == 'Thousands of Barrels per Day':
        new_row = df2['Volume_hedge'].loc[i] / 1000
    else:
        none
    df2['Volume_hedge'].loc[i] = new_row
print(df2[columns_to_select].loc[0:8])
Expected result:
   Volume_hedge  Unit_hedge
0  0.03          Thousands of Barrels per Day
1  0.024         Thousands of Barrels per Day
2  0.024         Thousands of Barrels per Day
3  0.024         Thousands of Barrels per Day
4  0.024         Thousands of Barrels per Day
5  0.024         Thousands of Barrels per Day
6  0.024         Thousands of Barrels per Day
7  32850000      (MMBtu/Bbl)
8  4404000       (MMBtu/Bbl)
Actual result:
   Volume_hedge  Unit_hedge
0  0.03          Thousands of Barrels per Day
1  0.024         Thousands of Barrels per Day
2  24            Thousands of Barrels per Day
3  24            Thousands of Barrels per Day
4  24            Thousands of Barrels per Day
5  24            Thousands of Barrels per Day
6  24            Thousands of Barrels per Day
7  32850000      (MMBtu/Bbl)
8  4404000       (MMBtu/Bbl)
Answer 0 (score: 4)
You should use np.select here:
import numpy as np
df2["Volume_hedge"] = np.select(
    [df2["Unit_hedge"].eq("Thousands of Barrels per Day")],
    [df2["Volume_hedge"].div(1000)],
    df2["Volume_hedge"]
)
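This divides Volume_hedge by 1000 in every row where Unit_hedge equals "Thousands of Barrels per Day" and leaves all other rows unchanged.
It also has the advantage of not iterating, which is faster when working with pandas and numpy.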
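To try this end to end, here is a minimal self-contained sketch. The sample frame is an assumption: its values are made up to resemble the question's output and stand in for the asker's real df2, and the name is_kbpd is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df2.
df2 = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 24.0, 32850000.0, 4404000.0],
    "Unit_hedge": [
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "(MMBtu/Bbl)",
        "(MMBtu/Bbl)",
    ],
})

# Boolean mask: True where the unit is thousands of barrels per day.
is_kbpd = df2["Unit_hedge"].eq("Thousands of Barrels per Day")

# Where the mask is True take Volume_hedge / 1000, otherwise keep the value.
df2["Volume_hedge"] = np.select(
    [is_kbpd],
    [df2["Volume_hedge"].div(1000)],
    default=df2["Volume_hedge"],
)

print(df2)
```

With a single condition, np.where(is_kbpd, df2["Volume_hedge"] / 1000, df2["Volume_hedge"]) should give the same result; np.select becomes more useful once there are several units to convert.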
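Answer 1 (score: 0)
columns_to_select is a list with two elements. When you enumerate it, i only takes the values 0 and 1, so the division is applied to the first two rows only.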
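A quick way to see this, independent of any DataFrame (the variable pairs is only there for illustration):

```python
columns_to_select = ["Volume_hedge", "Unit_hedge"]

# enumerate() walks the list of column names, not the rows of df2,
# so the loop body runs exactly twice.
pairs = list(enumerate(columns_to_select))
for i, name in pairs:
    print(i, name)
# prints:
# 0 Volume_hedge
# 1 Unit_hedge
```

In the original loop, i is then used as a row label in .loc[i], which is why only rows 0 and 1 are ever touched.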
If you want to iterate over the rows, you should use the iterrows function. Do something like:
for i, row in df2.iterrows():
    if row['Unit_hedge'] == 'Thousands of Barrels per Day':
        # i is the index label, so write back with a single .loc call (no chained assignment)
        df2.loc[i, 'Volume_hedge'] = row['Volume_hedge'] / 1000
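However, using apply instead of looping over each row is the better option, because iteration is very slow. Setting column values while iterating over a data frame is also not advisable.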
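A minimal sketch of the apply variant, assuming a small made-up frame in place of the real df2; the helper name scale_volume is hypothetical and not from the original post:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df2.
df2 = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 32850000.0],
    "Unit_hedge": [
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "(MMBtu/Bbl)",
    ],
})

def scale_volume(row):
    """Return the volume divided by 1000 for kbpd rows, unchanged otherwise."""
    if row["Unit_hedge"] == "Thousands of Barrels per Day":
        return row["Volume_hedge"] / 1000
    return row["Volume_hedge"]

# axis=1 passes each row to scale_volume; the result is a new Series
# that replaces the column in one assignment, outside any loop.
df2["Volume_hedge"] = df2.apply(scale_volume, axis=1)

print(df2)
```

Note that apply with axis=1 still calls the Python function once per row, so on large frames a vectorized approach such as a boolean mask or np.select is usually faster.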
Answer 2 (score: 0)
# Select the matching rows once, then update them in place with a single .loc assignment
mask = df2['Unit_hedge'] == 'Thousands of Barrels per Day'
df2.loc[mask, 'Volume_hedge'] = df2.loc[mask, 'Volume_hedge'] / 1000