Question

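I have two columns to iterate over: "Volume_hedge" and "Unit_hedge". For each row, if the value in "Unit_hedge" is "Thousands of Barrels per Day", I want to divide the number in "Volume_hedge" (in the same row) by 1000.

I tried enumerating the two columns and following that with an if statement. As I said, it works for the first two rows but not for the rest.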

df2 = DataFrame(x)
columns_to_select = ['Volume_hedge', 'Unit_hedge']
for i, row in enumerate(columns_to_select):
    if df2['Unit_hedge'].loc[i] == 'Thousands of Barrels per Day':
        new_row = df2['Volume_hedge'].loc[i] / 1000
    else:
        none
    df2['Volume_hedge'].loc[i] = new_row
print(df2[columns_to_select].loc[0:8])

Expected result:

  Volume_hedge                    Unit_hedge
0         0.03  Thousands of Barrels per Day
1        0.024  Thousands of Barrels per Day
2        0.024  Thousands of Barrels per Day
3        0.024  Thousands of Barrels per Day
4        0.024  Thousands of Barrels per Day
5        0.024  Thousands of Barrels per Day
6        0.024  Thousands of Barrels per Day
7     32850000                   (MMBtu/Bbl)
8      4404000                   (MMBtu/Bbl)

Actual result:

  Volume_hedge                    Unit_hedge
0         0.03  Thousands of Barrels per Day
1        0.024  Thousands of Barrels per Day
2           24  Thousands of Barrels per Day
3           24  Thousands of Barrels per Day
4           24  Thousands of Barrels per Day
5           24  Thousands of Barrels per Day
6           24  Thousands of Barrels per Day
7     32850000                   (MMBtu/Bbl)
8      4404000                   (MMBtu/Bbl)

Answer 1

You should use np.select here:

import numpy as np

df2["Volume_hedge"] = np.select(
    [df2["Unit_hedge"].eq("Thousands of Barrels per Day")], 
    [df2["Volume_hedge"].div(1000)], 
    df2["Volume_hedge"]
)

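This divides Volume_hedge by 1000 in every row where Unit_hedge equals "Thousands of Barrels per Day" and leaves all other rows unchanged.

It also has the advantage of avoiding iteration, which is faster when working with pandas and numpy.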
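To check the np.select approach end to end, here is a minimal sketch on made-up data. The frame name `demo` and the mask name `is_kbd` are illustrative only, and the sample values are modelled on the question's table rather than taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question's table
demo = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 24.0, 32850000.0, 4404000.0],
    "Unit_hedge": [
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "(MMBtu/Bbl)",
        "(MMBtu/Bbl)",
    ],
})

# One condition, one choice, and the original column as the fallback
is_kbd = demo["Unit_hedge"].eq("Thousands of Barrels per Day")
demo["Volume_hedge"] = np.select(
    [is_kbd],
    [demo["Volume_hedge"].div(1000)],
    default=demo["Volume_hedge"],
)

print(demo)
```

Only the three "Thousands of Barrels per Day" rows are rescaled; the "(MMBtu/Bbl)" rows keep their original values.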

Answer 2

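columns_to_select is a list of two elements. When you enumerate it, i only takes the values 0 and 1, so the operation is applied to the first two rows only.

If you want to iterate over rows, you should use the iterrows function. Do something like this: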
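A quick way to see this is to print what enumerate actually yields for that list. This is a small sketch that uses only the list from the question; the name `pairs` is illustrative.

```python
columns_to_select = ["Volume_hedge", "Unit_hedge"]

# enumerate walks the list of column names, not the rows of the DataFrame
pairs = list(enumerate(columns_to_select))
print(pairs)  # [(0, 'Volume_hedge'), (1, 'Unit_hedge')]
```

The loop body therefore runs exactly twice, with i equal to 0 and then 1, no matter how many rows the DataFrame has.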

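for i, row in df2.iterrows():
    # Only rescale rows whose unit matches; every other row is left untouched
    if row['Unit_hedge'] == 'Thousands of Barrels per Day':
        # .at with the index label avoids chained assignment
        df2.at[i, 'Volume_hedge'] = row['Volume_hedge'] / 1000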

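However, using apply instead of looping over each row is the better option, because iteration is very slow. Setting column values while iterating over a DataFrame is also not advisable.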
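The apply-based alternative could look roughly like the sketch below. The helper name `rescale` and the sample frame `demo` are hypothetical and exist only for illustration. Note that apply with axis=1 still calls a Python function once per row, so a fully vectorized solution is usually faster on large frames.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table
demo = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 32850000.0],
    "Unit_hedge": [
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "(MMBtu/Bbl)",
    ],
})

def rescale(row):
    """Hypothetical helper: return the volume, divided by 1000 when the unit matches."""
    if row["Unit_hedge"] == "Thousands of Barrels per Day":
        return row["Volume_hedge"] / 1000
    return row["Volume_hedge"]

# axis=1 passes each row to rescale; the result replaces the whole column at once
demo["Volume_hedge"] = demo.apply(rescale, axis=1)

print(demo)
```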

Answer 3

# Build the boolean mask once, then assign through .loc to avoid chained indexing
mask = df['Unit_hedge'] == 'Thousands of Barrels per Day'
df.loc[mask, 'Volume_hedge'] = df.loc[mask, 'Volume_hedge'] / 1000

How to conditionally transform a pandas DataFrame column
