Add a new column to a pandas DataFrame based on another column

Time: 2021-04-02 20:59:09

Tags: python pandas dataframe

I have a DataFrame with a bmi column. Based on that column, I want to create another column that shows the BMI range for each row's bmi value. Here is my code:

for i in range(df["bmi"].count()):
    if df["bmi"][i] < 18.5:
        df["bmi_category"] = "Under Weight"
    elif 25 > df["bmi"][i] >= 18.5:
        df["bmi_category"] = "Healthy Weight"
    elif 30 > df["bmi"][i] >= 25:
        df["bmi_category"] = "Overweight"
    elif df["bmi"][i] >= 30:
        df["bmi_category"] = "Obese"

But when I run this code, I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 228

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-220-e7569ff34eec> in <module>
      1 for i in range(cardio["bmi"].count()):
----> 2     if cardio["bmi"][i] < 18.5:
      3         cardio["bmi_category"] = "Under Weight"
      4     elif 25 > cardio["bmi"][i] >= 18.5:
      5         cardio["bmi_category"] = "Healthy Weight"

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    849 
    850         elif key_is_scalar:
--> 851             return self._get_value(key)
    852 
    853         if is_hashable(key):

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    957 
    958         # Similar to Index.get_value, but we do not fall back to positional
--> 959         loc = self.index.get_loc(label)
    960         return self.index._get_values_for_loc(self, loc, label)
    961 

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 228

Can anyone tell me what I'm doing wrong here, and how to fix it?

2 answers:

Answer 0 (score: 2)

The following maps each value in the bmi column to a value in the bmi_category column:

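import pandas as pd

def get_category(bmi):
    # Leave missing values (NaN/None) without a category; NaN is truthy,
    # so a plain "not bmi" check would let it fall through to "Obese"
    if pd.isna(bmi):
        return None
    if bmi < 18.5:
        return "Under Weight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Apply the function to every value of the bmi column
df['bmi_category'] = df['bmi'].apply(get_category)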
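As a quick check, here is a minimal sketch applying the same categorisation to a small hand-made DataFrame. The sample values, the missing value and the gap in the index are made up for illustration; the function is repeated so the snippet runs on its own, and it checks for missing values with pd.isna.

```python
import numpy as np
import pandas as pd

def get_category(bmi):
    # Leave missing values (NaN/None) without a category
    if pd.isna(bmi):
        return None
    if bmi < 18.5:
        return "Under Weight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Hypothetical sample data; the index deliberately skips label 1
df = pd.DataFrame({"bmi": [17.0, 22.4, 27.1, 31.8, np.nan]}, index=[0, 2, 3, 4, 5])
df["bmi_category"] = df["bmi"].apply(get_category)
print(df["bmi_category"].tolist())
# ['Under Weight', 'Healthy Weight', 'Overweight', 'Obese', None]
```

Because apply works on the values themselves rather than on positions, gaps in the index (such as the missing label 1 here) cause no trouble.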

P.S. If you find yourself iterating over a DataFrame, there is almost always a built-in function that does the job faster and more cleanly.
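As for the KeyError in the question: on a Series with an integer index, df["bmi"][i] looks up the index label i, not the i-th position (the traceback shows the failed lookup going through Int64HashTable). If some rows were removed earlier, for example with dropna or a filter (an assumption, since the question doesn't show how df was built), some labels between 0 and count() - 1 no longer exist, and the lookup fails, as it did for label 228. Separately, assigning a string to df["bmi_category"] inside the loop overwrites the entire column on every pass, so even a loop that ran without errors would leave every row with the last row's category. A minimal sketch reproducing both points on hypothetical data:

```python
import pandas as pd

# Hypothetical data whose index has a gap: label 1 was dropped
s = pd.Series([17.0, 22.4, 31.8], index=[0, 2, 3])

# range(count()) yields 0, 1, 2, but s[i] looks up *labels*, so s[1] fails
missing = []
for i in range(s.count()):
    try:
        s[i]
    except KeyError:
        missing.append(i)
print(missing)  # [1]

# Positional access with .iloc uses the position, whatever the labels are
print(s.iloc[1])  # 22.4

# Assigning a scalar to a column sets every row, not just row i
df = s.to_frame("bmi")
df["bmi_category"] = "Obese"
print(df["bmi_category"].tolist())  # ['Obese', 'Obese', 'Obese']
```

If a loop is really needed, iterating with df["bmi"].items() and assigning through df.loc[label, "bmi_category"] avoids both problems, although the vectorised approaches in these answers are preferable.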

Answer 1 (score: 1)

You can do this efficiently with pd.cut:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(16, 35, (50, 1)), columns=["bmi"])
df["bmi_category"] = pd.cut(df["bmi"], [0, 18.5, 25, 30, np.inf], labels=["Under Weight", "Healthy Weight", "Overweight", "Obese"], right=False)  # right=False: each bin is [low, high)
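To see where right=False puts the boundary values, here is a small sketch with hand-picked BMI values (made up for illustration) that sit exactly on the cut points. With right=False each bin includes its left edge and excludes its right edge, so 25 lands in Overweight rather than Healthy Weight, which matches the 25 > bmi >= 18.5 style comparisons in the question.

```python
import numpy as np
import pandas as pd

bins = [0, 18.5, 25, 30, np.inf]
labels = ["Under Weight", "Healthy Weight", "Overweight", "Obese"]

# Hypothetical values chosen to hit the bin edges, plus one missing value
bmi = pd.Series([17.0, 18.5, 24.9, 25.0, 30.0, np.nan])

# right=False makes the bins [0, 18.5), [18.5, 25), [25, 30), [30, inf)
categories = pd.cut(bmi, bins, labels=labels, right=False)
print(categories.tolist())
# ['Under Weight', 'Healthy Weight', 'Healthy Weight', 'Overweight', 'Obese', nan]
```

Unlike the apply approach, pd.cut returns a categorical column, which stores each label once and keeps the categories in bin order; missing values simply stay missing.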