Question

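I want to take a dictionary and use it to fill in missing values in a DataFrame column. The dictionary keys correspond to the index of the DataFrame or to another column in it, and the dictionary values are the values I want to write into the DataFrame. Here is a more visual example.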

  key_col target_col
0       w          a
1       c        NaN
2       z        NaN

The dictionary I want to map onto the DataFrame:

dict = {'c':'B','z':'4'}

I want the DataFrame to look like this:

  key_col target_col
0       w          a
1       c          B
2       z          4

I have tried a few different approaches so far: setting the index to key_col and then trying

df[target_col].map(dict)

df.loc[target_col] = df['key_col'].map(dict)

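I know replace won't work, because it requires me to set a condition for the values that need to be replaced. I only want to update the value when the key_col/index has a matching value.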

Answer 2

I'm not sure this is the best approach, but given that you only have a few samples, it shouldn't be a problem:

x = x.set_index('key_col')
# d is the question's dictionary, renamed so it does not shadow the built-in dict
for k, v in d.items():
    x.loc[k, 'target_col'] = v
x = x.reset_index()  # back to the original layout

Answer 3

You can use apply with a lambda function.

Example DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"key_col": {0: "w", 1: "c", 2: "z"}, "target_col": {0: "a", 1: np.nan, 2: np.nan}}
)

I renamed the dictionary, because you shouldn't use the name dict: it is a built-in object in Python.

map_dict = {"c": "B", "z": "4"}

Use apply with a lambda function:

df.loc[:, "target_col"] = df.apply(
    lambda x: map_dict.get(x["key_col"], x["target_col"]), axis=1
)

map_dict.get() lets you define a default value, so we can use it to return the existing target_col value for rows whose key is not in the mapping.
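One thing to keep in mind is that the apply version replaces target_col whenever the key is in the mapping, even if a value was already present. If only the missing entries should be filled, a vectorized alternative (a different technique from the apply approach above, sketched here with the same sample data) is to combine Series.map with Series.fillna:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
map_dict = {"c": "B", "z": "4"}

# Look up every key; keys that are absent from map_dict become NaN.
mapped = df["key_col"].map(map_dict)

# fillna with a Series aligns on the index and only touches missing entries,
# so the existing "a" is left alone.
df["target_col"] = df["target_col"].fillna(mapped)

print(df)
```

This avoids the row-by-row Python call that apply with axis=1 makes, which tends to matter on larger frames.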

Answer 4

Another option (with the dictionary renamed from dict to dicts to avoid confusion with the built-in type):

df.set_index('key_col').T.fillna(dicts).T

        target_col
key_col
w                a
c                B
z                4
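The reason the transpose is needed is that DataFrame.fillna, when given a dictionary, treats the keys as column labels. A step-by-step sketch of the same one-liner, using the question's sample data and an added reset_index to restore key_col as a column (that last step is an addition, not part of the answer above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
dicts = {"c": "B", "z": "4"}

# After the transpose there is a single row ("target_col")
# and one column per key: w, c, z.
wide = df.set_index("key_col").T

# With a dict argument, fillna fills each column named by a key
# with the corresponding value.
filled = wide.fillna(dicts)

# Transpose back and turn key_col into a regular column again.
result = filled.T.reset_index()

print(result)
```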

Answer 5

Method 1 (key_col as an additional column):

import numpy as np
import pandas as pd

# initial dataframe
df = pd.DataFrame(data={'key_col': ['w', 'c', 'z'], 'target_col': ['a', np.nan, np.nan]})
# values to update: each key corresponds to key_col, each value to target_col
update_dict = {'c': 'B', 'z': '4'}

for key in update_dict.keys():
    # df[df['key_col'] == key]['target_col'] = update_dict[key]  <-- Do NOT do this (chained indexing may assign to a copy)
    df.loc[df['key_col'] == key, 'target_col'] = update_dict[key]

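This method iterates over each key to update (update_dict.keys()) and finds every row of the DataFrame (df) where key_col equals that key. Where there is a match, the value in target_col is set to the updated value from the dictionary.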
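The loop can also be collapsed into a single boolean mask. The following is a sketch under the assumption that only missing target_col values should be filled, as the question asks; the name mask is introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'key_col': ['w', 'c', 'z'], 'target_col': ['a', np.nan, np.nan]})
update_dict = {'c': 'B', 'z': '4'}

# Rows whose key has an entry in update_dict AND whose target is still missing.
mask = df['key_col'].isin(list(update_dict)) & df['target_col'].isna()

# Assign the looked-up values to exactly those rows; the right-hand Series
# is aligned to the selected rows by index.
df.loc[mask, 'target_col'] = df.loc[mask, 'key_col'].map(update_dict)

print(df)
```

Dropping the isna() condition reproduces the behaviour of the loop, which overwrites target_col for every matching key.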

Method 2 (key_col as the index):

df = pd.DataFrame(data=['a', np.nan, np.nan], columns=['target_col'], index=['w', 'c', 'z'])
update_dict = {'c': 'B', 'z': '4'}
for key in update_dict.keys():
    df.loc[key, 'target_col'] = update_dict[key]

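This approach is fairly self-explanatory. Take care if update_dict contains a key that does not exist in the DataFrame: assigning through df.loc[key, 'target_col'] does not raise an exception for a missing label but appends a new row, so check for the key first if that is not what you want.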

Note: DataFrame.loc lets us reference specific coordinates in a DataFrame by row and column labels, whereas .iloc uses integer positions.
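Because assigning to a missing label with .loc enlarges the frame rather than failing, a membership check is one way to keep Method 2 from adding rows. A minimal sketch, where the extra key 'q' is a hypothetical entry added only to show the guard:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=['a', np.nan, np.nan], columns=['target_col'], index=['w', 'c', 'z'])
# 'q' is a hypothetical key that does not appear in the index.
update_dict = {'c': 'B', 'z': '4', 'q': '9'}

for key, value in update_dict.items():
    # Skip keys that are not index labels; without this check,
    # df.loc['q', 'target_col'] = '9' would append a new row labelled 'q'.
    if key in df.index:
        df.loc[key, 'target_col'] = value

print(df)
```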

Answer 6

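You can use update (which modifies in place), so there is no need to assign the result back. Because pandas aligns on both index labels and column labels, we need to rename the mapped Series so that it updates 'target_col'. (I renamed your dictionary to d.)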

df.update(df['key_col'].map(d).rename('target_col'))

print(df)
#  key_col target_col
#0       w          a
#1       c          B
#2       z          4
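By default update overwrites any value for which the mapped Series has a non-missing entry. If existing values must be preserved, DataFrame.update accepts overwrite=False, which only fills entries that are missing in the original. A sketch comparing the two, where the 'w' entry in d is a hypothetical addition to make the difference visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
# 'w' is a hypothetical extra entry; the question's dictionary has only 'c' and 'z'.
d = {"w": "X", "c": "B", "z": "4"}

mapped = df["key_col"].map(d).rename("target_col")

# Default behaviour: the existing "a" is replaced by "X".
overwritten = df.copy()
overwritten.update(mapped)

# overwrite=False: only the NaN entries are filled, "a" is kept.
only_missing = df.copy()
only_missing.update(mapped, overwrite=False)

print(overwritten)
print(only_missing)
```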

How do you map a dictionary to an existing pandas DataFrame column?
