This is what my main dataframe looks like:
Group IDs New ID
1 [N23,N1,N12] N102
2 [N134,N100] N501
I also have another dataframe that contains all the required ID information, in no particular order:
ID Name Age
N1 Milo 5
N23 Mark 21
N11 Jacob 22
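I want to modify the original dataframe so that every ID is replaced by the corresponding name taken from the other dataframe. The result should contain only names, no IDs, like this: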
Group IDs New ID
1 [Mark,Silo,Bond] Niki
2 [Troy,Fangio] Kvyat
Thanks in advance.
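Answer 0 (score: 0)
You can try building a dictionary from the second dataframe and then using regex patterns to replace the IDs in the first one (you do not need to understand the patterns fully; see the comments below):
PS: since you did not provide code for the complete dataframes, I created them from part of your sample, which is why the print() output does not show every ID replaced.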
import pandas as pd
# creating dummy dfs
df1 = pd.DataFrame({"Group":[1,2], "IDs":["[N23,N1,N12]", "[N134,N100]"], "New ID":["N102", "N501"] })
df2 = pd.DataFrame({"ID":['N1', "N23", "N11", "N100"], "Name":["Milo", "Mark", "Jacob", "Silo"], "Age":[5,21,22, 44]})
# Build an ID -> Name dict from the lookup dataframe
dict_replace = df2.set_index("ID")['Name'].to_dict()
# The 'f' prefix makes this an f-string; the 'r' prefix makes it a raw string,
# so the backslashes reach the regex engine unchanged
# \b is a regex word boundary that marks the beginning and end of the match,
## so that if you're searching for N1, it won't match inside N11
dict_replace = {fr"\b{k}\b":v for k, v in dict_replace.items()}
# Replace in the original column; assign the result back instead of relying on inplace=True on a column selection
df1['IDs'] = df1['IDs'].replace(dict_replace, regex=True)
print(df1['IDs'].tolist())
# >>> ['[Mark,Milo,N12]', '[N134,Silo]']
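To see why the \b word boundaries matter, here is a minimal sketch using only the standard re module. The helper name replace_ids and the id_to_name dict are made up for illustration; they are not part of the answer above.

```python
import re

# Hypothetical lookup table mirroring the ID -> Name pairs used above
id_to_name = {"N1": "Milo", "N23": "Mark", "N11": "Jacob", "N100": "Silo"}

def replace_ids(text, mapping):
    """Replace every whole-word ID in `text` that has an entry in `mapping`."""
    # re.escape guards against IDs that contain regex metacharacters
    alternatives = "|".join(re.escape(key) for key in mapping)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], text)

# With boundaries, N12 is left alone because it is not in the lookup
with_boundary = replace_ids("[N23,N1,N12]", id_to_name)

# Without boundaries, the shorter ID "N1" also matches inside "N11" and "N12"
without_boundary = re.sub("N1", "Milo", "[N11,N12]")

print(with_boundary)
print(without_boundary)
```

If the IDs column holds strings, a helper like this could presumably be applied row by row, for example with df1['IDs'].map(lambda s: replace_ids(s, id_to_name)).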
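Answer 1 (score: 0)
Note the changes to my dataframes. In your sample data, df contains IDs that do not exist in df1. I changed df so that it only contains IDs present in df1. I used the following dataframes: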
print(df)
Group IDs New
0 1 [N23,N1,N11] N102
1 2 [N11,N23] N501
print(df1)
ID Name Age
0 N1 Milo 5
1 N23 Mark 21
2 N11 Jacob 22
Solution
Build a dict from df1.ID and df1.Name and map it onto the exploded df.IDs, then collect the results back into a list per group.
df['IDs'] = df['IDs'].str.strip('[]')  # strip the square brackets
df['IDs'] = df['IDs'].str.split(',')  # rebuild the lists; done because explode did not work on the column as given (presumably it held strings, not lists)
# explode the lists, map each ID to its name via df1, and collect the names into a list per group
df.explode('IDs').groupby('Group')['IDs'].apply(lambda x:(x.map(dict(zip(df1.ID,df1.Name)))).tolist()).reset_index()
Group IDs
0 1 [Mark, Milo, Jacob]
1 2 [Jacob, Mark]
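The output above relies on every ID being present in the lookup. With the data from the question, some IDs (such as N12 and N134) have no entry, and a plain .map() turns them into NaN. A minimal sketch of one way to keep the original ID in that case, assuming the IDs column holds bracketed strings; the sample frames are rebuilt here only so the snippet runs on its own:

```python
import pandas as pd

# Sample data assumed from the question; IDs are stored as bracketed strings
df = pd.DataFrame({
    "Group": [1, 2],
    "IDs": ["[N23,N1,N12]", "[N134,N100]"],
    "New ID": ["N102", "N501"],
})
df1 = pd.DataFrame({
    "ID": ["N1", "N23", "N11", "N100"],
    "Name": ["Milo", "Mark", "Jacob", "Silo"],
})

lookup = dict(zip(df1["ID"], df1["Name"]))

# One row per ID; the original row index is repeated for each element
exploded = df["IDs"].str.strip("[]").str.split(",").explode()

# dict.get with a default keeps the raw ID when no name is known
named = exploded.map(lambda id_: lookup.get(id_, id_))

# Regroup on the original row index and write the lists back
df["IDs"] = named.groupby(level=0).agg(list)
result = df["IDs"].tolist()
print(result)
```

Grouping on level=0 rather than on the Group column keeps the result aligned with the original rows, so it can be assigned straight back to df.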
Answer 2 (score: 0)
IIUC, you can .explode the lists, replace the values with .map, and regroup them with .groupby:
# assumes df['IDs'] already holds real lists, not strings
df['IDs'] = (df['IDs'].explode()
             .map(df1.set_index('ID')['Name'])
             .groupby(level=0).agg(list)
            )
If the New ID column does not hold lists, you can simply use .map():
df['New ID'] = df['New ID'].map(df1.set_index('ID')['Name'])
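One thing to be aware of: .map() with a Series returns NaN for any value that is missing from the lookup index. A small sketch of a fallback that keeps the original ID in that case; the two frames here are made-up sample data, not the asker's full dataset:

```python
import pandas as pd

# Made-up sample: N102 has no entry in the lookup, N23 does
df = pd.DataFrame({"Group": [1, 2], "New ID": ["N102", "N23"]})
df1 = pd.DataFrame({"ID": ["N1", "N23", "N11"], "Name": ["Milo", "Mark", "Jacob"]})

lookup = df1.set_index("ID")["Name"]

# IDs missing from the lookup come back as NaN
mapped = df["New ID"].map(lookup)

# Keep the original ID wherever no name was found
df["New ID"] = mapped.fillna(df["New ID"])
result = df["New ID"].tolist()
print(result)
```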