I have a DataFrame such as:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D NaN
G2 E NaN
G2 D NaN
G3 F NaN
G3 G NaN
G3 H NaN
G4 I 8
G4 I 8
G4 J 89
G4 K 65
I only want to fill the NaN values, working within each Groups value and assigning every distinct NAME a number starting from 1.
I should then get:
Groups NAME VALUES
G1 A 1
G1 B 2
G1 C 3
G1 C 3
G2 D 1
G2 E 2
G2 D 1
G3 F 1
G3 G 2
G3 H 3
G4 I 8
G4 I 8
G4 J 89
G4 K 65
Here is the data:
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}}
Answer 0 (score: 2)
I would first select the unique names in the NaN rows:
m = df['VALUES'].isna()
names = df.loc[m, 'NAME'].unique()
Then create a mapping for them:
mapping = dict(zip(names, list(range(1,len(names)+1))))
Then fill VALUES in the NaN rows using the mapping:
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)
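For reference, here is a self-contained sketch of the three steps above run on the question's data. The list-based construction of df is only a shorthand assumed here for brevity. Note that this mapping numbers the names across all NaN rows at once, so the count does not restart in each group, which is presumably what motivated the update that follows.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written with lists for brevity
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2',
               'G3', 'G3', 'G3', 'G4', 'G4', 'G4', 'G4'],
    'NAME': ['A', 'B', 'C', 'C', 'D', 'E', 'D',
             'F', 'G', 'H', 'I', 'I', 'J', 'K'],
    'VALUES': [1, 2, 3, 3, np.nan, np.nan, np.nan,
               np.nan, np.nan, np.nan, 8, 8, 89, 65],
})

m = df['VALUES'].isna()                      # rows that need filling
names = df.loc[m, 'NAME'].unique()           # unique names, in order of first appearance
mapping = dict(zip(names, range(1, len(names) + 1)))
df.loc[m, 'VALUES'] = df.loc[m, 'NAME'].map(mapping)

# The numbering is global: G3 continues at 3 instead of restarting at 1
print(df.loc[m])
```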
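Update, to fill VALUES per group as I understand it from your comment:
Again we select the rows with NaN VALUES. Now we do a groupby and use transform, which preserves the original df index. To add the list of numbers, we need to know the length of each group, so I added a Size column.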
import numpy as np
import pandas as pd
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
sizes = df.groupby(['Groups']).size()
df['Size']=df['Groups'].map(sizes)
m = df['VALUES'].isna()
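Next, repeated occurrences of the same Groups and NAME combination (such as G2 and D) should get the same number, which means grouping by both Groups and NAME. So we take the first occurrence of each such pair among these rows and map it back through a combined Groups and NAME index: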
df.loc[m, 'VALUES_new'] = df.loc[m].groupby(['Groups'])['Size'].transform(lambda x:list(range(1,len(x)+1)))
mapping = df.loc[m].groupby(['Groups', 'NAME'])['VALUES_new'].first().copy()
df.set_index(['Groups', 'NAME'], inplace=True)
m = df['VALUES'].isna()
df.loc[m,'VALUES'] = df.loc[m].index.map(mapping)
df.reset_index(inplace=True)
df.drop(columns=['Size', 'VALUES_new'], inplace=True)
df['VALUES']=df['VALUES'].astype(int)
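As a possible shortcut, and not part of the approach above, the per-group numbering can also be sketched with pd.factorize, which numbers values in order of first appearance. This is a minimal sketch that assumes numbering by first appearance within each group is what is wanted. The list-based df is shorthand for the question's data.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written with lists for brevity
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2',
               'G3', 'G3', 'G3', 'G4', 'G4', 'G4', 'G4'],
    'NAME': ['A', 'B', 'C', 'C', 'D', 'E', 'D',
             'F', 'G', 'H', 'I', 'I', 'J', 'K'],
    'VALUES': [1, 2, 3, 3, np.nan, np.nan, np.nan,
               np.nan, np.nan, np.nan, 8, 8, 89, 65],
})

m = df['VALUES'].isna()

# Within each group, number the distinct NAMEs 1, 2, 3, ... by first appearance.
# pd.factorize returns (codes, uniques); the codes start at 0, hence the + 1.
codes = df.loc[m].groupby('Groups')['NAME'].transform(
    lambda s: pd.factorize(s)[0] + 1
)

# Cast explicitly so the assignment matches the float dtype of VALUES
df.loc[m, 'VALUES'] = codes.astype(float)
df['VALUES'] = df['VALUES'].astype(int)
print(df)
```

Because transform returns a result aligned with the index of the rows it was given, no helper columns or index juggling are needed here.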
Just to see what happens within the individual groups, you can run this:
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4', 13: 'G4'}, 'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F', 8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'}, 'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0, 11: 8.0, 12: 89.0, 13: 65.0}})
m = df['VALUES'].isna()
grouped = df.loc[m].groupby('Groups')  # groupby object
for name, dfgroup in grouped:  # each item is (group name, DataFrame of the group)
    print(name)  # str with the group name
    dfgroup = dfgroup.copy()  # work on a copy to avoid SettingWithCopyWarning
    dfgroup['VALUES'] = list(range(1, len(dfgroup) + 1))
    print(dfgroup)
Answer 1 (score: 1)
Try converting NAME within each group to a category dtype, then take the category codes (cat.codes) and add 1:
import numpy as np
import pandas as pd
d = {'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2',
7: 'G3', 8: 'G3', 9: 'G3', 10: 'G4', 11: 'G4', 12: 'G4',
13: 'G4'},
'NAME': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'D', 5: 'E', 6: 'D', 7: 'F',
8: 'G', 9: 'H', 10: 'I', 11: 'I', 12: 'J', 13: 'K'},
'VALUES': {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: np.nan, 5: np.nan,
6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: 8.0,
11: 8.0, 12: 89.0, 13: 65.0}}
df = pd.DataFrame(d)
# Mask for Where VALUES is NaN
m = df['VALUES'].isna()
# Groupby 'Groups'
df.loc[m, 'VALUES'] = df[m].groupby('Groups', as_index=False, sort=False).apply(
# Convert 'NAME' to a category and grab the cat codes
# add 1 to start with 1 instead of 0
lambda g: g['NAME'].astype('category').cat.codes + 1
).values
# Convert to int to match output
df['VALUES'] = df['VALUES'].astype(int)
print(df)
df:
Groups NAME VALUES
0 G1 A 1
1 G1 B 2
2 G1 C 3
3 G1 C 3
4 G2 D 1
5 G2 E 2
6 G2 D 1
7 G3 F 1
8 G3 G 2
9 G3 H 3
10 G4 I 8
11 G4 I 8
12 G4 J 89
13 G4 K 65
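One thing to keep in mind with this approach: astype('category') sorts the categories, so the codes follow alphabetical order rather than order of appearance. On the sample data the two coincide, because the names in each NaN group already appear alphabetically. The small sketch below uses a made-up three-row Series (not from the question) to show where they differ, with pd.factorize as the appearance-order counterpart.

```python
import pandas as pd

# Hypothetical names within one group, deliberately not in alphabetical order
names = pd.Series(['E', 'D', 'E'])

# Category codes follow the sorted categories: D -> 0, E -> 1
by_category = names.astype('category').cat.codes + 1

# factorize codes follow order of first appearance: E -> 0, D -> 1
by_appearance = pd.factorize(names)[0] + 1

print(by_category.tolist())    # [2, 1, 2]
print(by_appearance.tolist())  # [1, 2, 1]
```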