How to fill the columns of a dataframe using a subset of another dataframe?

Time: 2019-09-04 11:46:45

Tags: python pandas dataframe

I have two dataframes like this:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})

df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})

  key prop1 prop2
0   A     x     m
1   A     y     n
2   A     z     b
3   B     u     n
4   B     u     b
5   C     y     b
6   C     x     n
7   A     z     n
8   A     z     n
9   C     z     n

  key  prop1  prop2 keep_me
0   A    NaN    NaN   stuff
1   B    NaN    NaN   stuff
2   B    NaN    NaN   stuff
3   C    NaN    NaN   stuff
4   A    NaN    NaN   stuff
5   A    NaN    NaN   stuff

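I now want to fill the prop1 and prop2 columns of df2 using the values from df1. For each key, df1 has at least as many rows as df2 (in the example above: 5 times A vs. 3 times A, 2 times B vs. 2 times B, and 3 times C vs. 1 time C). For each key, if df2 has n rows with that key, I want to fill them using the first n rows with that key from df1.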

So my expected outcome for df2 would be:

  key prop1 prop2 keep_me
0   A     x     m   stuff
1   B     u     n   stuff
2   B     u     b   stuff
3   C     y     b   stuff
4   A     y     n   stuff
5   A     z     b   stuff

As key is not unique, I cannot simply build a dictionary and then use .map.

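I was hoping that a more direct approach would work, but my attempt fails with

ValueError: Shape of passed values is (5, 22), indices imply (5, 10)

as, I guess, the index contains non-unique values.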

How can I get the desired output?

2 Answers:

Answer 0 (score: 5)

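Because the key values are duplicated, a possible solution is to create a new counter column g in both DataFrames with GroupBy.cumcount. The missing values of df2 can then be replaced with DataFrame.fillna, aligned on the MultiIndex created from the key and g columns: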

df1['g'] = df1.groupby('key').cumcount()
df2['g'] = df2.groupby('key').cumcount()

print (df1)
  key prop1 prop2  g
0   A     x     m  0
1   A     y     n  1
2   A     z     b  2
3   B     u     n  0
4   B     u     b  1
5   C     y     b  0
6   C     x     n  1
7   A     z     n  3
8   A     z     n  4
9   C     z     n  2

print (df2)
  key  prop1  prop2 keep_me  g
0   A    NaN    NaN   stuff  0
1   B    NaN    NaN   stuff  0
2   B    NaN    NaN   stuff  1
3   C    NaN    NaN   stuff  0
4   A    NaN    NaN   stuff  1
5   A    NaN    NaN   stuff  2
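
The fillna step described above is not shown in the output so far. It could be completed roughly as follows. This is a minimal sketch under stated assumptions rather than the answerer's exact code: the name filled and the astype(object) cast are additions made here, the cast only so that the NaN-only float columns can receive strings without relying on implicit upcasting.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})
df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})

# Number the rows within each key: 0, 1, 2, ...
df1['g'] = df1.groupby('key').cumcount()
df2['g'] = df2.groupby('key').cumcount()

# Assumption: cast the all-NaN float columns to object before filling
# them with strings.
df2[['prop1', 'prop2']] = df2[['prop1', 'prop2']].astype(object)

# (key, g) is unique in both frames, so fillna can align row by row.
filled = (
    df2.set_index(['key', 'g'])
       .fillna(df1.set_index(['key', 'g']))
       .reset_index(level='g', drop=True)  # drop the helper counter
       .reset_index()                      # turn key back into a column
)
print(filled)
```

Because the pair (key, g) identifies "the i-th row with this key", rows of df1 beyond the count needed by df2 are simply never matched.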

Answer 1 (score: 1)

Another solution: first build a dictionary from df1, then pop elements from it to fill the NA values in df2:

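# Map each key to the list of its rows in df1, in original order
d = df1.groupby(by='key').apply(lambda x: x.values.tolist()).to_dict()
# For each key in df2, pop the next unused df1 row and write it into df2
df2[['key','prop1','prop2']] = pd.DataFrame(df2.key.apply(lambda x: d[x].pop(0)).tolist())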

    key prop1   prop2   keep_me
0   A   x       m       stuff
1   B   u       n       stuff
2   B   u       b       stuff
3   C   y       b       stuff
4   A   y       n       stuff
5   A   z       b       stuff
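
The same pop-from-a-dictionary idea can also be written without groupby.apply. The following is a hedged variant, not the answerer's code: the names queues, rows and result are hypothetical, and collections.deque is used here only so that taking the first element does not shift a list each time.

```python
from collections import deque

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})
df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})

# One queue of (prop1, prop2) tuples per key, in df1's row order.
queues = {
    key: deque(grp[['prop1', 'prop2']].itertuples(index=False, name=None))
    for key, grp in df1.groupby('key', sort=False)
}

# Walk df2 in order and take the next unused df1 row for each key.
rows = [queues[key].popleft() for key in df2['key']]

result = df2.copy()
result['prop1'] = [r[0] for r in rows]
result['prop2'] = [r[1] for r in rows]
print(result)
```

This variant relies on the guarantee stated in the question that df1 has at least as many rows per key as df2; otherwise popleft would raise an IndexError on an empty queue.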