Filling "na" values with unique "na" identifiers when merging in pandas

Time: 2018-03-14 17:43:40

Tags: python pandas dataframe merge

I want to merge two pandas DataFrames.

df1 = 
A   B
2   11
2   13
2   15
2   19
2   25
2   35
2   41
2   47
2   46
2   51
3   9
3   15
3   17
3   23
3   25
3   29
5   4
5   23
5   28

with another DataFrame:

   df2 = 
A   B    C
2   11   abc
2   13   cdd
2   35   cdd
2   41   cdd
2   47   cdd
3   9   cdd
3   15   cdd
3   17   cdd
3   23   cdd

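Both DataFrames are sorted by "A", then by "B". I want to merge on columns ['A', 'B']. Where data for column "C" is missing, I want to fill it with na, but each block of consecutive missing rows should get its own label of the form na_uniqueNumber.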

How can I update this merge approach:

from functools import reduce
import pandas as pd

data_frames = [df1, df2]
df_update = reduce(lambda left,right: pd.merge(
    left, right, on=['A', 'B'], how='outer'), data_frames).fillna('na')

Note: the code should fill na with unique values only in column "C", even when other columns are present.
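
As a minimal sketch of that constraint (not a full solution, and using small made-up frames rather than the data above), passing a dict to fillna limits the fill to column "C". It assumes a left merge is acceptable, since that keeps the row order of df1:

```python
import pandas as pd

# Small illustrative frames (assumed data, not the full tables from the question).
df1 = pd.DataFrame({"A": [2, 2, 2, 3], "B": [11, 13, 15, 9]})
df2 = pd.DataFrame({"A": [2, 3], "B": [11, 9], "C": ["abc", "cdd"]})

# A left merge keeps every row of df1 in its original order;
# rows without a match in df2 get NaN in column C.
merged = df1.merge(df2, on=["A", "B"], how="left")

# A dict passed to fillna restricts the fill to the named column only.
filled = merged.fillna({"C": "na"})
print(filled)
```

This only inserts the plain "na" placeholder; numbering the blocks is a separate step.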

Expected output:

   df2 = 
A   B    C
2   11   abc
2   13   cdd
2   15   na_01
2   19   na_01 
2   25   na_01  
2   35   cdd
2   41   cdd
2   47   cdd
2   46   na_02
2   51   na_02
3   9   cdd
3   15   cdd
3   17   cdd
3   23   cdd
3   25   na_03
3   29   na_03
5   4   na_04
5   23   na_04
5   28   na_04

Thanks,

3 answers:

Answer 0 (score: 4)

IIUC

New = df_update[df_update.C == 'na']

s=New.reset_index().groupby('A').apply(lambda x : x['index'].diff().ne(1)).cumsum()

df_update.loc[df_update.C == 'na','C']+='_'+s.astype(str).str.pad(2,fillchar='0').values
df_update
Out[124]: 
    A   B      C
0   2  11    abc
1   2  13    cdd
2   2  15  na_01
3   2  19  na_01
4   2  25  na_01
5   2  35    cdd
6   2  41    cdd
7   2  47    cdd
8   2  46  na_02
9   2  51  na_02
10  3   9    cdd
11  3  15    cdd
12  3  17    cdd
13  3  23    cdd
14  3  25  na_03
15  3  29  na_03
16  5   4  na_04
17  5  23  na_04
18  5  28  na_04
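
The core of the snippet above is the diff().ne(1).cumsum() idiom for numbering runs of consecutive positions. A small sketch of that idiom alone, on a made-up column and without the groupby on "A" that the answer adds, might look like this:

```python
import pandas as pd

# Hypothetical column with two separate runs of "na" placeholders.
s = pd.Series(["abc", "na", "na", "cdd", "na"], name="C")

# Index positions of the placeholder rows: 1, 2, 4.
pos = s[s == "na"].index.to_series()

# A position that does not follow its predecessor by exactly 1 starts a new run.
# The first diff is NaN, and NaN != 1 is True, so the first run is counted too.
run_id = pos.diff().ne(1).cumsum()
print(run_id.tolist())
```

In the answer the diff is taken within each group of "A", so a change in "A" also starts a new run even when the row positions are adjacent.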

Answer 1 (score: 3)

Attempt 1

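import numpy as np
import pandas as pd

def labels(d):
    mask = d.C.isnull().values          # True where C is missing
    a = d.A.values
    c = d.C.values.copy()
    i = np.flatnonzero(mask)            # positions of the missing rows
    # Key each missing row by (A, number of non-null C values seen so far);
    # rows in the same missing block share a key.
    f, u = pd.factorize([
        (a_, c_) for a_, c_ in zip(a[mask], (~mask).cumsum()[mask])
    ])
    c[i] = [f'na_{g+1:02d}' for g in f]
    return c


df1.merge(df2, 'left').assign(C=labels)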

    A   B      C
0   2  11    abc
1   2  13    cdd
2   2  15  na_01
3   2  19  na_01
4   2  25  na_01
5   2  35    cdd
6   2  41    cdd
7   2  47    cdd
8   2  46  na_02
9   2  51  na_02
10  3   9    cdd
11  3  15    cdd
12  3  17    cdd
13  3  23    cdd
14  3  25  na_03
15  3  29  na_03
16  5   4  na_04
17  5  23  na_04
18  5  28  na_04
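
To see why the (A, cumulative count) key identifies a missing block, here is a small sketch on made-up data. The frame and variable names are illustrative assumptions, and the key list is wrapped in a Series before calling pd.factorize:

```python
import pandas as pd

# Hypothetical merged frame with two missing blocks in C.
df = pd.DataFrame({
    "A": [2, 2, 2, 2, 3, 3],
    "C": ["abc", None, None, "cdd", None, "cdd"],
})

# Running count of non-null C values; it stays constant inside a block of NaN.
block = df["C"].notna().cumsum()

# Each distinct (A, block) pair among the NaN rows is one missing block.
missing = df["C"].isna()
keys = list(zip(df.loc[missing, "A"], block[missing]))

# factorize numbers the distinct keys 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(pd.Series(keys))
labels = [f"na_{code + 1:02d}" for code in codes]
print(labels)
```

Including "A" in the key matters when one missing block ends a group and the next group starts with missing rows: the cumulative count alone would not change there.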

Attempt 2
Also requires Python 3.6

def labeler():
    tracker = {}
    return lambda k: tracker.setdefault(k, len(tracker) + 1)

def fill(d):
    c_ = labeler()
    return [
        f'na_{c_((a, g)):02d}' if pd.isna(c) else c
        for a, c, g in zip(d.A, d.C, d.C.notna().cumsum())
    ]

df1.merge(df2, 'left').assign(C=fill)

    A   B      C
0   2  11    abc
1   2  13    cdd
2   2  15  na_01
3   2  19  na_01
4   2  25  na_01
5   2  35    cdd
6   2  41    cdd
7   2  47    cdd
8   2  46  na_02
9   2  51  na_02
10  3   9    cdd
11  3  15    cdd
12  3  17    cdd
13  3  23    cdd
14  3  25  na_03
15  3  29  na_03
16  5   4  na_04
17  5  23  na_04
18  5  28  na_04

Attempt 3
Another option. Not sure which one I prefer.

def labeler(d):
    mask = d.C.notna()
    csum = mask.cumsum()
    tups = list(zip(d.A, csum, d.C, ~mask))
    trac = dict(map(reversed, enumerate(
        pd.unique([t[:2] for t in tups if t[-1]]), 1
    )))
    # Relabel only rows whose C is missing (t[-1]); a non-null row shares its
    # (A, csum) key with the missing block right after it, so a key lookup
    # alone would overwrite that row as well.
    return list(map(
        lambda t: f'na_{trac[t[:2]]:02d}' if t[-1] else t[2], tups
    ))

df1.merge(df2, 'left').assign(C=labeler)

    A   B      C
0   2  11    abc
1   2  13    cdd
2   2  15  na_01
3   2  19  na_01
4   2  25  na_01
5   2  35    cdd
6   2  41    cdd
7   2  47    cdd
8   2  46  na_02
9   2  51  na_02
10  3   9    cdd
11  3  15    cdd
12  3  17    cdd
13  3  23    cdd
14  3  25  na_03
15  3  29  na_03
16  5   4  na_04
17  5  23  na_04
18  5  28  na_04

Answer 2 (score: 1)

You can first merge with a left join, then number the blocks of NaN within each group of A and use those numbers to replace the NaN values with fillna.
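
The code for this answer is not preserved here. The following is only a sketch of the approach described, not the answerer's code; the names missing, starts, block_no and labels are illustrative, and it uses the data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [2] * 10 + [3] * 6 + [5] * 3,
    "B": [11, 13, 15, 19, 25, 35, 41, 47, 46, 51,
          9, 15, 17, 23, 25, 29, 4, 23, 28],
})
df2 = pd.DataFrame({
    "A": [2] * 5 + [3] * 4,
    "B": [11, 13, 35, 41, 47, 9, 15, 17, 23],
    "C": ["abc"] + ["cdd"] * 8,
})

# Left join: keeps the row order of df1, unmatched rows get NaN in C.
out = df1.merge(df2, on=["A", "B"], how="left")

missing = out["C"].isna()
# A NaN block starts at a NaN row whose previous row is non-NaN
# or belongs to a different value of A.
prev_missing = missing.shift(fill_value=False)
starts = missing & (~prev_missing | out["A"].ne(out["A"].shift()))

# Running count of block starts gives the block number for every row;
# only the values on NaN rows are actually used by fillna below.
block_no = starts.cumsum()
labels = "na_" + block_no.astype(str).str.zfill(2)

# fillna aligns on the index and touches only the NaN entries of C.
out["C"] = out["C"].fillna(labels)
print(out)
```

Because fillna is applied to column "C" alone, any other columns in the merged frame are left untouched.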