I want to merge two pandas DataFrames.
df1 =
A B
2 11
2 13
2 15
2 19
2 25
2 35
2 41
2 47
2 46
2 51
3 9
3 15
3 17
3 23
3 25
3 29
5 4
5 23
5 28
with another DataFrame:
df2 =
A B C
2 11 abc
2 13 cdd
2 35 cdd
2 41 cdd
2 47 cdd
3 9 cdd
3 15 cdd
3 17 cdd
3 23 cdd
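Both DataFrames are sorted by "A", then by "B". I want to merge on columns ['A', 'B']. Wherever data is missing in column "C" I want to fill it with na, but each block of missing values should get its own label of the form na_uniqueNumber.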
How can I update this merge approach?
from functools import reduce
import pandas as pd
data_frames = [df1, df2]
df_update = reduce(lambda left, right: pd.merge(
    left, right, on=['A', 'B'], how='outer'), data_frames).fillna('na')
Note: the code should only fill "C" with these unique na values for rows where the other columns are present.
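A minimal sketch of what the current code does, using a hypothetical three-row subset of df1/df2 (the names `left`, `right`, `merged` and `check` are made up for illustration). With only two frames, the `reduce` call performs a single outer merge, and `fillna('na')` gives every unmatched row the same placeholder, so the information about which block a row belongs to is lost. Passing `indicator=True` to `pd.merge` shows which rows came only from the left frame.

```python
from functools import reduce

import pandas as pd

# Hypothetical subset of df1 and df2 from the question
left = pd.DataFrame({"A": [2, 2, 2], "B": [11, 13, 15]})
right = pd.DataFrame({"A": [2, 2], "B": [11, 13], "C": ["abc", "cdd"]})

# With two frames, reduce() calls pd.merge exactly once
merged = reduce(
    lambda lhs, rhs: pd.merge(lhs, rhs, on=["A", "B"], how="outer"), [left, right]
).fillna("na")
print(merged)  # every unmatched row gets the same plain 'na'

# indicator=True adds a categorical "_merge" column: "both", "left_only" or "right_only"
check = pd.merge(left, right, on=["A", "B"], how="outer", indicator=True)
print(check["_merge"].tolist())
```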
Expected output:
df2 =
A B C
2 11 abc
2 13 cdd
2 15 na_01
2 19 na_01
2 25 na_01
2 35 cdd
2 41 cdd
2 47 cdd
2 46 na_02
2 51 na_02
3 9 cdd
3 15 cdd
3 17 cdd
3 23 cdd
3 25 na_03
3 29 na_03
5 4 na_04
5 23 na_04
5 28 na_04
Thanks,
Answer 0 (score: 4)
IIUC
New = df_update[df_update.C == 'na']
s=New.reset_index().groupby('A').apply(lambda x : x['index'].diff().ne(1)).cumsum()
df_update.loc[df_update.C == 'na','C']+='_'+s.astype(str).str.pad(2,fillchar='0').values
df_update
Out[124]:
A B C
0 2 11 abc
1 2 13 cdd
2 2 15 na_01
3 2 19 na_01
4 2 25 na_01
5 2 35 cdd
6 2 41 cdd
7 2 47 cdd
8 2 46 na_02
9 2 51 na_02
10 3 9 cdd
11 3 15 cdd
12 3 17 cdd
13 3 23 cdd
14 3 25 na_03
15 3 29 na_03
16 5 4 na_04
17 5 23 na_04
18 5 28 na_04
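The core of this answer is the run-detection idiom `diff().ne(1).cumsum()`. A small sketch on hand-picked index labels (those of the `na` rows in the output above) shows how it works, and why the answer groups by `A` first: rows 14-15 (A = 3) and 16-18 (A = 5) are consecutive, so without the grouping they would fall into one run. The variable names are illustrative only.

```python
import pandas as pd

# Index labels of the 'na' rows with A == 2 in the output above
pos = pd.Series([2, 3, 4, 8, 9])
# diff() is NaN for the first element and 1 inside a run; anything != 1 starts a new run
run_id = pos.diff().ne(1).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2]

# Zero-pad to two characters as the answer does ('1' -> '01')
padded = run_id.astype(str).str.pad(2, fillchar="0")
print(padded.tolist())  # ['01', '01', '01', '02', '02']

# Rows 14-18 are consecutive although they belong to A == 3 and A == 5,
# so without grouping by A they collapse into a single run
pos_35 = pd.Series([14, 15, 16, 17, 18])
print(pos_35.diff().ne(1).cumsum().tolist())  # [1, 1, 1, 1, 1]
```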
Answer 1 (score: 3)
Attempt 1
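import numpy as np
import pandas as pd
def labels(d):
    mask = d.C.isnull().values  # True where C is missing
    a = d.A.values
    c = d.C.values.copy()
    i = np.flatnonzero(mask)  # positions of the missing rows
    # Key each missing row by (A, number of non-missing rows seen so far);
    # rows of the same NaN block share a key, and factorize numbers keys from 0
    f, u = pd.factorize([
        (a_, c_) for a_, c_ in zip(a[mask], (~mask).cumsum()[mask])
    ])
    c[i] = [f'na_{g+1:02d}' for g in f]
    return c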
df1.merge(df2, 'left').assign(C=labels)
A B C
0 2 11 abc
1 2 13 cdd
2 2 15 na_01
3 2 19 na_01
4 2 25 na_01
5 2 35 cdd
6 2 41 cdd
7 2 47 cdd
8 2 46 na_02
9 2 51 na_02
10 3 9 cdd
11 3 15 cdd
12 3 17 cdd
13 3 23 cdd
14 3 25 na_03
15 3 29 na_03
16 5 4 na_04
17 5 23 na_04
18 5 28 na_04
Attempt 2
Also Python 3.6 (uses f-strings)
def labeler():
    tracker = {}
    return lambda k: tracker.setdefault(k, len(tracker) + 1)
def fill(d):
    c_ = labeler()
    return [
        f'na_{c_((a, g)):02d}' if pd.isna(c) else c
        for a, c, g in zip(d.A, d.C, d.C.notna().cumsum())
    ]
df1.merge(df2, 'left').assign(C=fill)
A B C
0 2 11 abc
1 2 13 cdd
2 2 15 na_01
3 2 19 na_01
4 2 25 na_01
5 2 35 cdd
6 2 41 cdd
7 2 47 cdd
8 2 46 na_02
9 2 51 na_02
10 3 9 cdd
11 3 15 cdd
12 3 17 cdd
13 3 23 cdd
14 3 25 na_03
15 3 29 na_03
16 5 4 na_04
17 5 23 na_04
18 5 28 na_04
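The `labeler` closure can be exercised on its own. This sketch feeds it the (A, running count of non-missing C) keys that the ten missing rows get in the left-merged frame; `label` and `keys` are illustrative names. Because `setdefault` only stores `len(tracker) + 1` when a key is new, each distinct key keeps the 1-based id it received the first time it was seen.

```python
def labeler():
    tracker = {}
    # setdefault stores len(tracker) + 1 only for a new key and
    # returns the stored id for a key that was already seen
    return lambda k: tracker.setdefault(k, len(tracker) + 1)

label = labeler()
# (A, running count) keys of the missing rows, in row order
keys = [(2, 2), (2, 2), (2, 2), (2, 5), (2, 5), (3, 9), (3, 9), (5, 9), (5, 9), (5, 9)]
ids = [label(k) for k in keys]
print(ids)  # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
print([f"na_{i:02d}" for i in ids])
```

A fresh tracker is created on every `fill` call, so numbering always restarts at 1 for each DataFrame.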
Attempt 3
Another option. Not sure which one I prefer.
def labeler(d):
    mask = d.C.notna()
    csum = mask.cumsum()
    tups = list(zip(d.A, csum, d.C, ~mask))
    # map each (A, csum) key of a missing row to a 1-based id, in order of first appearance
    trac = dict(map(reversed, enumerate(
        pd.unique([t[:2] for t in tups if t[-1]]), 1
    )))
    # relabel only rows whose C is actually missing (t[-1]); the non-missing row
    # just before a NaN block shares its key and must keep its own value
    return list(map(
        lambda t: f'na_{trac[t[:2]]:02d}' if t[-1] else t[2], tups
    ))
df1.merge(df2, 'left').assign(C=labeler)
A B C
0 2 11 abc
1 2 13 cdd
2 2 15 na_01
3 2 19 na_01
4 2 25 na_01
5 2 35 cdd
6 2 41 cdd
7 2 47 cdd
8 2 46 na_02
9 2 51 na_02
10 3 9 cdd
11 3 15 cdd
12 3 17 cdd
13 3 23 cdd
14 3 25 na_03
15 3 29 na_03
16 5 4 na_04
17 5 23 na_04
18 5 28 na_04
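The `trac` line in Attempt 3 builds a key-to-id mapping by flipping the `(id, key)` pairs produced by `enumerate(..., 1)`. A sketch of that mapping next to an equivalent dict comprehension, using the keys of the missing rows; here `dict.fromkeys` stands in for `pd.unique` as a pure-Python way to drop duplicates while keeping first-seen order (dicts are insertion-ordered in Python 3.7+). The names `keys`, `unique_keys` and `trac2` are illustrative.

```python
# (A, running count) keys of the missing rows, in row order
keys = [(2, 2), (2, 2), (2, 2), (2, 5), (2, 5), (3, 9), (3, 9), (5, 9), (5, 9), (5, 9)]

# Drop duplicates, keeping the order in which keys first appear
unique_keys = list(dict.fromkeys(keys))

# Attempt 3's style: enumerate yields (id, key); reversed() flips each pair to (key, id)
trac = dict(map(reversed, enumerate(unique_keys, 1)))

# The same mapping written as a dict comprehension
trac2 = {key: i for i, key in enumerate(unique_keys, 1)}
print(trac)  # {(2, 2): 1, (2, 5): 2, (3, 9): 3, (5, 9): 4}
```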
Answer 2 (score: 1)
You can first merge with a left join, then, within each group of A, number the blocks of NaN and replace them with fillna:
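A minimal sketch of the approach described above (not the answerer's original code; the names `out`, `missing`, `block`, `ids` and `labels` are illustrative), with the sample frames rebuilt from the tables in the question. A left merge keeps df1's row order, the running count of non-missing C values identifies each NaN block, and `groupby(...).ngroup()` with `sort=False` numbers the (A, block) pairs in order of first appearance before `fillna` writes the labels.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "A": [2] * 10 + [3] * 6 + [5] * 3,
    "B": [11, 13, 15, 19, 25, 35, 41, 47, 46, 51, 9, 15, 17, 23, 25, 29, 4, 23, 28],
})
df2 = pd.DataFrame({
    "A": [2] * 5 + [3] * 4,
    "B": [11, 13, 35, 41, 47, 9, 15, 17, 23],
    "C": ["abc"] + ["cdd"] * 8,
})

# Left join keeps df1's row order; rows without a match get NaN in C
out = df1.merge(df2, on=["A", "B"], how="left")

missing = out["C"].isna()
# Number of non-missing C values seen so far: constant inside each NaN block
block = out["C"].notna().cumsum()
# One id per (A, block) pair, numbered from 1 in order of first appearance
ids = out[missing].groupby([out.loc[missing, "A"], block[missing]], sort=False).ngroup() + 1
labels = "na_" + ids.astype(str).str.zfill(2)
# fillna with a Series aligns on the index, so only the missing rows change
out["C"] = out["C"].fillna(labels)
print(out)
```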