Add an alternating scalar column to a pandas DataFrame based on a condition

Time: 2020-05-11 10:44:29

Tags: pandas

I have a DataFrame master and a list tag, as follows:

import pandas as pd

i = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
master = pd.DataFrame(i, columns=['cat'])
tag = [0, 1]

How can I insert a tag column that follows the normal order for cat A but the reversed order for cat B? The expected output is:

   cat  tags
0   A   0
1   A   1
2   B   1
3   B   0
4   B   1
5   A   0
6   A   1
7   A   0
8   A   1
9   B   1
10  B   0
...

3 answers:

Answer 0 (score: 2)

Edit: Because each consecutive group has to be processed separately, I tried to create a general solution:

tag = ['a','b','c']

r = range(len(tag))
r1 = range(len(tag)-1, -1, -1)
print (dict(zip(r1, tag)))
{2: 'a', 1: 'b', 0: 'c'}

m1 = master['cat'].eq('A')
m2 = master['cat'].eq('B')
s = master['cat'].ne(master['cat'].shift()).cumsum()
master['tags'] = master.groupby(s).cumcount() % len(tag)

master.loc[m1, 'tags'] = master.loc[m1, 'tags'].map(dict(zip(r, tag)))
master.loc[m2, 'tags'] = master.loc[m2, 'tags'].map(dict(zip(r1, tag)))
print (master)
   cat tags
0    A    a
1    A    b
2    B    c
3    B    b
4    B    a
5    A    a
6    A    b
7    A    c
8    A    a
9    B    c
10   B    b
11   B    a
12   B    c
13   B    b
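
The step that makes the solution above work per consecutive run is the helper series s. As a minimal sketch on the question's data (the names starts, run_id and pos are illustrative and not part of the answer), this is how the run identifier and the position inside each run come about:

```python
import pandas as pd

cats = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
master = pd.DataFrame({'cat': cats})

# True wherever the value differs from the previous row (the first row always counts)
starts = master['cat'].ne(master['cat'].shift())
# Running total of run starts gives one id per consecutive run
run_id = starts.cumsum()
# Position of each row inside its own run, restarting at 0 for every run
pos = master.groupby(run_id).cumcount()

print(run_id.tolist())
print(pos.tolist())
```

Taking pos modulo len(tag) then gives the index into tag for A rows and into the reversed tag for B rows.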

Another approach is to build a helper DataFrame from the tags and combine it with master through a left-join merge:

tag = ['a','b','c']

s = master['cat'].ne(master['cat'].shift()).cumsum()
master['g'] = master.groupby(s).cumcount() % len(tag)

d = {'A': tag, 'B':tag[::-1]}
df = pd.DataFrame([(k,i,x) 
                   for k, v in d.items() 
                   for i, x in enumerate(v)], columns=['cat','g','tags'])
print (df)
  cat  g tags
0   A  0    a
1   A  1    b
2   A  2    c
3   B  0    c
4   B  1    b
5   B  2    a

master = master.merge(df, on=['cat','g'], how='left').drop('g', axis=1)
print (master)
   cat tags
0    A    a
1    A    b
2    B    c
3    B    b
4    B    a
5    A    a
6    A    b
7    A    c
8    A    a
9    B    c
10   B    b
11   B    a
12   B    c
13   B    b
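
As a check against the question itself, the same merge idea can be run with the original tag = [0, 1]. This is only a sketch: the names run_id, lookup and result are illustrative, and the DataFrame is rebuilt from scratch so the block runs on its own.

```python
import pandas as pd

cats = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
master = pd.DataFrame({'cat': cats})
tag = [0, 1]

# Position within each consecutive run, wrapped to the length of tag
run_id = master['cat'].ne(master['cat'].shift()).cumsum()
master['g'] = master.groupby(run_id).cumcount() % len(tag)

# Lookup table: A uses tag as given, B uses tag reversed
lookup = pd.DataFrame(
    [(cat, pos, value)
     for cat, values in {'A': tag, 'B': tag[::-1]}.items()
     for pos, value in enumerate(values)],
    columns=['cat', 'g', 'tags'],
)

# A left join keeps every row of master in its original order
result = master.merge(lookup, on=['cat', 'g'], how='left').drop(columns='g')
print(result['tags'].tolist())
```

The first eleven values agree with the expected output listed in the question.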

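The idea is to use numpy.tile to repeat the tag values, with the number of repetitions derived by integer division from the count of matching rows, then trim the result by slicing and assign it through the two masks: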

import numpy as np

le = len(tag)
m1 = master['cat'].eq('A')
m2 = master['cat'].eq('B')
s1 = m1.sum()
s2 = m2.sum()
master.loc[m1, 'tags'] = np.tile(tag, s1 // le + le)[:s1]
#swapped order for m2 mask
master.loc[m2, 'tags'] = np.tile(tag[::-1], s2// le + le)[:s2]
print (master)
  cat  tags
0   A   0.0
1   A   1.0
2   B   1.0
3   B   0.0
4   B   1.0
5   A   0.0
6   A   1.0
7   A   0.0
8   A   1.0
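
To make the arithmetic of the numpy.tile version concrete, here is a small sketch for the B rows of the question's full 14-row frame with tag = [0, 1]. The names is_b, n_b, tiled and b_values are illustrative only.

```python
import numpy as np
import pandas as pd

cats = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
master = pd.DataFrame({'cat': cats})
tag = [0, 1]

le = len(tag)
is_b = master['cat'].eq('B')
n_b = int(is_b.sum())  # number of rows with cat == 'B'

# Tile the reversed list more often than needed, then trim to n_b values
tiled = np.tile(tag[::-1], n_b // le + le)
b_values = tiled[:n_b]

print(len(tiled))
print(b_values.tolist())
# Index labels of the B rows that receive those values, in order
print(master.index[is_b].tolist())
```

The values are handed out across all B rows in one pass, so the second B run (starting at row 9) continues where the first one stopped and row 9 receives 0 rather than the expected 1. That is presumably why the edit at the top of this answer handles each consecutive run separately.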

Answer 1 (score: 2)

IIUC, use GroupBy.cumcount + Series.mod, then invert the sequence with Series.mask where cat is B (df below stands for the question's master DataFrame):

s = df.groupby('cat').cumcount().mod(2)
df['tags'] = s.mask(df['cat'].eq('B'), ~s.astype(bool)).astype(int)
print(df)

  cat  tags
0   A     0
1   A     1
2   B     1
3   B     0
4   B     1
5   A     0
6   A     1
7   A     0
8   A     1
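
Note that grouping by 'cat' keeps counting across separate runs of the same category, so on the question's full 14 rows the value at row 9 would be 0 instead of the expected 1. A possible variant, sketched here under the assumption that every consecutive run should restart the pattern, groups by a run identifier instead (run_id is an illustrative name, not part of the answer):

```python
import pandas as pd

cats = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
df = pd.DataFrame({'cat': cats})

# One id per consecutive run of identical 'cat' values
run_id = df['cat'].ne(df['cat'].shift()).cumsum()
# Alternate 0, 1, 0, 1, ... inside each run
s = df.groupby(run_id).cumcount().mod(2)
# For B rows flip 0 <-> 1; all other rows keep s unchanged
df['tags'] = s.mask(df['cat'].eq('B'), 1 - s)

print(df['tags'].tolist())
```

Using 1 - s avoids the round trip through bool that the original expression needs, but it only fits a two-value tag list of 0 and 1.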

Answer 2 (score: 0)

numpy.place may help here:

import numpy as np

# create temp column:
mapp = {'A': 0, 'B': 1}

res = master.assign(temp=master.cat.map(mapp),
                    tag=master.cat)

#locate point where B changes to A
split_point = res.loc[res.temp.diff().eq(-1)].index

split_point
Int64Index([5], dtype='int64')
#split into sections :
spl = np.split(res.cat,split_point)


def replace(entry):
    np.place(entry.to_numpy(), entry=="A",[0,1])
    np.place(entry.to_numpy(),entry=="B",[1,0])
    return entry

res.tag = pd.concat(map(replace,spl))

res.drop('temp',axis=1)
    cat tag
0   A   0
1   A   1
2   B   1
3   B   0
4   B   1
5   A   0
6   A   1
7   A   0
8   A   1
9   B   1
10  B   0
11  B   1
12  B   0
13  B   1
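
The split step in this answer relies on diff() being -1 exactly where a B row is followed by an A row. A minimal sketch of just that step, using plain positional slicing in place of np.split (bounds and sections are illustrative names, and the slicing assumes the default integer index so that labels equal positions):

```python
import pandas as pd

cats = ['A'] * 2 + ['B'] * 3 + ['A'] * 4 + ['B'] * 5
master = pd.DataFrame({'cat': cats})

# Encode A as 0 and B as 1, as in the answer's temp column
temp = master['cat'].map({'A': 0, 'B': 1})
# diff() equals -1 only where a B (1) is directly followed by an A (0)
split_point = temp.index[temp.diff().eq(-1)]
print(split_point.tolist())

# Cut the column at those points; each section starts with an A run
bounds = [0, *split_point.tolist(), len(master)]
sections = [master['cat'].iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
print([len(sec) for sec in sections])
```

Each section then holds one A run followed by one B run, which is the unit the answer's replace function works on.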