Create multiple new conditional columns and values based on flags

Date: 2019-06-05 15:03:57

Tags: python python-3.x pandas

I have a dataframe like this:

import pandas as pd
from collections import OrderedDict

have = pd.DataFrame(OrderedDict({'User':['101','101','102','102','103','103','103'],
                     'Name':['A','A','B','B','C','C','C'],
                     'Country':['India','UK','US','UK','US','India','UK'],
                    'product':['Soaps','Brush','Soaps','Brush','Soaps','Brush','Brush'],
                    'channel':['Retail','Online','Retail','Online','Retail','Online','Online'],
                    'Country_flag':['Y','Y','N','Y','N','N','Y'],
                    'product_flag':['N','Y','Y','Y','Y','N','N'],
                    'channel_flag':['N','N','N','Y','Y','Y','Y']
                    }))


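I want to create new columns based on the flags. If a user has flag Y for a field, I want to concatenate the corresponding values.

In the desired output below, the first record's user has flag Y only for Country, so I want a new column ctry whose value is the concatenation User|Name|Country. Similarly, the second record has Y for both Country and product, so the new column is ctry_prod and its value is User|Name|Country|product, and so on.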

Desired output:

(image of the desired output, not available)
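The rule described in the question can be sketched row by row. This is a minimal sketch, not one of the answers below: it assumes the new column names are built from the short names `ctry` and `prod` used in the question plus a hypothetical `chnl` for channel, joined with `_`. It uses `DataFrame.apply` and `pivot`, which is easy to read but slower than a vectorized approach on large frames.

```python
import pandas as pd

# Sample data from the question (a plain dict keeps column order in Python 3.7+)
have = pd.DataFrame({
    'User': ['101', '101', '102', '102', '103', '103', '103'],
    'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Country': ['India', 'UK', 'US', 'UK', 'US', 'India', 'UK'],
    'product': ['Soaps', 'Brush', 'Soaps', 'Brush', 'Soaps', 'Brush', 'Brush'],
    'channel': ['Retail', 'Online', 'Retail', 'Online', 'Retail', 'Online', 'Online'],
    'Country_flag': ['Y', 'Y', 'N', 'Y', 'N', 'N', 'Y'],
    'product_flag': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N'],
    'channel_flag': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'],
})

# Hypothetical short names for the new columns:
# 'ctry' and 'prod' follow the question, 'chnl' is an assumption
SHORT = {'Country': 'ctry', 'product': 'prod', 'channel': 'chnl'}

def name_and_value(row):
    """Return (new column name, concatenated value) for one row."""
    picked = [col for col in SHORT if row[col + '_flag'] == 'Y']
    # Note: a row with no 'Y' flag would end up under a column named ''
    name = '_'.join(SHORT[col] for col in picked)
    value = '|'.join([row['User'], row['Name']] + [row[col] for col in picked])
    return name, value

# One (name, value) pair per row, expanded into two columns
pairs = have.apply(name_and_value, axis=1, result_type='expand')
pairs.columns = ['new_col', 'value']

# One column per flag combination; rows with a different combination get NaN
wide = pairs.pivot(columns='new_col', values='value')
result = pd.concat([have, wide], axis=1)
print(result[['ctry', 'ctry_prod']].head(2))
```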

3 answers:

Answer 0 (score: 1)

My take:

# columns of interest
cat_cols = ['Country', 'product', 'channel']
flag_cols = [col+'_flag' for col in cat_cols]

# select those values marked 'Y'
s = (have[cat_cols].where(have[flag_cols].eq('Y').values)
                   .stack()
                   .dropna()  # newer pandas versions of stack() may keep NaN, so drop it explicitly
                   .reset_index(level=1)
    )

# join columns and values by |
s = s.groupby(s.index).agg('|'.join)

# add the 'User' and 'Name'
s[0] = have['User'] + "|" + have['Name'] + "|" + s[0]

# unstack to turn `level_1` into columns
s = s.reset_index().set_index(['index','level_1'])[0].unstack()

# concatenate side by side (column-wise, axis=1)
pd.concat((have,s), axis=1)
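To see what the `where`/`stack`/`groupby` steps above actually produce, here is a sketch on a two-row subset of the data. The names `vals`, `flags`, `masked`, `long` and `joined` are only for illustration.

```python
import pandas as pd

# Two rows from the question: the value columns and their flags
vals = pd.DataFrame({'Country': ['India', 'UK'], 'product': ['Soaps', 'Brush']})
flags = pd.DataFrame({'Country_flag': ['Y', 'Y'], 'product_flag': ['N', 'Y']})

# Keep a value only where its flag is 'Y'; .values skips column-label alignment
masked = vals.where(flags.eq('Y').values)

# Long format: one row per (original row, column) pair that survived the mask.
# dropna() makes this independent of whether stack() itself drops NaN.
long = masked.stack().dropna().reset_index(level=1)

# Per original row, join the column names ('level_1') and the values (0) with '|'
joined = long.groupby(long.index).agg('|'.join)
print(joined)
```

`joined` has one row per original row: `level_1` holds the joined column names, which become the new column headers after `unstack`, and column `0` holds the joined values that later get `User|Name|` prepended.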

Output:

+----+--------+--------+-----------+-----------+-----------+----------------+----------------+----------------+-------------+-------------------+-------------------+---------------------------+--------------+-------------+--------------------+
|    |   User | Name   | Country   | product   | channel   | Country_flag   | product_flag   | channel_flag   | Country     | Country|channel   | Country|product   | Country|product|channel   | channel      | product     | product|channel    |
|----+--------+--------+-----------+-----------+-----------+----------------+----------------+----------------+-------------+-------------------+-------------------+---------------------------+--------------+-------------+--------------------|
|  0 |    101 | A      | India     | Soaps     | Retail    | Y              | N              | N              | 101|A|India | nan               | nan               | nan                       | nan          | nan         | nan                |
|  1 |    101 | A      | UK        | Brush     | Online    | Y              | Y              | N              | nan         | nan               | 101|A|UK|Brush    | nan                       | nan          | nan         | nan                |
|  2 |    102 | B      | US        | Soaps     | Retail    | N              | Y              | N              | nan         | nan               | nan               | nan                       | nan          | 102|B|Soaps | nan                |
|  3 |    102 | B      | UK        | Brush     | Online    | Y              | Y              | Y              | nan         | nan               | nan               | 102|B|UK|Brush|Online     | nan          | nan         | nan                |
|  4 |    103 | C      | US        | Soaps     | Retail    | N              | Y              | Y              | nan         | nan               | nan               | nan                       | nan          | nan         | 103|C|Soaps|Retail |
|  5 |    103 | C      | India     | Brush     | Online    | N              | N              | Y              | nan         | nan               | nan               | nan                       | 103|C|Online | nan         | nan                |
|  6 |    103 | C      | UK        | Brush     | Online    | Y              | N              | Y              | nan         | 103|C|UK|Online   | nan               | nan                       | nan          | nan         | nan                |
+----+--------+--------+-----------+-----------+-----------+----------------+----------------+----------------+-------------+-------------------+-------------------+---------------------------+--------------+-------------+--------------------+

Answer 1 (score: 0)

This is a tricky one:

import numpy as np

# the three flag columns
s1=have.iloc[:,-3:]
# the three value columns (Country, product, channel)
s2=have.iloc[:,2:-3]
# keep a value where its flag is 'Y', replace it with NaN otherwise
s2=s2.where((s1=='Y').values,np.nan)
# build the series we need: User, Name and the kept values joined with '|'
s3=pd.concat([have.iloc[:,:2],s2],axis=1).stack().dropna().groupby(level=0).agg('|'.join)
# use dot to get the new column name from the names of the 'Y' flag columns
s1=s1.eq('Y').dot(s1.columns+'_').str.strip('_')
# reshape into one column per flag combination with crosstab
s=pd.crosstab(values=s3,index=have.index,columns=s1,aggfunc='first').fillna(0)


df=pd.concat([have,s],axis=1)
df
Out[175]: 
  User Name Country  ...    channel_flag  product_flag product_flag_channel_flag
0  101    A   India  ...               0             0                         0
1  101    A      UK  ...               0             0                         0
2  102    B      US  ...               0   102|B|Soaps                         0
3  102    B      UK  ...               0             0                         0
4  103    C      US  ...               0             0        103|C|Soaps|Retail
5  103    C   India  ...    103|C|Online             0                         0
6  103    C      UK  ...               0             0                         0
[7 rows x 15 columns]
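The `dot` line is the least obvious step. Multiplying a boolean matrix by a vector of strings works because Python evaluates `True * 'x_'` as `'x_'` and `False * 'x_'` as `''`, so each row's result is the concatenation of the names of its `'Y'` columns. That is where headers such as `product_flag_channel_flag` come from. A small sketch with made-up flags:

```python
import pandas as pd

# Made-up flags: one 'Y' each, both 'Y', and no 'Y' at all
flags = pd.DataFrame({'Country_flag': ['Y', 'N', 'Y', 'N'],
                      'product_flag': ['N', 'Y', 'Y', 'N']})

# Boolean matrix "times" string vector: each row concatenates the names of its
# True columns (each followed by '_'), then the trailing '_' is stripped
names = flags.eq('Y').dot(flags.columns + '_').str.strip('_')
print(names.tolist())
```

A row with no `'Y'` at all yields an empty string, so such rows would all land under a column named `''` in the crosstab.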

Answer 2 (score: 0)

Not very elegant, but it works. For clarity, I kept the loop and the if statement on separate lines:

import numpy as np

# combine the three flags into one key per row, e.g. 'YNN' or 'NYY'
have['Linked_Flags'] = have['Country_flag'] + have['product_flag'] + have['channel_flag']
# flag combination -> name of the new column
mapping = OrderedDict([('YNN', 'ctry'), ('NYN', 'prod'), ('NNY', 'chnl'), ('YYY', 'ctry_prod_channel'),('YYN', 'ctry_prod'), ('YNY', 'ctry_channel'), ('NYY', 'prod_channel')])
# position in the key -> source column
string_to_add_dict = {0: 'Country', 1: 'product', 2: 'channel'}

for linked_flag in mapping.keys():
    string_to_add = ''
    for position, letter in enumerate(linked_flag):
        if letter == 'Y':
            string_to_add += '|' + have[string_to_add_dict[position]]

    # fill the rows whose key matches this combination, leave the others empty
    have[mapping[linked_flag]] = np.where(have['Linked_Flags'] == linked_flag, have['User'] + '|' + have['Name'] + string_to_add, '')

# drop the helper column
del have['Linked_Flags']
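The hand-written `mapping` has to list every flag combination. As a hedged alternative, it can be generated with `itertools.product`. The short names `ctry`, `prod` and `chnl` below are assumptions (the answer itself uses `chnl` on its own but `channel` inside combinations), so adjust them to the column names you actually want.

```python
from itertools import product

# Hypothetical short names, in the same order as the flag positions in the key
SHORT = ['ctry', 'prod', 'chnl']

# Every Y/N combination of the three flags except 'NNN' (no column to create)
mapping = {}
for letters in product('YN', repeat=len(SHORT)):
    key = ''.join(letters)
    if 'Y' in key:
        # column name = short names of the 'Y' positions, joined with '_'
        mapping[key] = '_'.join(name for name, letter in zip(SHORT, key) if letter == 'Y')

print(mapping)
```

The resulting dict can stand in for `mapping` in the loop above. Since Python 3.7 a plain dict keeps insertion order, so `OrderedDict` is not required here.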