Question

I have a DataFrame like this.

import pandas as pd

raw_data = {'Sub1': ['A', 'B', 'C', 'D', 'E'],
            'Sub2': ['F', 'G', 'H', 'I', 'J'],
            'Sub3': ['K', 'L', 'M', 'N', 'O'],
            'S_score1': [1, 0, 0, 6, 0],
            'S_score2': [0, 1, 0, 6, 0],
            'S_score3': [0, 1, 0, 6, 0],
            }

df2 = pd.DataFrame(raw_data, columns = ['Sub1','Sub2','Sub3','S_score1', 'S_score2', 'S_score3'])


I want to check the score columns and, wherever a score is 1 or greater, put the corresponding subject into a text column.

Desired output:

Answer 1

Multiply the subject columns by the score mask, then join

s = (df2.filter(like='Sub') * df2.filter(like='S_').ge(1).values).apply(lambda x: ','.join([y for y in x if y != '']), axis=1)
s
Out[324]: 
0        A
1      G,L
2         
3    D,I,N
4         
dtype: object
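The multiplication works because Python treats `True` as 1 and `False` as 0 when repeating a string, so a subject whose score fails the test collapses to an empty string. A minimal sketch of that idea for a single row, using plain lists (the variable names here are illustrative, not part of the answer):

```python
# One row of subjects and the matching scores.
subjects = ['A', 'F', 'K']
scores = [1, 0, 0]

# 'A' * True == 'A' and 'F' * False == '', so failing subjects become ''.
kept = [subject * (score >= 1) for subject, score in zip(subjects, scores)]

# Drop the empty strings before joining, as the lambda in the answer does.
joined = ','.join(x for x in kept if x != '')
print(kept, repr(joined))
```

The DataFrame version applies the same rule to every cell at once; `.values` turns the mask into a plain array so the multiplication is positional rather than aligned on column names.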

Then use np.where


import numpy as np
np.where(s=='','You do not have score','You have '+s)
Out[326]: 
array(['You have A', 'You have G,L', 'You do not have score',
       'You have D,I,N', 'You do not have score'], dtype=object)

# Assign it back

df2['s_txt']=np.where(s=='','You do not have score','You have '+s)
df2
Out[328]: 
  Sub1 Sub2          ...           S_score3                  s_txt
0    A    F          ...                  0             You have A
1    B    G          ...                  1           You have G,L
2    C    H          ...                  0  You do not have score
3    D    I          ...                  6         You have D,I,N
4    E    J          ...                  0  You do not have score
[5 rows x 7 columns]

Answer 2

First, separate the subject (grade) columns from the one-hot score columns.

u = df2.filter(like='Sub')
v = df2.filter(like='S_score').astype(bool)

Next, aggregate the letter grades by multiplication and set the column values.

r = (u.mul(v.values)
      .agg(','.join, axis=1)
      .str.strip(',')
      .str.replace(',{2,}', ',', regex=True))
df2['s_text'] = np.where(r.str.len() > 0, 'You scored ' + r, 'N/A')    
df2

  Sub1 Sub2 Sub3  S_score1  S_score2  S_score3            s_text
0    A    F    K         1         0         0      You scored A
1    B    G    L         0         1         1    You scored G,L
2    C    H    M         0         0         0               N/A
3    D    I    N         6         6         6  You scored D,I,N
4    E    J    O         0         0         0               N/A
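The `strip` and `replace` calls matter when the matching subjects are not adjacent, which none of the sample rows happen to show. A small sketch of the cleanup step on its own, assuming hand-written intermediate strings of the kind the `','.join` aggregation would produce (the `raw` Series is made up for illustration):

```python
import pandas as pd

# Joined rows before cleanup: empty strings leave stray commas behind.
raw = pd.Series(['A,,K', ',G,L', ',,', 'D,I,N'])

cleaned = (raw
           .str.strip(',')                            # drop leading/trailing commas
           .str.replace(',{2,}', ',', regex=True))    # collapse runs of commas

print(cleaned.tolist())
```

Passing `regex=True` explicitly keeps the pattern from being treated as a literal string on pandas versions where that is the default.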

To make the last separator different, you will need a custom function.

def join(lst):
    lst = lst[lst != ''].tolist()  # plain list, so lst[-1] is positional
    if len(lst) > 1:
        return 'You scored ' + ', '.join(lst[:-1]) + ' and ' + lst[-1] 
    elif len(lst) > 0:
        return 'You scored ' + ', '.join(lst)
    return 'N/A'

df2['s_text'] = u.mul(v.values).agg(join, axis=1)
df2

  Sub1 Sub2 Sub3  S_score1  S_score2  S_score3                 s_text
0    A    F    K         1         0         0           You scored A
1    B    G    L         0         1         1     You scored G and L
2    C    H    M         0         0         0                    N/A
3    D    I    N         6         6         6  You scored D, I and N
4    E    J    O         0         0         0                    N/A
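To see how the separator logic behaves on its own, here is a hedged, pandas-free sketch of the same branching applied to plain lists. The name `join_with_and` is hypothetical and not part of the answer:

```python
def join_with_and(items):
    """Join items with ', ' and use ' and ' before the last one."""
    items = [x for x in items if x != '']   # ignore masked-out subjects
    if len(items) > 1:
        return 'You scored ' + ', '.join(items[:-1]) + ' and ' + items[-1]
    if len(items) == 1:
        return 'You scored ' + items[0]
    return 'N/A'

for row in (['A', '', ''], ['', 'G', 'L'], ['', '', ''], ['D', 'I', 'N']):
    print(join_with_and(row))
```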

Answer 3

One possible solution consists of the following steps:

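Define a function that generates the output text for a source row. This function should join the source columns after filtering out nulls.
Generate a subs table containing the Sub1, Sub2 and Sub3 columns.
Generate a msk (mask) table containing the S_score... columns, with the column names changed to Sub1, Sub2 and Sub3.
Compute subs.where(msk) and apply the function above to each row. Note that for False elements in the mask, the corresponding output element is missing (NaN), so the applied function will not include it in the join.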

So the whole script is built around this expression:

subs.where(msk)
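Only the central expression is shown here, so the following is a sketch of how the described steps might fit together, not the answerer's script. The helper name `row_text`, the output wording, the `>= 1` threshold and the reduced sample frame are all assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    'Sub1': ['A', 'B', 'C'],
    'Sub2': ['F', 'G', 'H'],
    'S_score1': [1, 0, 0],
    'S_score2': [0, 1, 0],
})

def row_text(row):
    # Hypothetical helper: join whatever subjects survived the mask.
    kept = row.dropna()
    return 'You scored ' + ','.join(kept) if len(kept) else 'N/A'

subs = df.filter(like='Sub')            # subject columns
msk = df.filter(like='S_score').ge(1)   # boolean mask from the score columns
msk.columns = subs.columns              # rename so where() aligns on labels

# where() keeps a subject only where the mask is True; the rest become NaN.
result = subs.where(msk).apply(row_text, axis=1)
print(result.tolist())
```

Renaming the mask columns is what lets `where` line each score up with its subject; without it the labels would not match.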

Generate a summary for each row based on column values
