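Pandas: group by one column and concatenate another column's values into a delimited list

Time: 2018-04-04 03:03:03

Tags: python pandas group-by pandas-groupby

I want to group all qualifications, as a delimiter-separated list, by job title.

In the dataset below, jobs of the same type (.Net Developer) require different sets of qualifications, while another job requires no qualifications at all.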

JobID    Job Title      Qualification ID Qualification Name
34455226 .Net Developer ICT50715         Diploma of Software Development
34455226 .Net Developer ICT40515         Certificate IV in Programming
34466933 .Net Developer ICT50715         Diploma of Software Development
34466111 .Net Developer ICT50655         Diploma of Software Testing
34479964 Snr Finance Systems Analyst 

I would like a consolidated view of all the unique qualifications that a given type of job may require, as shown below:

Job Title                     Qualifications
.Net Developer                Diploma of Software Development,Certificate IV in Programming,Diploma of Software Testing
Snr Finance Systems Analyst   N/A

This is what I have done so far:

def f(x):
    return pd.Series(dict(Qualifications = ",".join(map(str, x["Qualification Name"]))))

df_jobs_qualifications\
    .groupby("Job Title")[['Qualification Name']]\
    .apply(f)

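But it gives me duplicate qualification names (see below: Diploma of Software Development is repeated), whereas I want each qualification name to appear only once.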

Job Title                     Qualifications
.Net Developer                Diploma of Software Development,Certificate IV in Programming,Diploma of Software Development,Diploma of Software Testing
Snr Finance Systems Analyst   N/A

Update

My question differs from this question because, even after following the steps described there, I do not get unique values.

1 answer:

Answer 0: (score: 5)

If you need unique strings:

You can add set or unique to remove duplicates, and add dropna to drop None and NaN values if they may be present:

df1 = (df.groupby('Job Title')['Qualification Name']
       .apply(lambda x: ','.join(set(x.dropna())))
       .reset_index())

print (df1)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  Diploma of Software Development,Diploma of Sof...  
1     

If order is important:

df1 = (df.groupby('Job Title')['Qualification Name']
       .apply(lambda x: ','.join(x.dropna().unique()))
       .reset_index())

print (df1)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  Diploma of Software Development,Certificate IV...  
1     

If you want NaN for groups with no values:

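import numpy as np

def f(x):
    # Collect the distinct non-null qualification names in this group
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        # No qualifications for this job title: return NaN instead of ''
        val = np.nan
    return val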

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
                     Job Title  \
0               .Net Developer   
1  Snr Finance Systems Analyst   

                                  Qualification Name  
0  Diploma of Software Development,Diploma of Sof...  
1                                                NaN
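
For reference, here is a self-contained sketch that rebuilds the sample data from the question and produces the layout the question asks for, with N/A for job titles that have no qualifications. The helper name join_unique, its sep and empty parameters, and the output column name Qualifications are assumptions made for illustration; they are not part of the original answer. It uses unique rather than set so the qualifications stay in first-seen order.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the last job has no qualification.
df = pd.DataFrame({
    "JobID": [34455226, 34455226, 34466933, 34466111, 34479964],
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification ID": ["ICT50715", "ICT40515", "ICT50715", "ICT50655", np.nan],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        np.nan,
    ],
})


def join_unique(names, sep=",", empty="N/A"):
    """Join distinct non-null values in first-seen order (hypothetical helper)."""
    unique_names = names.dropna().unique()
    # Fall back to the placeholder when the group has no values at all.
    return sep.join(unique_names) if len(unique_names) else empty


result = (
    df.groupby("Job Title")["Qualification Name"]
    .apply(join_unique)
    .reset_index(name="Qualifications")
)
print(result.to_string(index=False))
```

Replacing the empty placeholder with np.nan gives the same behaviour as the function f above, except that the order of the names is preserved.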