I have the following DataFrame:
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})
Then I run the following:
df.groupby('company').product.agg([('count', 'count'), ('product', ', '.join)])
           count                  product
company
Amazon         1                   E-comm
Facebook       1             Social Media
Google         3  Search, Android, Search
Microsoft      2                OS, X-box
Building on the code above, how can I give the resulting columns my own names instead of count and product?

Expected output:
company    Number  Product List
Amazon     1       E-comm
Facebook   1       Social Media
Google     3       Search, Android, Search
Microsoft  2       OS, X-box
Expected output 2:

company    Number  Product List             uniquecount  uniquevalues
Amazon     1       E-comm                   1            E-comm
Facebook   1       Social Media             1            Social Media
Google     3       Search, Android, Search  2            Search, Android
Microsoft  2       OS, X-box                2            OS, X-box
Answer 0 (score: 1)
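import pandas as pd

def remove_dup(string):
    # Split the joined string, trim whitespace, and drop duplicates.
    # dict.fromkeys keeps first-seen order; a set would scramble it.
    temp = [x.strip() for x in string.split(',')]
    return ', '.join(dict.fromkeys(temp))

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# The first element of each (name, func) tuple becomes the output column name
new_df = df.groupby('company')['product'].agg([('Number', 'count'), ('Product List', ', '.join)]).reset_index()

unique_values = new_df['Product List'].apply(remove_dup)
# create uniquecount: number of distinct products per company
new_df['uniquecount'] = unique_values.str.split(', ').str.len()
# create uniquevalues: distinct products in first-seen order
new_df['uniquevalues'] = unique_values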
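A possibly simpler variant, sketched here as an assumption rather than as the answerer's method: compute all four columns in a single agg call, using the built-in 'nunique' aggregation for the count and a small helper for the ordered unique list. The helper name join_unique is hypothetical. Working on the raw group values also avoids re-splitting a joined string, which would break if a product name itself contained a comma.

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})


def join_unique(values):
    # Hypothetical helper: dict.fromkeys drops duplicates while keeping
    # the order in which each product first appears within the group.
    return ', '.join(dict.fromkeys(values))


# Each (name, func) tuple yields one output column, in list order.
new_df = (
    df.groupby('company')['product']
    .agg([
        ('Number', 'count'),
        ('Product List', ', '.join),
        ('uniquecount', 'nunique'),
        ('uniquevalues', join_unique),
    ])
    .reset_index()
)
print(new_df)
```

Because groupby keeps the original row order within each group, Google's uniquevalues should come out as "Search, Android".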
Answer 1 (score: 1)
Here is the standard way: aggregate as before, then rename the columns.
df.groupby('company').product.agg([('count', 'count'), ('product', ', '.join)]).rename(columns={'count': 'Number', 'product': 'Product List'})
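On pandas 0.25 or later, named aggregation may let you skip the rename step. The sketch below assumes that version: each keyword becomes an output column name, and a name containing a space is passed through dict unpacking because it is not a valid Python identifier.

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# Named aggregation on a SeriesGroupBy: keyword = output column name,
# value = aggregation applied to each group's 'product' values.
result = df.groupby('company')['product'].agg(
    Number='count',
    **{'Product List': ', '.join},
)
print(result)
```

The result keeps company as the index, matching the first expected output; add .reset_index() if you want it as a regular column.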