My DataFrame looks like this:
id column1 column2
a x l
a x n
a y n
b y l
b y m
Currently, I generate the value counts like this:
def value_occurences(grouped, column_name):
    return (grouped[column_name].value_counts(normalize=False, dropna=False)
            .to_frame('count_'+column_name)
            .reset_index(level=1))

result = value_occurences(grouped, 'column1')
"""
>>>result
id column1 count_column1
a x 2
a y 1
b y 2
"""
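The snippet above does not show how `grouped` is created. A minimal reproducible sketch, assuming it is simply `df.groupby('id')` (the question does not state this):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'column1': ['x', 'x', 'y', 'y', 'y'],
    'column2': ['l', 'n', 'n', 'l', 'm'],
})

# Assumption: `grouped` in the question is the frame grouped by `id`.
grouped = df.groupby('id')

def value_occurences(grouped, column_name):
    # Count each value of `column_name` within every id group,
    # then move the value level out of the index into a regular column.
    return (grouped[column_name].value_counts(normalize=False, dropna=False)
            .to_frame('count_' + column_name)
            .reset_index(level=1))

result = value_occurences(grouped, 'column1')
```

Here `result` is indexed by `id` and holds one row per distinct (`id`, `column1`) pair.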
I need the value occurrences counted in this format:
id column1 column2
a 'x:2; y:1' 'l:1; n:2'
b 'y:2' 'l:1; m:1'
How can I convert the result into this format?
Answer 0 (score: 0)
You can first generate the groups of df with df.groupby(['id']) and then apply value_counts to each group:
import io
import pandas as pd

def seqdict(x):
    # Format a {value: count} dict as 'value:count' pairs, sorted by value.
    return ', '.join('{}:{}'.format(*i) for i in sorted(x.items()))

def value_occurences(df):
    # One column of formatted counts per non-id column, indexed by id.
    # The id column itself is skipped so it is not counted as data.
    return pd.DataFrame({c: {i: seqdict(d[c].value_counts().to_dict())
                             for i, d in df.groupby('id')}
                         for c in df.columns.drop('id')})

grouped = pd.read_table(io.StringIO("""id column1 column2
a x l
a x n
a y n
b y l
b y m
"""), sep=r'\s+')
value_occurences(grouped)
Result:
column1 column2
a x:2, y:1 l:1, n:2
b y:2 l:1, m:1
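The output above joins the pairs with ', ', whereas the question asks for '; '. A minimal sketch of a more compact variant, assuming a hypothetical helper `format_counts` (not part of pandas or of this answer) that is passed to `groupby(...).agg` through a column-to-function dict:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'column1': ['x', 'x', 'y', 'y', 'y'],
    'column2': ['l', 'n', 'n', 'l', 'm'],
})

def format_counts(series):
    # Hypothetical helper: count the values of one column within one group
    # and render them as 'value:count' pairs, sorted by value.
    counts = series.value_counts().sort_index()
    return '; '.join(f'{value}:{count}' for value, count in counts.items())

value_columns = ['column1', 'column2']
# The dict form applies the helper to each listed column separately.
result = df.groupby('id').agg({col: format_counts for col in value_columns})
```

The separator lives in one place, so switching between ', ' and '; ' is a one-line change.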
Answer 1 (score: 0)
I know this doesn't really use pandas, but it might still help you:
from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b'], 'column1': ['x', 'x', 'y', 'y', 'y'], 'column2': ['l', 'n', 'n', 'l', 'm']})
# id column1 column2
# 0 a x l
# 1 a x n
# 2 a y n
# 3 b y l
# 4 b y m
# Nested counters: id -> value -> number of occurrences
c1_counter = defaultdict(lambda: defaultdict(int))
c2_counter = defaultdict(lambda: defaultdict(int))
for idx, row in df.iterrows():
    c1_counter[row['id']][row['column1']] += 1
    c2_counter[row['id']][row['column2']] += 1
# Build one output row per id, joining 'value:count' pairs with ';'
new_data = defaultdict(list)
for k, v in c1_counter.items():
    new_data['id'].append(k)
    c1_items = [f'{v_}:{f}' for v_, f in v.items()]
    c2_items = [f'{v_}:{f}' for v_, f in c2_counter[k].items()]
    new_data['column1'].append(';'.join(c1_items))
    new_data['column2'].append(';'.join(c2_items))
df = pd.DataFrame(new_data)
df then looks like:
id column1 column2
0 a x:2;y:1 l:1;n:2
1 b y:2 l:1;m:1
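The loop above hard-codes one counter per column. A sketch of the same plain-Python idea generalised to a list of columns with `collections.Counter`; the names `value_columns`, `counters`, and `out` are illustrative and not from the answer:

```python
from collections import Counter, defaultdict

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'column1': ['x', 'x', 'y', 'y', 'y'],
    'column2': ['l', 'n', 'n', 'l', 'm'],
})

value_columns = ['column1', 'column2']  # illustrative: columns to summarise

# column -> id -> Counter of the values seen for that id
counters = {col: defaultdict(Counter) for col in value_columns}
for col in value_columns:
    for key, value in zip(df['id'], df[col]):
        counters[col][key][value] += 1

# One output row per id, in order of first appearance
rows = []
for key in counters[value_columns[0]]:
    row = {'id': key}
    for col in value_columns:
        row[col] = ';'.join(f'{v}:{n}' for v, n in counters[col][key].items())
    rows.append(row)

out = pd.DataFrame(rows)
```

Adding a third column then only requires extending `value_columns`.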
Answer 2 (score: 0)
You can use groupby twice: first to count the values, then to join them together:
dfs = []
for column in ['column1', 'column2']:
    df_ = df.groupby(['id'])[column].value_counts()
    df_ = df_.index.get_level_values(-1) + ':' + df_.astype(str)
    df_ = df_.groupby('id').agg(lambda x: '; '.join(x)).rename(column)
    dfs.append(df_)
pd.concat(dfs, axis=1)