I am trying to assign a name to each group in pandas.
I have a DataFrame and a list of names:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10],
                   'ids': [234, 345, 456, 444, 333, 22, 11, 5, 1, 2, 3, 4, 6]})
names = ['Matt', 'Jeff', 'Steph', 'Shannon']
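I want to assign these names to the records in round-robin fashion, so I wrote a helper function that extends the list to match the length of the DataFrame.
def match_length(list_, length):
    # Repeat the list as many whole times as fit, then pad with a partial slice.
    return length // len(list_) * list_ + list_[:length % len(list_)]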
df['owner'] = match_length(names, len(df))
a ids owner
1 234 Matt
1 345 Jeff
2 456 Steph
2 444 Shannon
3 333 Matt
4 22 Jeff
5 11 Steph
6 5 Shannon
7 1 Matt
7 2 Jeff
8 3 Steph
9 4 Shannon
10 6 Matt
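For reference, the same round-robin padding can be sketched with the standard library's `itertools`. This is only an equivalent of the helper above, not a fix for the grouping problem; the name `match_length_cycle` is made up for illustration.

```python
from itertools import cycle, islice

names = ['Matt', 'Jeff', 'Steph', 'Shannon']

def match_length_cycle(list_, length):
    # Hypothetical helper: cycle through list_ endlessly and keep the first `length` items.
    return list(islice(cycle(list_), length))

owners = match_length_cycle(names, 13)
```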
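The problem is that I want the same person assigned to every row of an 'a' group. I don't want both 'Matt' and 'Jeff' to own the first two records; only Matt should own them. I have tried .groupby() with .transform(), .apply(), and .assign() without any luck. I'm not sure how to handle my list in the first place. It should return: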
a ids owner
1 234 Matt
1 345 Matt
2 456 Jeff
2 444 Jeff
3 333 Steph
4 22 Shannon
5 11 Matt
6 5 Jeff
7 1 Steph
7 2 Steph
8 3 Shannon
9 4 Matt
10 6 Jeff
Answer 0 (score: 2)
Is this what you need?
(df.groupby('a').ngroup() % len(names)).map(dict(enumerate(names)))
Out[339]:
0 Matt
1 Matt
2 Jeff
3 Jeff
4 Steph
5 Shannon
6 Matt
7 Jeff
8 Steph
9 Steph
10 Shannon
11 Matt
12 Jeff
dtype: object
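The expression above returns a Series rather than modifying `df`. A minimal sketch of storing it as the `owner` column, assuming the same `df` and `names` as in the question, might look like this. Note that `ngroup()` numbers groups in sorted key order by default, which matches the order of appearance here only because column `a` is already sorted.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10],
                   'ids': [234, 345, 456, 444, 333, 22, 11, 5, 1, 2, 3, 4, 6]})
names = ['Matt', 'Jeff', 'Steph', 'Shannon']

# ngroup() labels every row with its group's number: 0, 1, 2, ...
group_no = df.groupby('a').ngroup()

# Wrap the group number around the list length, then look up the name.
df['owner'] = (group_no % len(names)).map(dict(enumerate(names)))
```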
Answer 1 (score: 0)
You can iterate over the groups.
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10],
                   'ids': [234, 345, 456, 444, 333, 22, 11, 5, 1, 2, 3, 4, 6]})
names = ['Matt', 'Jeff', 'Steph', 'Shannon']
pieces = []
for key, group in df.groupby('a'):
    group = group.copy()
    # (key - 1) assumes the 'a' values are consecutive integers starting at 1.
    group['owner'] = names[(key - 1) % len(names)]
    pieces.append(group)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once.
r = pd.concat(pieces)
print(r)
Output:
a ids owner
1 234 Matt
1 345 Matt
2 456 Jeff
2 444 Jeff
3 333 Steph
4 22 Shannon
5 11 Matt
6 5 Jeff
7 1 Steph
7 2 Steph
8 3 Shannon
9 4 Matt
10 6 Jeff
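The `(key - 1)` index in the loop above works only because the `a` values happen to be the integers 1 to 10. As a hedged alternative, `pd.factorize` numbers the keys in order of first appearance regardless of their values. The DataFrame `df2` below is made-up data chosen to show keys that are neither sorted nor consecutive.

```python
import numpy as np
import pandas as pd

# Hypothetical data: keys are not consecutive and not sorted.
df2 = pd.DataFrame({'a': [30, 30, 10, 10, 20],
                    'ids': [1, 2, 3, 4, 5]})
names = ['Matt', 'Jeff', 'Steph', 'Shannon']

# factorize() returns integer codes 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(df2['a'])

# Wrap the codes around the list length and index into the names.
df2['owner'] = np.array(names)[codes % len(names)]
```

With this data, both rows with `a == 30` get the first name in the list, because 30 is the first key encountered.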
Answer 2 (score: 0)
If I understand your question correctly:
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10],
'ids':[234, 345, 456, 444, 333, 22, 11, 5, 1, 2, 3, 4, 6]})
def match_length(list_, length):
    # Repeat the list as many whole times as fit, then pad with a partial slice.
    return length // len(list_) * list_ + list_[:length % len(list_)]
names = ['Matt', 'Jeff', 'Steph', 'Shannon']
dg = df.groupby('a')['ids'].apply(tuple).reset_index()
dg['owner'] = match_length(names, len(dg))
rows = []
_ = dg.apply(lambda row: [rows.append([row['a'], nn, row['owner']])
                          for nn in row.ids], axis=1)
dg = pd.DataFrame(rows, columns=dg.columns)
print(dg)
Result:
a ids owner
0 1 234 Matt
1 1 345 Matt
2 2 456 Jeff
3 2 444 Jeff
4 3 333 Steph
5 4 22 Shannon
6 5 11 Matt
7 6 5 Jeff
8 7 1 Steph
9 7 2 Steph
10 8 3 Shannon
11 9 4 Matt
12 10 6 Jeff
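The collapse-and-expand steps above can also be avoided by building a key-to-owner lookup and mapping it onto column `a`. This is a sketch of a different technique (`itertools.cycle` plus `Series.map`), not the method used in the answer; `owner_by_key` is a name introduced here for illustration.

```python
from itertools import cycle

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10],
                   'ids': [234, 345, 456, 444, 333, 22, 11, 5, 1, 2, 3, 4, 6]})
names = ['Matt', 'Jeff', 'Steph', 'Shannon']

# Pair each distinct key (in order of first appearance) with the next name;
# zip() stops when the finite list of unique keys is exhausted.
owner_by_key = dict(zip(df['a'].unique(), cycle(names)))

# Every row inherits the owner of its 'a' value; row order and index are unchanged.
df['owner'] = df['a'].map(owner_by_key)
```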