Here is a sample dataset:
>>> import pandas
>>> df1 = pandas.DataFrame({
...     "Name": ["Alice", "Marie", "Smith", "Mallory", "Bob", "Doe"],
...     "City": ["Seattle", None, None, "Portland", None, None],
...     "Age": [24, None, None, 26, None, None],
...     "Group": [1, 1, 1, 2, 2, 2]})
>>> df1
    Age      City  Group     Name
0  24.0   Seattle      1    Alice
1   NaN      None      1    Marie
2   NaN      None      1    Smith
3  26.0  Portland      2  Mallory
4   NaN      None      2      Bob
5   NaN      None      2      Doe
I want to merge the Name column across all rows in the same group while keeping City and Age, so that the result looks like:
>>> df1_summarised
    Age      City  Group               Name
0  24.0   Seattle      1  Alice Marie Smith
1  26.0  Portland      2    Mallory Bob Doe
I know that in my starting data structure, these two columns (Age, City) will be NaN / None after the first row of a given group.
I tried the following:
>>> print(df1.groupby('Group')['Name'].apply(' '.join))
Group
1    Alice Marie Smith
2      Mallory Bob Doe
Name: Name, dtype: object
But I want to keep the Age and City columns...
Answer 0 (score: 3)
Try this:
In [29]: df1.groupby('Group').ffill().groupby(['Group','Age','City']).Name.apply(' '.join)
Out[29]:
Group  Age   City
1      24.0  Seattle     Alice Marie Smith
2      26.0  Portland      Mallory Bob Doe
Name: Name, dtype: object
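A note for newer pandas releases: as far as I know, groupby('Group').ffill() there no longer returns the grouping column, so the chained second groupby may fail with a KeyError. Below is a minimal sketch of the same forward-fill idea that avoids this by filling only Age and City and writing them back into a copy. The names filled and result are my own and are not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Marie", "Smith", "Mallory", "Bob", "Doe"],
    "City": ["Seattle", None, None, "Portland", None, None],
    "Age": [24, None, None, 26, None, None],
    "Group": [1, 1, 1, 2, 2, 2],
})

# Forward-fill Age and City within each group, leaving df1 itself untouched.
# "filled" is an illustrative name, not something from the original answer.
filled = df1.copy()
filled[["Age", "City"]] = df1.groupby("Group")[["Age", "City"]].ffill()

# Every row now carries its group's Age and City, so they can act as keys.
result = filled.groupby(["Group", "Age", "City"])["Name"].apply(" ".join)
print(result)
```

This assumes the populated row comes first within each group, as stated in the question; a forward fill cannot help rows that sit above the populated one.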
Answer 1 (score: 2)
Use dropna and assign with groupby:
df1.dropna(subset=['Age', 'City']) \
.assign(Name=df1.groupby('Group').Name.apply(' '.join).values)
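One caveat with the snippet above: .values assigns the joined names by position, so it relies on the rows left after dropna being exactly one per group and in the same order as the sorted group keys. If that is not guaranteed, a possible variant is to look the joined names up by Group instead. This is only a sketch; names and summary are illustrative variable names, and the final reset_index is optional.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Marie", "Smith", "Mallory", "Bob", "Doe"],
    "City": ["Seattle", None, None, "Portland", None, None],
    "Age": [24, None, None, 26, None, None],
    "Group": [1, 1, 1, 2, 2, 2],
})

# Series of joined names, indexed by Group.
names = df1.groupby("Group")["Name"].apply(" ".join)

summary = (
    df1.dropna(subset=["Age", "City"])                  # keep the populated row of each group
       .assign(Name=lambda d: d["Group"].map(names))    # align by group key, not by position
       .reset_index(drop=True)                          # optional: renumber rows from 0
)
print(summary)
```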
Timing
As requested
Update
Using groupby and agg, I came up with this, which feels more satisfying:
df1.groupby('Group').agg(dict(Age='first', City='first', Name=' '.join))
To get the exact output:
df1.groupby('Group').agg(dict(Age='first', City='first', Name=' '.join)) \
    .reset_index().reindex(columns=df1.columns)  # reindex_axis was removed from pandas; reindex(columns=...) replaces it
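For completeness, here is a self-contained sketch of the agg approach that should run on current pandas. It uses as_index=False in place of reset_index and spells out the column order from the question's desired output explicitly, since df1.columns may follow the dictionary's insertion order (Name, City, Age, Group) on newer versions. The variable name summary is my own.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Marie", "Smith", "Mallory", "Bob", "Doe"],
    "City": ["Seattle", None, None, "Portland", None, None],
    "Age": [24, None, None, 26, None, None],
    "Group": [1, 1, 1, 2, 2, 2],
})

summary = (
    df1.groupby("Group", as_index=False)   # keep Group as a regular column
       .agg({"Age": "first",               # 'first' skips missing values by default
             "City": "first",
             "Name": " ".join})            # join all names in the group with spaces
       [["Age", "City", "Group", "Name"]]  # column order from the desired output
)
print(summary)
```

Because 'first' takes the first non-null entry per group, this does not depend on the populated row being the first one in its group.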