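How can I add a new column to a pandas DataFrame based on the values of another column? For example, the Id column contains repeated IDs and the Value column holds a different value on each row. The values need to be merged per Id, as in the expected output below.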
df:
import numpy as np
import pandas as pd

# dictionary of lists
data = {'Id': ["A", "A", "B", "B", "B", "C", "D", "E", "E", "F", "G", "G"],
        'Value': ["10$", "2$", "30%", "43%", "12$", "43$", "27$", "40%",
                  "18$", np.nan, np.nan, "89%"]}
df = pd.DataFrame(data)
print(df)
   Id Value
0   A   10$
1   A    2$
2   B   30%
3   B   43%
4   B   12$
5   C   43$
6   D   27$
7   E   40%
8   E   18$
9   F   NaN
10  G   NaN
11  G   89%
Expected output:
  Id          Value
0  A        10$, 2$
1  B  30%, 43%, 12$
2  C            43$
3  D            27$
4  E       40%, 18$
5  F            NaN
6  G            89%
Answer 0 (score: 3)
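Group by Id and join the values of each group. str.join raises a TypeError on NaN, so missing values are dropped inside each group first, and the result is assigned back before printing:
df = df.groupby('Id')['Value'].apply(lambda s: ', '.join(s.dropna())).reset_index()
print(df)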
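To illustrate why the missing values matter here, the following is a minimal sketch on the question's data. It assumes the goal is simply to skip NaN entries; the variable names `join_failed` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

# Data from the question, including the missing values for F and G.
df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "C", "D", "E", "E", "F", "G", "G"],
    "Value": ["10$", "2$", "30%", "43%", "12$", "43$", "27$", "40%",
              "18$", np.nan, np.nan, "89%"],
})

# str.join accepts only strings, so the float NaN in group G raises TypeError.
try:
    ", ".join(df.loc[df["Id"] == "G", "Value"])
    join_failed = False
except TypeError:
    join_failed = True

# Dropping the missing values inside each group avoids the error.
result = (
    df.groupby("Id")["Value"]
      .apply(lambda s: ", ".join(s.dropna()))
      .reset_index()
)
print(result)
```

Note that with this approach a group that contains only missing values, such as F, ends up as an empty string rather than NaN.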
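Answer 1 (score: 1)
Group by the Id column and use string joining as the aggregation. Dropping the duplicate rows from the new DataFrame then gives the expected result.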
df2 = df.copy()  # copy, so the original df is not modified
df2['Value'] = df2.groupby(['Id'])['Value'].transform(lambda x: ','.join(x))
df2 = df2.drop_duplicates()
df2
Output:
  Id     Value
0  A     S1,S2
2  B  S3,S3,S5
5  C        S6
6  D        S7
7  E     S8,S9
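The reason drop_duplicates is needed is that transform returns one value per original row, so the joined string is repeated within each group. A rough sketch of that behaviour, compared with agg, which returns one row per group directly. The names `per_row`, `via_transform` and `via_agg` are hypothetical, and `df.assign` is used here only so that the original frame is left untouched:

```python
import pandas as pd

# Sample data without missing values (same Ids and values as the output above).
df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "C", "D", "E", "E"],
    "Value": ["S1", "S2", "S3", "S3", "S5", "S6", "S7", "S8", "S9"],
})

# transform keeps the original length: every row gets its group's joined string.
per_row = df.groupby("Id")["Value"].transform(lambda x: ",".join(x))

# assign builds a new DataFrame; the repeated rows are then dropped.
via_transform = df.assign(Value=per_row).drop_duplicates().reset_index(drop=True)

# agg produces one row per group, so no de-duplication step is required.
via_agg = df.groupby("Id")["Value"].agg(lambda s: ",".join(s)).reset_index()

print(len(per_row), len(via_transform), len(via_agg))
```

Both routes should give the same five rows; the transform route mainly makes sense when the joined string is also wanted on every original row.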
Answer 2 (score: 1)
I suggest using the DataFrameGroupBy.aggregate function:
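import numpy as np
import pandas as pd

# the F and G rows (with missing values) are inferred from the output shown below
data = {'Id': ["A", "A", "B", "B", "B", "C", "D", "E", "E", "F", "G", "G"],
        'Value': ["S1", "S2", "S3", "S3", "S5", "S6", "S7", "S8", "S9",
                  np.nan, np.nan, "H9"]}
df = pd.DataFrame(data)
# join the non-null values of each group; keep NaN when the whole group is missing
df = df.groupby(by="Id", as_index=False).agg(
    {"Value": lambda s: ", ".join(s.dropna()) if s.notna().any() else np.nan})
print(df)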
Output:
  Id       Value
0  A      S1, S2
1  B  S3, S3, S5
2  C          S6
3  D          S7
4  E      S8, S9
5  F         NaN
6  G          H9
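Applied to the data from the question, the same idea could look roughly like the sketch below. The helper `join_non_null` is a hypothetical name introduced only for readability, and the ", " separator is an assumption about the desired format.

```python
import numpy as np
import pandas as pd


def join_non_null(values: pd.Series, sep: str = ", "):
    """Join the non-missing strings of a group; return NaN if none remain."""
    present = values.dropna()
    return sep.join(present) if len(present) else np.nan


# Data from the question, including the missing values for F and G.
df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "C", "D", "E", "E", "F", "G", "G"],
    "Value": ["10$", "2$", "30%", "43%", "12$", "43$", "27$", "40%",
              "18$", np.nan, np.nan, "89%"],
})

# One row per Id; groups with only missing values stay NaN.
result = df.groupby("Id", as_index=False).agg({"Value": join_non_null})
print(result)
```

A named function is not required (the lambda works the same way), but it keeps the NaN handling in one place if the separator or the fallback value needs to change later.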