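I have a dataframe with a column named Side; in the example below its values are E or W. I want to merge each E/W pair of rows into a single row. Specifically: the Parking_Spaces and Total_Vehicle_Count columns must hold the sum of the two rows, the Side column must be dropped, and the result must have half as many rows as before.
Is there a simple way to do this?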
Time stamp | Elmntkey | Study_Area | Sub_Area | Side | Unitdesc | Parking_Category | Parking_Spaces | Total_Vehicle_Count | Dp_Count | Construction | Event Closure | Subarea Label | Peak Hour? (Yes or No) | Day
2014-04-08 08:00:00 | 24558 | 12th Ave - Weekday | unknown | E | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | Paid Parking | 8.0 | 1.0 | 0 | No | No | 12th Ave - Weekday | No | Weekday
2014-04-08 08:00:00 | 24557 | 12th Ave - Weekday | unknown | W | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | Paid Parking | 11.0 | 6.0 | 1 | No | No | 12th Ave - Weekday | No | Weekday
2014-04-08 09:00:00 | 24557 | 12th Ave - Weekday | unknown | W | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | Paid Parking | 11.0 | 6.0 | 1 | No | No | 12th Ave - Weekday | No | Weekday
2014-04-08 09:00:00 | 24558 | 12th Ave - Weekday | unknown | E | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | Paid Parking | 8.0 | 1.0 | 0 | No | No | 12th Ave - Weekday | No | Weekday
2014-04-08 10:00:00 | 24557 | 12th Ave - Weekday | unknown | W | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | Paid Parking | 11.0 | 10.0 | 1 | No | No | 12th Ave - Weekday | No | Weekday
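A minimal sketch of the requested merge on the sample rows above, assuming the time stamp is available as a regular column named Time_Stamp (in the printout it appears to be the index) and keeping only a few columns for brevity. Side and Elmntkey differ between the E and W rows (24558 vs 24557 at 08:00), so neither can be a group key; grouping on the time stamp and Unitdesc pairs each E row with its W row, and Side disappears because it is neither a key nor a summed column.

```python
import pandas as pd

# Sample rows from the question, reduced to the relevant columns.
# Assumption: the time stamp is a column called "Time_Stamp", not the index.
df = pd.DataFrame({
    "Time_Stamp": pd.to_datetime([
        "2014-04-08 08:00:00", "2014-04-08 08:00:00",
        "2014-04-08 09:00:00", "2014-04-08 09:00:00",
        "2014-04-08 10:00:00",
    ]),
    "Elmntkey": [24558, 24557, 24557, 24558, 24557],
    "Side": ["E", "W", "W", "E", "W"],
    "Unitdesc": ["12TH AVE BETWEEN E MARION ST AND E SPRING ST"] * 5,
    "Parking_Spaces": [8.0, 11.0, 11.0, 8.0, 11.0],
    "Total_Vehicle_Count": [1.0, 6.0, 6.0, 1.0, 10.0],
})

# One output row per (time stamp, street segment): the E and W rows collapse.
# Columns that are neither keys nor selected (Side, Elmntkey) are dropped.
merged = (
    df.groupby(["Time_Stamp", "Unitdesc"], as_index=False)[
        ["Parking_Spaces", "Total_Vehicle_Count"]
    ].sum()
)
print(merged)
```

In this excerpt the 10:00 W row has no E partner, so five rows become three; with complete E/W pairs the row count halves as required.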
Answer 0 (score: 1)
This can be done with df.groupby:
df.groupby(['Elmntkey', 'Study_Area', 'Sub_Area', 'Unitdesc', 'Dp_Count', 'Construction', 'Event Closure',
            'Subarea Label', 'Peak Hour? (Yes or No)', 'Day Time stamp'])[['Parking_Spaces', 'Total_Vehicle_Count']].sum().reset_index()
Output:
(index) | Elmntkey | Study_Area | Sub_Area | Unitdesc | Dp_Count | Construction | Event Closure | Subarea Label | Peak Hour? (Yes or No) | Day Time stamp | Parking_Spaces | Total_Vehicle_Count
0 | 24557 | 12th Ave - Weekday | unknown | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | 1 | No | No | 12th Ave - Weekday | No | Weekday | 33.0 | 22.0
1 | 24558 | 12th Ave - Weekday | unknown | 12TH AVE BETWEEN E MARION ST AND E SPRING ST | 0 | No | No | 12th Ave - Weekday | No | Weekday | 16.0 | 2.0
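Grouping on a long list of descriptive columns, as in this answer, has one pitfall worth checking: by default groupby uses dropna=True, so any row with a missing value in any key column is silently left out of the result. A small sketch with hypothetical rows (the segment names and the missing Dp_Count are made up for illustration) shows the difference dropna=False makes.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second one is missing its Dp_Count value.
df = pd.DataFrame({
    "Unitdesc": ["SEG A", "SEG A", "SEG B"],
    "Dp_Count": [1, np.nan, 0],
    "Parking_Spaces": [11.0, 8.0, 5.0],
})

keys = ["Unitdesc", "Dp_Count"]

# Default dropna=True: the row with NaN in a key column is excluded.
default = df.groupby(keys, as_index=False)["Parking_Spaces"].sum()

# dropna=False keeps NaN as its own group, so no row is lost.
kept = df.groupby(keys, as_index=False, dropna=False)["Parking_Spaces"].sum()

print(default)
print(kept)
```

A quick sanity check after any such groupby is to compare the summed totals before and after aggregation.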
Answer 1 (score: 0)
Based on Shijo's answer, I solved the problem with the following code:
# Sum the E and W rows for each time stamp and street segment
# (df appears to be a dict of DataFrames keyed 'raw' and 'droped')
temp = df['raw'].groupby(['Time_Stamp', 'Unitdesc'], as_index=False)[['Parking_Spaces', 'Total_Vehicle_Count']].sum()
# Set Time_Stamp as the index and sort by it, to match the target dataframe
temp = temp.set_index('Time_Stamp').sort_index()
# Save the result to the target dataframe
df['droped']['Free_Spots'] = temp['Parking_Spaces']
df['droped']['Used_Spots'] = temp['Total_Vehicle_Count']
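The final two assignments rely on pandas aligning a Series to the target DataFrame by index label, not by position: each time stamp in the target picks up the value with the same label in temp, and time stamps with no match get NaN. A small sketch with hypothetical frames (target stands in for df['droped'], temp for the aggregated result) illustrates this; it assumes each time stamp appears at most once in temp, since repeated labels (for example several Unitdesc values sharing one time stamp) cannot be paired one-to-one.

```python
import pandas as pd

# Hypothetical target frame indexed by time stamp (stands in for df['droped']).
target = pd.DataFrame(
    {"Other": [1, 2, 3]},
    index=pd.to_datetime(["2014-04-08 08:00", "2014-04-08 09:00", "2014-04-08 10:00"]),
)

# Hypothetical aggregated Series, deliberately in a different order
# and missing the 10:00 time stamp.
temp = pd.Series(
    [7.0, 5.0],
    index=pd.to_datetime(["2014-04-08 09:00", "2014-04-08 08:00"]),
)

# Assignment aligns on index labels: 08:00 -> 5.0, 09:00 -> 7.0, 10:00 -> NaN.
target["Free_Spots"] = temp
print(target)
```

Because alignment is by label, the order of temp does not affect which value lands in which row of the target.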
Credit goes to Shijo for providing the correct answer.