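This is essentially related to "Merge values of a dataframe where other columns match", but since that question has already been answered and I could not work out the right modification for this different problem, I am opening a new thread. I hope that is okay. On to the question. I have the following data: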
date car_brand color city stolen
"2020-01-01" porsche red paris False
"2020-01-01" porsche red london False
"2020-01-01" porsche red munich False
"2020-01-01" porsche red madrid False
"2020-01-01" porsche red rome False
"2020-01-01" porsche blue berlin False
"2020-01-01" porsche blue tokyo False
"2020-01-01" porsche blue peking False
"2020-01-01" porsche white liverpool False
"2020-01-01" porsche white oslo False
"2020-01-01" porsche white barcelona False
"2020-01-01" porsche white miami False
"2020-01-02" porsche red paris False
"2020-01-02" porsche red london False
"2020-01-02" porsche red munich False
"2020-01-02" porsche red madrid False
"2020-01-02" porsche red rome False
"2020-01-02" porsche blue berlin False
"2020-01-02" porsche blue tokyo False
"2020-01-02" porsche blue peking False
"2020-01-02" porsche white liverpool False
"2020-01-02" porsche white oslo False
"2020-01-02" porsche white barcelona False
"2020-01-02" porsche white miami False
"2020-01-03" porsche red paris False
"2020-01-03" porsche red london False
"2020-01-03" porsche red munich False
"2020-01-03" porsche red madrid True
"2020-01-03" porsche red rome False
"2020-01-03" porsche blue berlin False
"2020-01-03" porsche blue tokyo False
"2020-01-03" porsche blue peking False
"2020-01-03" porsche white liverpool False
"2020-01-03" porsche white oslo False
"2020-01-03" porsche white barcelona False
"2020-01-03" porsche white miami False
"2020-01-04" porsche red paris False
"2020-01-04" porsche red london False
"2020-01-04" porsche red munich False
"2020-01-04" porsche red madrid False
"2020-01-04" porsche red rome False
"2020-01-04" porsche blue berlin False
"2020-01-04" porsche blue tokyo False
"2020-01-04" porsche blue peking False
"2020-01-04" porsche white liverpool False
"2020-01-04" porsche white oslo False
"2020-01-04" porsche white barcelona False
"2020-01-04" porsche white miami False
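I would like to combine the contents of the dataframe as follows: if, on consecutive days, the boolean "stolen" matches for all entries, then I want to merge the date column. For instance, in the example above the boolean entries match for "2020-01-01" and "2020-01-02". So overall I would like to get the following result: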
date car_brand color city stolen
["2020-01-01","2020-01-02"] porsche red paris False
["2020-01-01","2020-01-02"] porsche red london False
["2020-01-01","2020-01-02"] porsche red munich False
["2020-01-01","2020-01-02"] porsche red madrid False
["2020-01-01","2020-01-02"] porsche red rome False
["2020-01-01","2020-01-02"] porsche blue berlin False
["2020-01-01","2020-01-02"] porsche blue tokyo False
["2020-01-01","2020-01-02"] porsche blue peking False
["2020-01-01","2020-01-02"] porsche white liverpool False
["2020-01-01","2020-01-02"] porsche white oslo False
["2020-01-01","2020-01-02"] porsche white barcelona False
["2020-01-01","2020-01-02"] porsche white miami False
["2020-01-03"] porsche red paris False
["2020-01-03"] porsche red london False
["2020-01-03"] porsche red munich False
["2020-01-03"] porsche red madrid True
["2020-01-03"] porsche red rome False
["2020-01-03"] porsche blue berlin False
["2020-01-03"] porsche blue tokyo False
["2020-01-03"] porsche blue peking False
["2020-01-03"] porsche white liverpool False
["2020-01-03"] porsche white oslo False
["2020-01-03"] porsche white barcelona False
["2020-01-03"] porsche white miami False
["2020-01-04"] porsche red paris False
["2020-01-04"] porsche red london False
["2020-01-04"] porsche red munich False
["2020-01-04"] porsche red madrid False
["2020-01-04"] porsche red rome False
["2020-01-04"] porsche blue berlin False
["2020-01-04"] porsche blue tokyo False
["2020-01-04"] porsche blue peking False
["2020-01-04"] porsche white liverpool False
["2020-01-04"] porsche white oslo False
["2020-01-04"] porsche white barcelona False
["2020-01-04"] porsche white miami False
Answer 0 (score: 1)
For brevity, the code that builds the dataframe from the sample data is not shown.
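Since the construction step is omitted, here is a minimal sketch of one way to build the dataframe from the sample data in the question. The names cars and dates are my own, not from the original post, and the date column is left as strings at this point.

```python
from itertools import product

import pandas as pd

# (color, city) pairs copied from the question's sample data.
cars = [
    ("red", "paris"), ("red", "london"), ("red", "munich"),
    ("red", "madrid"), ("red", "rome"),
    ("blue", "berlin"), ("blue", "tokyo"), ("blue", "peking"),
    ("white", "liverpool"), ("white", "oslo"),
    ("white", "barcelona"), ("white", "miami"),
]
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]

# One row per (date, car); every car starts out as not stolen.
df = pd.DataFrame(
    [(d, "porsche", color, city, False) for d, (color, city) in product(dates, cars)],
    columns=["date", "car_brand", "color", "city", "stolen"],
)

# The only stolen entry in the sample: madrid on 2020-01-03.
df.loc[(df["date"] == "2020-01-03") & (df["city"] == "madrid"), "stolen"] = True
```

The rows come out in the same order as in the question: dates in the outer loop, cars in the inner loop.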
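The key technique is a new column that increments whenever the "stolen" flag changes from one date to the next (the "increment on value change" pattern).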
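The "increment on value change" idea can be seen in isolation on a small series. This is only a sketch: the flags values are hypothetical per-date maxima of "stolen", and it compares each value with its predecessor via shift() rather than the diff() form used in the answer's code.

```python
import pandas as pd

# Hypothetical per-date flags: True if any car was stolen on that date.
flags = pd.Series(
    [False, False, True, False],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

# True wherever the value differs from the previous row; the first row
# compares against NaN and therefore always starts a new run.
changed = flags.ne(flags.shift())

# The running sum of the change markers gives each run its own id.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3]
```

Dates that share a run id form one block of consecutive days with the same flag, which is what the grouping step relies on.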
import pandas as pd

df["date"] = pd.to_datetime(df["date"])
# start a new group whenever the per-date "any car stolen" flag changes
df2 = (df.groupby("date")["stolen"].max().to_frame()
    .reset_index()
    .assign(stolen_grp=lambda dfa: (dfa.stolen.diff() != 0).cumsum())
    .drop(columns="stolen")
)
# put stolen_grp back into the dataframe
df = df.merge(df2, on="date")
# same technique: group on everything except date, so the date lists break at stolen_grp boundaries
(
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # keep the first date, then only dates exactly one day after the previous date
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == (list(x)[i-1] + pd.DateOffset(1))])
    .reset_index()
    .drop(columns="stolen_grp")
)
car_brand color city stolen date
porsche blue berlin False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue berlin False [2020-01-03 00:00:00]
porsche blue berlin False [2020-01-04 00:00:00]
porsche blue peking False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue peking False [2020-01-03 00:00:00]
porsche blue peking False [2020-01-04 00:00:00]
porsche blue tokyo False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue tokyo False [2020-01-03 00:00:00]
porsche blue tokyo False [2020-01-04 00:00:00]
porsche red london False [2020-01-01 00:00:00, 2020-01-02 00:00:00]