I have the following DataFrame:
x = pd.DataFrame({'user': ['a','a','a','a','b','b'], 'dt': ['2016-01-01','2016-01-02','2016-01-02','2016-01-03', '2016-01-05','2016-01-06'], 'val': [1,33,45,3,2,1]})
  user          dt  val
0    a  2016-01-01    1
1    a  2016-01-02   33
2    a  2016-01-02   45
3    a  2016-01-03    3
4    b  2016-01-05    2
5    b  2016-01-06    1
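I want an additional column that shows, for every row of the original dataset, the sum of val over the past 2 days within each user group. The output I want looks like this: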
  user          dt  val  sum
0    a  2016-01-01    1    1
1    a  2016-01-02   33   79
2    a  2016-01-02   45   79
3    a  2016-01-03    3   81
4    b  2016-01-05    2    2
5    b  2016-01-06    1    3
I tried the following, but it did not work.
x['sum'] = x.groupby(['user']).rolling('2d', on='dt')['val'].transform('sum')
Even without transform, it gives me this error:
Exception: cannot handle a non-unique multi-index!
What is the best way to do this?
Answer 0 (score: 0)
I have a quick and dirty solution; at least it works for both your old and new examples.
# Sum val per user and date, and turn the result into a DataFrame
tmp = x.groupby(["user", "dt"])["val"].sum().to_frame("date_sum")
tmp.reset_index(inplace=True)
# Rolling sum over 2 rows (one row per date) for each user; each user's first date is left as NaN
a = tmp.groupby("user")[["dt", "date_sum"]].rolling(2, on="dt")["date_sum"].sum().reset_index()
# Fill the NaN on each user's first date with the value from tmp
a.loc[(a["user"] == tmp["user"]) & (a["dt"] == tmp["dt"]) & pd.isna(a["date_sum"]), "date_sum"] = tmp["date_sum"]
# Attach the rolling sums to the original rows
final = pd.merge(x, a, how="left", on=["user", "dt"])
final
Output:
  user          dt  val  date_sum
0    a  2016-01-01    1       1.0
1    a  2016-01-02   33      79.0
2    a  2016-01-02   45      79.0
3    a  2016-01-03    3      81.0
4    b  2016-01-05    2       2.0
5    b  2016-01-06    1       3.0
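As a possible simplification of the steps above, passing min_periods=1 to rolling keeps each user's first date instead of leaving it as NaN, so the fill step is not needed. This is only a sketch: the names daily and final are illustrative, and rolling(2) counts rows rather than calendar days, so it assumes each user has one row per consecutive day after the daily aggregation.

```python
import pandas as pd

x = pd.DataFrame({
    "user": ["a", "a", "a", "a", "b", "b"],
    "dt": ["2016-01-01", "2016-01-02", "2016-01-02",
           "2016-01-03", "2016-01-05", "2016-01-06"],
    "val": [1, 33, 45, 3, 2, 1],
})

# One row per (user, dt) holding the daily total.
daily = x.groupby(["user", "dt"], as_index=False)["val"].sum()

# Window of 2 rows per user; min_periods=1 lets the first row of each
# user return its own value instead of NaN.
# Assumption: rows are consecutive days, because the window counts rows.
daily["date_sum"] = (
    daily.groupby("user")["val"]
    .rolling(2, min_periods=1)
    .sum()
    .reset_index(level=0, drop=True)  # drop the user level so the index aligns with daily
)

# Attach the per-day rolling sums to every original row.
final = x.merge(daily[["user", "dt", "date_sum"]], on=["user", "dt"], how="left")
print(final)
```

If a user can have gaps between dates, a time-based window on a datetime column is the safer choice, since a row-based window would silently sum across the gap.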
Answer 1 (score: 0)
You were very close to the solution. You have to convert dt to datetime, and you also have to access the underlying NumPy array when assigning the result as a column, because GroupBy.rolling produces a MultiIndex in this case, so the indexes cannot be aligned:
x['dt'] = pd.to_datetime(x['dt'])
x['sum'] = x.groupby('user').rolling('2d', on='dt')['val'].sum().to_numpy()
Note: rows 1 and 2 differ because the rolling sum is computed row by row, so row 1 is 34 rather than 79:
  user         dt  val    sum
0    a 2016-01-01    1   1.00
1    a 2016-01-02   33  34.00
2    a 2016-01-02   45  79.00
3    a 2016-01-03    3  81.00
4    b 2016-01-05    2   2.00
5    b 2016-01-06    1   3.00
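If both rows for 2016-01-02 should show 79, as in the expected output in the question, one option is to aggregate to one total per user and date first, apply the time-based window to those daily totals, and merge the result back. The following is a sketch under that assumption and not part of the original answer; the names daily, rolled and out are illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    "user": ["a", "a", "a", "a", "b", "b"],
    "dt": ["2016-01-01", "2016-01-02", "2016-01-02",
           "2016-01-03", "2016-01-05", "2016-01-06"],
    "val": [1, 33, 45, 3, 2, 1],
})
x["dt"] = pd.to_datetime(x["dt"])

# Daily total per user: a Series indexed by a unique (user, dt) MultiIndex.
daily = x.groupby(["user", "dt"])["val"].sum()

# Move user back to a column so dt is the DatetimeIndex, then take a
# 2-day time-based rolling sum within each user.
rolled = (
    daily.reset_index(level="user")
    .groupby("user")["val"]
    .rolling("2D")
    .sum()
    .rename("sum")
    .reset_index()  # columns: user, dt, sum
)

# Every original row receives the rolling sum of its (user, dt) pair.
out = x.merge(rolled, on=["user", "dt"], how="left")
print(out)
```

Because (user, dt) is unique after the aggregation, the merge is unambiguous, and duplicate dates in x all receive the same value.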