Pandas rolling and transform when multiple rows share the same date

Time: 2019-11-29 00:05:00

Tags: python pandas pandas-groupby

I have the following DataFrame:

import pandas as pd

x = pd.DataFrame({'user': ['a','a','a','a','b','b'],
                  'dt': ['2016-01-01','2016-01-02','2016-01-02','2016-01-03', '2016-01-05','2016-01-06'],
                  'val': [1,33,45,3,2,1]})

  user          dt  val
0    a  2016-01-01    1
1    a  2016-01-02   33
2    a  2016-01-02   45
3    a  2016-01-03    3
4    b  2016-01-05    2
5    b  2016-01-06    1

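I want an extra column that shows, for every row of the original dataset, the sum of val over the past 2 days, grouped by user. So my desired output looks like this: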

  user          dt  val  sum
0    a  2016-01-01    1    1
1    a  2016-01-02   33   79
2    a  2016-01-02   45   79
3    a  2016-01-03    3   81
4    b  2016-01-05    2    2
5    b  2016-01-06    1    3

I tried the following, but it did not work.

x['sum'] = x.groupby(['user']).rolling('2d', on='dt')['val'].transform('sum')

Even without transform, it raises this error:

Exception: cannot handle a non-unique multi-index!

What is the best way to do this?

2 Answers:

Answer 0: (score: 0)

I have a quick and dirty solution; at least it works for both your old and new examples.

# Sum val per user and date, and turn the result into a DataFrame
tmp = x.groupby(['user', 'dt'])['val'].sum().to_frame('date_sum')
tmp.reset_index(inplace=True)

# Rolling sum per user. The window is 2 rows (not 2 calendar days),
# so the first date of each user comes out as NaN
a = tmp.groupby('user')[['dt', 'date_sum']].rolling(2, on='dt')['date_sum'].sum().reset_index()

# Fill the NaN on each user's first date with the value from tmp
a.loc[(a['user'] == tmp['user']) & (a['dt'] == tmp['dt']) & pd.isna(a['date_sum']), 'date_sum'] = tmp['date_sum']

# Attach the rolled sums back to the original rows
final = pd.merge(x, a, how='left', on=['user', 'dt'])
final

Output:

    user    dt  val     date_sum
0   a   2016-01-01  1   1.0
1   a   2016-01-02  33  79.0
2   a   2016-01-02  45  79.0
3   a   2016-01-03  3   81.0
4   b   2016-01-05  2   2.0
5   b   2016-01-06  1   3.0
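
The rolling(2, ...) call above counts rows, not calendar days, so it matches a 2-day window only while each user's dates are consecutive. The following is a minimal sketch of the same aggregate-then-merge idea with a time-based window instead. It assumes dt is converted to datetime first, and the names daily, rolled and result are illustrative rather than taken from the answer.

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-02',
                         '2016-01-03', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 45, 3, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])  # a time-based window needs datetime values

# One row per (user, dt), so duplicate dates collapse into a single daily total
daily = x.groupby(['user', 'dt'], as_index=False)['val'].sum()

# 2-day rolling sum of the daily totals within each user
rolled = (daily.set_index('dt')
               .groupby('user')['val']
               .rolling('2D')
               .sum()
               .rename('sum')
               .reset_index())

# Bring the rolled totals back onto every original row
result = x.merge(rolled, on=['user', 'dt'], how='left')
print(result)
```

Because the window is expressed in days, no NaN backfilling step is needed, and rows that share a date receive the same total.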

Answer 1: (score: 0)

You are very close to the solution. You have to convert dt to datetime, and you also have to access the numpy array when assigning the result as a column, because GroupBy.rolling creates a MultiIndex in this case, so the index cannot be aligned:

x['dt'] = pd.to_datetime(x['dt'])
x['sum'] = x.groupby('user').rolling('2d', on='dt')['val'].sum().to_numpy()

Note: rows 1 and 2 differ because this is a rolling sum, so row 1 holds 34 (1 + 33) rather than the 79 shown in the desired output.

  user         dt  val    sum
0    a 2016-01-01    1   1.00
1    a 2016-01-02   33  34.00
2    a 2016-01-02   45  79.00
3    a 2016-01-03    3  81.00
4    b 2016-01-05    2   2.00
5    b 2016-01-06    1   3.00
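
If every row with the same date should carry the full total, as in the question's desired output, one possible follow-up is to broadcast the last rolling value within each (user, dt) group. This is a sketch, not part of the answer; it assumes the rows are already ordered by user and then by dt, so that the positional assignment through to_numpy() lines up with the original rows.

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-02',
                         '2016-01-03', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 45, 3, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

# Rolling 2-day sum per user, assigned by position (assumes rows sorted by user, dt)
x['sum'] = x.groupby('user').rolling('2D', on='dt')['val'].sum().to_numpy()

# The last row of each (user, dt) group has seen every row of that date,
# so copy its value onto all rows sharing the date
x['sum'] = x.groupby(['user', 'dt'])['sum'].transform('last')
print(x)
```

Taking 'last' rather than 'max' keeps the result correct when val contains negative numbers.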