Question

I'm trying to plot data with a date on the X axis and several cumulative counts on the Y axis.

I have a set of items such as:

id1 date1 user1
id2 date2 user1
id3 date3 user2

With this example, I would like the plot to have two lines. The X axis would have three entries (date1, date2, date3); user1 would have a Y value of 1 at date1, 2 at date2, and 2 at date3; user2 would have 0 at date1, 0 at date2, and 1 at date3.

Making the chart directly, I don't see what I should use to compute this cumulative count. E.g.

Chart(data).mark_line().encode(x='date:T', y='count(*)', color='username')

obviously creates a chart where most values are at 0 (few entries have exactly the same date).

Ideally, the same encoding with a cumulative count aggregate in place of count(*) would work, but there seems to be no equivalent in the documentation.

In my real case, I have dozens of users and several thousand entries.
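To pin down the expected numbers, here is a minimal pure-Python sketch of the per-user cumulative count described above. The `rows` list and the `cumulative_counts` helper are hypothetical names used only for illustration; the placeholder dates are plain strings that happen to sort in order.

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the question: (id, date, username).
rows = [
    ("id1", "date1", "user1"),
    ("id2", "date2", "user1"),
    ("id3", "date3", "user2"),
]


def cumulative_counts(rows):
    """Return {username: [(date, running_total), ...]} covering every date."""
    dates = sorted({date for _, date, _ in rows})
    users = sorted({user for _, _, user in rows})

    # Number of entries per (date, user) pair; missing pairs count as 0.
    per_day = defaultdict(int)
    for _, date, user in rows:
        per_day[(date, user)] += 1

    result = {}
    for user in users:
        total = 0
        series = []
        for date in dates:
            total += per_day[(date, user)]
            series.append((date, total))
        result[user] = series
    return result


counts = cumulative_counts(rows)
for user, series in counts.items():
    print(user, series)
```

Each user gets one point per date, including dates where that user has no entry, which is what keeps both lines defined across the whole X axis.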

Answer 1

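I don't think Altair provides a cumulative count aggregate yet. In the meantime, you can do the equivalent manipulation in pandas. Here is one way to do it; I'm sure there are more efficient ones.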

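import pandas as pd
import numpy as np
np.random.seed(0)  # fixed seed so the sample data is reproducible
user_list = ['user1', 'user2']
# one entry per year, each assigned to a randomly chosen user
df = pd.DataFrame({'date':range(2000, 2010),
                   'username':np.random.choice(user_list, 10)})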

This is what df looks like.

    date    username
0   2000    user1
1   2001    user2
2   2002    user2
3   2003    user1
4   2004    user2
5   2005    user2
6   2006    user2
7   2007    user2
8   2008    user2
9   2009    user2

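Cross-tabulate and take the cumulative sum:

# rows: dates, columns: usernames, values: running count per user
d = pd.crosstab(df.date, columns=df.username).cumsum()
# back to long form: one row per (date, username) pair
d = d.stack().reset_index()
# the stacked values end up in a column named 0
d = d.rename(columns={0:'CummulativeCount'})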

This is the output of d.head().

    date    username    CummulativeCount
0   2000    user1       1
1   2000    user2       0
2   2001    user1       1
3   2001    user2       1
4   2002    user1       1
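As one possible variation on the same idea, the reshaping can also be written with `groupby` and `unstack`, naming the value column before `reset_index` so no rename of column `0` is needed. This is only a sketch on a small hand-written frame (the four rows below are made up for illustration, not the random data above); the column name `CummulativeCount` is kept to match the rest of the answer.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the random sample above).
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2003],
    "username": ["user1", "user2", "user2", "user1"],
})

# Count entries per (date, username), spread users into columns with 0 for
# missing combinations, then take the running total down each column.
wide = (
    df.groupby(["date", "username"]).size()
      .unstack(fill_value=0)
      .cumsum()
)

# Back to long form; naming the Series first gives the value column its name.
long = wide.stack().rename("CummulativeCount").reset_index()
print(long)
```

The result has the same long layout as `d`: one row per (date, username) pair, including dates where a user has no new entry.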

Now we can use Altair without worrying about any aggregation.

from altair import Chart
c = Chart(d)
c.mark_line().encode(x='date:T', y='CummulativeCount:Q', color='username')

Cumulative count with Altair
