How can I aggregate data in a loop in Python?

Posted: 2015-02-09 20:37:46

Tags: python pandas aggregation itertools

I have a dataset with two columns: Date and Profit.

A sample of the dataset:

   Date     Profit
2013-06-21   14
2013-06-22   19
2013-06-23   11
2013-06-24   13
2013-06-25   6
2013-06-26   22
2013-06-27   22
2013-06-28   3
2013-06-29   5
2013-06-30   10
2013-07-01   17
2013-07-02   14
2013-07-03   9
2013-07-04   7

The same sample as Python input:

data = [('2013-06-21',14),
    ('2013-06-22',19),
    ('2013-06-23',11),
    ('2013-06-24',13),
    ('2013-06-25',6),
    ('2013-06-26',22),
    ('2013-06-27',22),
    ('2013-06-28',3),
    ('2013-06-29',5),
    ('2013-06-30',10),
    ('2013-07-01',17),
    ('2013-07-02',14),
    ('2013-07-03',9),
    ('2013-07-04',7)]

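Now I want to do a rolling aggregation and store the aggregates. By rolling aggregation I mean the following: for week 1 (2013-06-21 to 2013-06-27), I want to add the running total up to the previous date to the current date's profit and store the result with the current date. So for 2013-06-21 the sum is just 14, because it is the first day of the week, but for 2013-06-22 it should be the sum of the previous date (2013-06-21) and the current date (2013-06-22), stored with the current date. This continues until the end of the week; the next week then starts over again, without carrying over any dates from the previous week. In other words, it is a cumulative sum that resets at the start of each week. For the first week, the sample output should look like this: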

 Date     Profit
2013-06-21   14
2013-06-22   33  #(14 + 19)
2013-06-23   44  #(33 + 11)
2013-06-24   57  #(44 + 13) 
2013-06-25   63  #(57 + 6)
2013-06-26   85  #(63 + 22)
2013-06-27   107 #(85 + 22)
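Within a single week, the desired output is just a running (prefix) sum of the profits. As an illustration not taken from the original post, `itertools.accumulate` from the standard library computes exactly that for the week-1 values listed above:

```python
from itertools import accumulate

# Profits for week 1 (2013-06-21 to 2013-06-27), copied from the sample data
week1 = [14, 19, 11, 13, 6, 22, 22]

# Each element is the sum of all profits up to and including that day
running = list(accumulate(week1))
print(running)  # [14, 33, 44, 57, 63, 85, 107]
```

The only remaining work is deciding where each week starts so the running sum can be restarted there.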

I looked at defaultdict and wrote this:

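import collections


def aggregate(data, key, value, func):
    # data maps column names to equal-length sequences,
    # e.g. {'Date': [...], 'Profit': [...]} or a pandas DataFrame
    measures_dict = collections.defaultdict(list)
    for k, v in zip(data[key], data[value]):
        measures_dict[k].append(v)
    # one aggregated value per distinct key (not one per row)
    return [(k, func(measures_dict[k])) for k in measures_dict]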

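But I don't get the result I want, and I suspect defaultdict is not the right approach. I also looked at pandas, but I couldn't figure out how to get started. Can anyone help me with this rolling aggregation?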
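For comparison, a minimal sketch (not from the original thread) showing that a defaultdict can still do the job if it holds a running total per week and a value is emitted for every row, rather than one value per key as in `aggregate` above. It assumes the rows are (ISO date string, profit) pairs in date order and that weeks are consecutive 7-day blocks counted from the first date, matching the question's week 1 of 2013-06-21 to 2013-06-27. `weekly_running_totals` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date


def weekly_running_totals(rows):
    """Running profit total that restarts every 7 days (hypothetical helper).

    Assumes rows are (ISO date string, profit) pairs sorted by date and
    that week 0 starts at the first date in the data.
    """
    start = date.fromisoformat(rows[0][0])
    running = defaultdict(int)  # week index -> profit total so far
    result = []
    for day, profit in rows:
        # 0 for the first 7 days, 1 for the next 7 days, and so on
        week = (date.fromisoformat(day) - start).days // 7
        running[week] += profit  # a new week implicitly starts at 0
        result.append((day, running[week]))
    return result


data = [('2013-06-21', 14), ('2013-06-22', 19), ('2013-06-23', 11),
        ('2013-06-24', 13), ('2013-06-25', 6), ('2013-06-26', 22),
        ('2013-06-27', 22), ('2013-06-28', 3), ('2013-06-29', 5),
        ('2013-06-30', 10), ('2013-07-01', 17), ('2013-07-02', 14),
        ('2013-07-03', 9), ('2013-07-04', 7)]

for day, total in weekly_running_totals(data):
    print(day, total)
```

The second block (2013-06-28 to 2013-07-04) restarts at 3 and ends at 65.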

2 answers:

Answer 0 (score: 4)

See this answer: Cumulative sum and percentage on column?

Also: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Updated for a weekly cumulative sum:

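import pandas as pd

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
# ISO week number (Series.dt.weekofyear was removed in pandas 2.0)
df['weekofyear'] = df['Date'].dt.isocalendar().week
df = df.sort_values('Date')
# Cumulative Profit that restarts whenever the week number changes
df['Weekly_Cum'] = df.groupby('weekofyear')['Profit'].cumsum()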

Output:

         Date  Profit  weekofyear  Weekly_Cum
0  2013-06-21      14          25          14
1  2013-06-22      19          25          33
2  2013-06-23      11          25          44
3  2013-06-24      13          26          13
4  2013-06-25       6          26          19
5  2013-06-26      22          26          41
6  2013-06-27      22          26          63
7  2013-06-28       3          26          66
8  2013-06-29       5          26          71
9  2013-06-30      10          26          81
10 2013-07-01      17          27          17
11 2013-07-02      14          27          31
12 2013-07-03       9          27          40
13 2013-07-04       7          27          47
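The ISO week numbers used above run Monday to Sunday, so the groups in this output (21 to 23 June, 24 to 30 June, 1 to 4 July) differ from the question's week 1 of 2013-06-21 to 2013-06-27. A minimal pandas sketch that instead treats each week as a consecutive 7-day block counted from the earliest date; the `block` column name is a hypothetical choice:

```python
import pandas as pd

data = [('2013-06-21', 14), ('2013-06-22', 19), ('2013-06-23', 11),
        ('2013-06-24', 13), ('2013-06-25', 6), ('2013-06-26', 22),
        ('2013-06-27', 22), ('2013-06-28', 3), ('2013-06-29', 5),
        ('2013-06-30', 10), ('2013-07-01', 17), ('2013-07-02', 14),
        ('2013-07-03', 9), ('2013-07-04', 7)]

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date').reset_index(drop=True)

# Days since the first date, integer-divided by 7: 0 for the first week, 1 for the next
df['block'] = (df['Date'] - df['Date'].min()).dt.days // 7

# Running total of Profit that restarts at the start of each 7-day block
df['Weekly_Cum'] = df.groupby('block')['Profit'].cumsum()
print(df)
```

The first seven rows of `Weekly_Cum` reproduce the question's sample output (14 up to 107).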

Answer 1 (score: 0)

Just a small correction to @liam-foley's answer:

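Group by year as well as by week number. Otherwise, the cumulative sum combines the rows that share a week number across all years in the data.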
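A minimal sketch of that correction, assuming pandas 1.1 or later for `Series.dt.isocalendar()`. The rows are made up purely to show the effect: both years contain ISO week 26. It groups by the ISO year rather than the calendar year, so that a week spanning 1 January (for example, 2013-12-30 belongs to ISO week 1 of 2014) stays in one group.

```python
import pandas as pd

# Hypothetical rows: ISO week 26 occurs in both 2013 and 2014
data = [('2013-06-24', 13), ('2013-06-25', 6),
        ('2014-06-23', 4), ('2014-06-24', 8)]

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date').reset_index(drop=True)

iso = df['Date'].dt.isocalendar()  # DataFrame with columns year, week, day
df['iso_year'] = iso['year']
df['weekofyear'] = iso['week']

# Week number alone: week 26 of 2014 keeps adding onto week 26 of 2013
df['Cum_week_only'] = df.groupby('weekofyear')['Profit'].cumsum()
# Year and week together: each calendar week gets its own running total
df['Weekly_Cum'] = df.groupby(['iso_year', 'weekofyear'])['Profit'].cumsum()
print(df[['Date', 'Profit', 'Cum_week_only', 'Weekly_Cum']])
```

With the week number alone the 2014 rows continue from 19 (giving 23 and 31); with year and week they restart at 4.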