I have a dataset with the structure: Date Profit
A sample of the dataset is:
Date Profit
2013-06-21 14
2013-06-22 19
2013-06-23 11
2013-06-24 13
2013-06-25 6
2013-06-26 22
2013-06-27 22
2013-06-28 3
2013-06-29 5
2013-06-30 10
2013-07-01 17
2013-07-02 14
2013-07-03 9
2013-07-04 7
The sample input is:
data = [('2013-06-21',14),
('2013-06-22',19),
('2013-06-23',11),
('2013-06-24',13),
('2013-06-25',6),
('2013-06-26',22),
('2013-06-27',22),
('2013-06-28',3),
('2013-06-29',5),
('2013-06-30',10),
('2013-07-01',17),
('2013-07-02',14),
('2013-07-03',9),
('2013-07-04',7)]
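Now I want to do a rolling aggregation and store the aggregated values. By rolling aggregation I mean that, for week 1 (2013-06-21 to 2013-06-27), I want to add each date's profit to the running total of the previous date and store the result with the current date. So for 2013-06-21 the sum is just 14, because it is the first day of the week, but for 2013-06-22 it should be the sum of the previous date (2013-06-21) and the current date (2013-06-22), stored with the current date. This continues until the end of the week; the next week then starts over, without carrying over any dates from the previous week. So for the first week, the sample output should look like this: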
Date Profit
2013-06-21 14
2013-06-22 33 #(14 + 19)
2013-06-23 44 #(33 + 11)
2013-06-24 57 #(44 + 13)
2013-06-25 63 #(57 + 6)
2013-06-26 85 #(63 + 22)
2013-06-27 107 #(85 + 22)
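Stated as a formula, with p_t the profit on day t and S_t the value stored for that day, the rule described above is a running total that restarts on the first day of each week:

```latex
S_t =
\begin{cases}
  p_t & \text{if } t \text{ is the first day of its week},\\
  S_{t-1} + p_t & \text{otherwise},
\end{cases}
\qquad \text{e.g. } S_{\text{2013-06-22}} = 14 + 19 = 33,\quad S_{\text{2013-06-23}} = 33 + 11 = 44.
```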
I tried looking at defaultdict and did this:
import collections

# Group the values by key, then apply func to each group
def aggregate(data, key, value, func):
    measures_dict = collections.defaultdict(list)
    for k, v in zip(data[key], data[value]):
        measures_dict[k].append(v)
    return [(k, func(measures_dict[k])) for k in measures_dict.keys()]
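But I'm not getting the result, and I think defaultdict is not the right approach. I also looked at pandas, but I couldn't work out how to get started. Can anyone help me with this rolling aggregation?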
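The attempt above expects data to be indexable by column name (data is actually a list of tuples), and it keys the dictionary on the date itself, so every group holds a single value and nothing accumulates from day to day. A minimal sketch that keeps the defaultdict idea but keys it on a 7-day block index counted from the earliest date might look like this; the helper name weekly_running_total is hypothetical:

```python
import collections
from datetime import date

data = [('2013-06-21', 14), ('2013-06-22', 19), ('2013-06-23', 11),
        ('2013-06-24', 13), ('2013-06-25', 6), ('2013-06-26', 22),
        ('2013-06-27', 22), ('2013-06-28', 3), ('2013-06-29', 5),
        ('2013-06-30', 10), ('2013-07-01', 17), ('2013-07-02', 14),
        ('2013-07-03', 9), ('2013-07-04', 7)]


def weekly_running_total(rows):
    """Hypothetical helper: return (date, running_total) pairs where the
    total restarts every 7 days, counting from the earliest date in rows."""
    rows = sorted(rows)  # ISO date strings sort chronologically
    start = date.fromisoformat(rows[0][0])
    totals = collections.defaultdict(int)  # block index -> running total so far
    result = []
    for day, profit in rows:
        # 0 for the first 7 days, 1 for the next 7, and so on
        block = (date.fromisoformat(day) - start).days // 7
        totals[block] += profit
        result.append((day, totals[block]))
    return result


for day, total in weekly_running_total(data):
    print(day, total)
```

Because the dictionary value is the running total rather than a list of values, each row only needs one addition, and a new block index automatically starts again from 0.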
Answer 0 (score: 4)
See this answer: Cumulative sum and percentage on column?
Also: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors
And this: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Updated for a weekly cumulative sum:
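import pandas as pd

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
# Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week gives the same ISO week number
df['weekofyear'] = df['Date'].dt.isocalendar().week
df = df.sort_values('Date').reset_index(drop=True)
# Running total of Profit that restarts for each week number
df['Weekly_Cum'] = df.groupby('weekofyear')['Profit'].cumsum()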
Output:
Date Profit weekofyear Weekly_Cum
0 2013-06-21 14 25 14
1 2013-06-22 19 25 33
2 2013-06-23 11 25 44
3 2013-06-24 13 26 13
4 2013-06-25 6 26 19
5 2013-06-26 22 26 41
6 2013-06-27 22 26 63
7 2013-06-28 3 26 66
8 2013-06-29 5 26 71
9 2013-06-30 10 26 81
10 2013-07-01 17 27 17
11 2013-07-02 14 27 31
12 2013-07-03 9 27 40
13 2013-07-04 7 27 47
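The weekofyear column above is the ISO calendar week, which starts on Monday, so 2013-06-21 to 2013-06-23 (Friday to Sunday) form week 25 on their own and the totals differ from the question's sample output (13 instead of 57 on 2013-06-24, for example). If weeks should instead be 7-day blocks counted from the earliest date, as in the question, a minimal sketch is below; the helper column name block is hypothetical:

```python
import pandas as pd

data = [('2013-06-21', 14), ('2013-06-22', 19), ('2013-06-23', 11),
        ('2013-06-24', 13), ('2013-06-25', 6), ('2013-06-26', 22),
        ('2013-06-27', 22), ('2013-06-28', 3), ('2013-06-29', 5),
        ('2013-06-30', 10), ('2013-07-01', 17), ('2013-07-02', 14),
        ('2013-07-03', 9), ('2013-07-04', 7)]

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date').reset_index(drop=True)
# Hypothetical helper column: 0 for the first 7 days from the earliest date, 1 for the next 7, ...
df['block'] = (df['Date'] - df['Date'].min()).dt.days // 7
# Running total of Profit that restarts at the start of every block
df['Weekly_Cum'] = df.groupby('block')['Profit'].cumsum()
print(df)
```

For the sample data this gives 14, 33, 44, 57, 63, 85, 107 for the first block, matching the question's expected output, and restarts at 3 on 2013-06-28.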
Answer 1 (score: 0)
Just a small correction to @liam-foley's answer:
Otherwise, rows that share a week number but come from different years in the index would be summed together.
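A minimal sketch of that correction, assuming it means adding the year to the groupby keys. It pairs the ISO year with the ISO week (both from Series.dt.isocalendar()), since around New Year the calendar year and the ISO week can disagree; the two-year data below is made up purely to show the effect:

```python
import pandas as pd

# Made-up data spanning two years; all four dates fall in ISO week 25
data = [('2013-06-21', 14), ('2013-06-22', 19),
        ('2014-06-20', 5), ('2014-06-21', 7)]

df = pd.DataFrame(data, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'])
iso = df['Date'].dt.isocalendar()  # DataFrame with columns year, week, day
df['year'] = iso['year']
df['weekofyear'] = iso['week']
df = df.sort_values('Date').reset_index(drop=True)
# Group on (ISO year, ISO week) so the same week number in different years stays separate
df['Weekly_Cum'] = df.groupby(['year', 'weekofyear'])['Profit'].cumsum()
print(df)
```

Here Weekly_Cum is 14, 33, 5, 12; grouping on weekofyear alone would instead give 14, 33, 38, 45, carrying the 2013 total into 2014.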