I have the following data:
ID,"address","used_at","active_seconds","pageviews"
0a1d796327284ebb443f71d85cb37db9,"vk.com",2016-01-29 22:10:52,3804,115
0a1d796327284ebb443f71d85cb37db9,"2gis.ru",2016-01-29 22:48:52,214,24
0a1d796327284ebb443f71d85cb37db9,"yandex.ru",2016-01-29 22:14:30,4,2
0a1d796327284ebb443f71d85cb37db9,"worldoftanks.ru",2016-01-29 22:10:30,41,2
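But the file is too large and Excel cannot open it.
I need to split the total time into weeks and print the result for each ID and each address.
It should look like this: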
ID vk.com 2gis.ru yandex.ru
0a1d796327284ebb443f71d85cb37db9 23 40 56
465a3fc01a62fd89a8094abdaccdcc99 0 100 45
...
So far I only compute the total time:
import pandas as pd
data = pd.read_csv("desktop-visits-dnp.csv")
group = data.groupby(['ID', 'address']).active_seconds.sum()
But I need to split it by week.
I don't have much experience with Python and I'm not sure I can manage this task.
Answer 0 (score: 0)
The code below computes the sum of active_seconds per ID and week for each address.
First, generate some sample data that resembles yours:
import random
import string
from datetime import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame()
ids = [''.join([random.choice(string.ascii_lowercase + string.digits) for _ in range(16)]) for i in range(10)]
addresses = [''.join([random.choice(string.ascii_lowercase) for _ in range(10)]) for i in range(10)]
df['ID'] = np.random.choice(ids, size=10000)
df['address'] = np.random.choice(addresses, size=10000)
df['active_seconds'] = np.random.randint(0, 100, 10000)
df['used_at'] = pd.date_range(start=datetime(2016, 1, 1, 0, 0, 0), freq='H', periods=10000)  # hourly; spelled 'h' in pandas 2.2 and later
Now set used_at, ID and address as the index, then unstack() the last of them (address) into columns, with active_seconds as the values:
df = df.set_index(['used_at', 'ID', 'address']).unstack().loc[:, 'active_seconds'].reset_index('ID')
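To make the reshaping step concrete, here is a minimal sketch on a hand-made three-row frame. The values and the names visits and wide are made up for illustration. Note that unstack() needs every (used_at, ID, address) combination to be unique, which holds for this sample but is an assumption worth checking on real data.

```python
import pandas as pd

# Tiny hand-made sample (hypothetical values) with the same columns as the question.
visits = pd.DataFrame({
    'used_at': pd.to_datetime(['2016-01-29 22:10:52',
                               '2016-01-29 22:48:52',
                               '2016-01-30 09:00:00']),
    'ID': ['a', 'a', 'b'],
    'address': ['vk.com', '2gis.ru', 'vk.com'],
    'active_seconds': [3804, 214, 10],
})

# Move the three keys into the index, pivot the innermost level (address)
# into columns, keep only the active_seconds block, and turn ID back into
# an ordinary column so that used_at is the only index level left.
wide = (visits.set_index(['used_at', 'ID', 'address'])
              .unstack()
              .loc[:, 'active_seconds']
              .reset_index('ID'))

print(wide)
```

Each address that a row did not visit shows up as NaN, which is why the later weekly output contains NaN cells.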
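Next, group by ID and by week, summing all values in each interval, and reset ID to a column instead of an index level:
# resample('W', how='sum') was removed from pandas; pd.Grouper(freq='W') bins the used_at index by week, and min_count=1 keeps NaN for empty cells
df = df.groupby(['ID', pd.Grouper(freq='W')]).sum(min_count=1).reset_index('ID')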
This gives the weekly usage per ID and address:
df.head()
address                   ID  afgpxizbum  cihchvzttw  dguznssmbi  irpvqtmuva  \
used_at
2016-01-03  06y2myiclyb2s4hr         NaN         NaN         NaN        19.0
2016-01-10  06y2myiclyb2s4hr        57.0        15.0        66.0         NaN
2016-01-17  06y2myiclyb2s4hr        13.0       144.0       152.0       139.0
2016-01-24  06y2myiclyb2s4hr       186.0       112.0         NaN         NaN
2016-01-31  06y2myiclyb2s4hr        15.0        68.0       128.0        63.0

address     otlkynddwv  ptzzhghnfl  rgwbuevvez  tgvbvfibaf  toimlivump  \
used_at
2016-01-03        30.0         NaN         NaN        50.0         NaN
2016-01-10        59.0        28.0         NaN         NaN       214.0
2016-01-17       106.0        26.0       179.0        62.0        69.0
2016-01-24        87.0        10.0       130.0       264.0         7.0
2016-01-31       144.0         NaN       215.0         NaN       208.0

address     uwsdzqyudi
used_at
2016-01-03        99.0
2016-01-10       235.0
2016-01-17       128.0
2016-01-24        85.0
2016-01-31        60.0
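If the goal is the layout from the question (one column per address, zeros where a user never visited a site), the same weekly totals can probably also be reached with pivot_table directly on the raw rows. This is an alternative sketch under assumptions, not the approach shown above: the week column, the weekly_totals helper and the sample rows are invented here, and weeks are labeled by the Sunday that ends them, like the 'W' frequency.

```python
import pandas as pd


def weekly_totals(visits: pd.DataFrame) -> pd.DataFrame:
    """Sum active_seconds per (week, ID), with one column per address.

    Hypothetical helper: expects the columns ID, address, used_at
    (already parsed as datetimes) and active_seconds.
    """
    out = visits.copy()
    # Label every row with the Sunday that closes its Monday-to-Sunday week.
    out['week'] = out['used_at'].dt.to_period('W').dt.end_time.dt.normalize()
    return out.pivot_table(index=['week', 'ID'],
                           columns='address',
                           values='active_seconds',
                           aggfunc='sum',
                           fill_value=0)  # 0 instead of NaN for unvisited sites


# Made-up rows: three visits in the week ending 2016-01-31, one in the next week.
visits = pd.DataFrame({
    'ID': ['a', 'a', 'a', 'b'],
    'address': ['vk.com', 'vk.com', '2gis.ru', 'yandex.ru'],
    'used_at': pd.to_datetime(['2016-01-29 22:10:52',
                               '2016-01-30 10:00:00',
                               '2016-01-29 22:48:52',
                               '2016-02-01 08:00:00']),
    'active_seconds': [3804, 100, 214, 4],
})

table = weekly_totals(visits)
print(table)
```

Because pivot_table aggregates, this variant does not require unique timestamps per ID and address, which may matter for a real log where two visits share the same second.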
Now you can group by the week in the index, iterate over the result, and save each week to its own Excel file:
for week, data in df.groupby(level=0):
    # week is a Timestamp; use only its date so the file name contains no ':' characters
    data.to_excel('{}.xlsx'.format(week.date()))
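Since the original problem was a file too big for Excel, the per-week files could also be written as CSV, which needs no Excel engine to be installed. A minimal sketch, assuming a frame whose index holds the week; the save_weeks helper, the temporary output directory and the sample values are hypothetical.

```python
import os
import tempfile

import pandas as pd


def save_weeks(weekly: pd.DataFrame, out_dir: str) -> list:
    """Write one CSV per distinct index value (week) and return the paths."""
    paths = []
    for week, chunk in weekly.groupby(level=0):
        # Use only the date part: a full timestamp contains ':' characters,
        # which are not allowed in Windows file names.
        path = os.path.join(out_dir, '{}.csv'.format(week.date()))
        chunk.to_csv(path)
        paths.append(path)
    return paths


# Hypothetical weekly frame: the index holds the week, one row per ID.
weekly = pd.DataFrame(
    {'ID': ['a', 'b', 'a'], 'vk.com': [10.0, 5.0, 7.0]},
    index=pd.DatetimeIndex(['2016-01-03', '2016-01-03', '2016-01-10'],
                           name='used_at'),
)

out_dir = tempfile.mkdtemp()
written = save_weeks(weekly, out_dir)
print([os.path.basename(p) for p in written])
```

Each file then holds one row per ID for that week, which should be small enough to open in a spreadsheet.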