Reporting with a pivot table in Excel using Python

Time: 2016-04-21 08:54:36

Tags: python excel numpy pandas

I have this data:

ID,"address","used_at","active_seconds","pageviews"
0a1d796327284ebb443f71d85cb37db9,"vk.com",2016-01-29 22:10:52,3804,115
0a1d796327284ebb443f71d85cb37db9,"2gis.ru",2016-01-29 22:48:52,214,24
0a1d796327284ebb443f71d85cb37db9,"yandex.ru",2016-01-29 22:14:30,4,2
0a1d796327284ebb443f71d85cb37db9,"worldoftanks.ru",2016-01-29 22:10:30,41,2

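But it is too large for Excel to open. I need to split the whole time range into weeks and print the result for each ID and each address. It should look like this: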

ID                                 vk.com              2gis.ru             yandex.ru

0a1d796327284ebb443f71d85cb37db9     23                     40                  56
465a3fc01a62fd89a8094abdaccdcc99      0                     100                 45
...

So far I compute the total over the whole period:

import pandas as pd
data = pd.read_csv("desktop-visits-dnp.csv")
group = data.groupby(['ID', 'address']).active_seconds.sum()

But I need to split it by week.

I don't have much Python experience, and I don't know whether I can manage this task.

1 answer:

Answer 0: (score: 0)

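The following code creates the sum of active_seconds by ID and week.

First, generate some sample data similar to yours:

import random
import string
from datetime import datetime
import numpy as np
import pandas as pd
# 10 random IDs and 10 random addresses, sampled into 10,000 hourly rows
df = pd.DataFrame()
ids = [''.join([random.choice(string.ascii_lowercase + string.digits) for _ in range(16)]) for i in range(10)]
addresses = [''.join([random.choice(string.ascii_lowercase) for _ in range(10)]) for i in range(10)]
df['ID'] = np.random.choice(ids, size=10000)
df['address'] = np.random.choice(addresses, size=10000)
df['active_seconds'] = np.random.randint(0, 100, 10000)
df['used_at'] = pd.date_range(start=datetime(2016, 1, 1, 0, 0, 0), freq='H', periods=10000)  # 'H' = hourly; pandas 2.2+ prefers lowercase 'h'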

Now set used_at, ID and address as the index and unstack() the last of them, address, so that each address becomes a column with active_seconds as the values:

df = df.set_index(['used_at', 'ID', 'address']).unstack().loc[:, 'active_seconds'].reset_index('ID')
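To make the reshaping step concrete, here is a minimal sketch on a three-row frame. The IDs "u1" and "u2" and the third row are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical three-row log; the values are illustrative only.
log = pd.DataFrame({
    "used_at": pd.to_datetime([
        "2016-01-29 22:10:52", "2016-01-29 22:48:52", "2016-01-30 09:00:00",
    ]),
    "ID": ["u1", "u1", "u2"],
    "address": ["vk.com", "2gis.ru", "vk.com"],
    "active_seconds": [3804, 214, 10],
})

# Move all three keys into the index, then pivot the 'address' level
# into columns. (row, address) pairs with no data become NaN.
wide = log.set_index(["used_at", "ID", "address"])["active_seconds"].unstack("address")
print(wide)
```

Each row of the result is still one original timestamp, so most cells are NaN until the rows are aggregated into weeks.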

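Next, group by ID and by week, summing all values within each interval, and reset ID to a column instead of an index level. pd.Grouper(freq='W') does the weekly resampling of the used_at index, and min_count=1 leaves NaN where an ID has no rows for an address in a given week:

df = df.groupby(['ID', pd.Grouper(freq='W')]).sum(min_count=1).reset_index('ID')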
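It may help to see how the weekly bins are labelled. The sketch below expresses the weekly grouping with pd.Grouper(freq="W") inside groupby, which is one way to write it and not necessarily the only one; the frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame: 'used_at' index, an ID column, one address column.
wide = pd.DataFrame(
    {"ID": ["u1", "u2", "u1", "u1"], "vk.com": [10.0, 3.0, 5.0, 7.0]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-08", "2016-01-12"]),
)
wide.index.name = "used_at"

# freq="W" means weeks ending on Sunday; each bin is labelled with that Sunday.
# 2016-01-04 .. 2016-01-10 fall into the bin labelled 2016-01-10,
# and 2016-01-12 falls into the bin labelled 2016-01-17.
weekly = wide.groupby(["ID", pd.Grouper(freq="W")]).sum()
print(weekly)
```

The result is indexed by (ID, week), so a trailing reset_index('ID') turns ID back into an ordinary column.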

To get the weekly usage per address and ID:

df.head()

address                   ID  afgpxizbum  cihchvzttw  dguznssmbi  irpvqtmuva  \
used_at
2016-01-03  06y2myiclyb2s4hr         NaN         NaN         NaN        19.0
2016-01-10  06y2myiclyb2s4hr        57.0        15.0        66.0         NaN
2016-01-17  06y2myiclyb2s4hr        13.0       144.0       152.0       139.0
2016-01-24  06y2myiclyb2s4hr       186.0       112.0         NaN         NaN
2016-01-31  06y2myiclyb2s4hr        15.0        68.0       128.0        63.0

address     otlkynddwv  ptzzhghnfl  rgwbuevvez  tgvbvfibaf  toimlivump  \
used_at
2016-01-03        30.0         NaN         NaN        50.0         NaN
2016-01-10        59.0        28.0         NaN         NaN       214.0
2016-01-17       106.0        26.0       179.0        62.0        69.0
2016-01-24        87.0        10.0       130.0       264.0         7.0
2016-01-31       144.0         NaN       215.0         NaN       208.0

address     uwsdzqyudi
used_at
2016-01-03        99.0
2016-01-10       235.0
2016-01-17       128.0
2016-01-24        85.0
2016-01-31        60.0

Now you can group by the weeks in the index, iterate over the results, and save each week to its own Excel file:

for week, data in df.groupby(level=0):
    # week is a Timestamp; use only its date so the file name contains no ':'
    data.to_excel('{}.xlsx'.format(week.date()))
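For the real file, the same report can probably be built in a single step with pivot_table. The sketch below is an alternative formulation, not the method from the answer: it inlines the four sample rows from the question so that it runs on its own, and week_filename is a hypothetical helper introduced here.

```python
import io

import pandas as pd

# The four sample rows from the question, inlined so the sketch is self-contained.
csv_text = '''ID,"address","used_at","active_seconds","pageviews"
0a1d796327284ebb443f71d85cb37db9,"vk.com",2016-01-29 22:10:52,3804,115
0a1d796327284ebb443f71d85cb37db9,"2gis.ru",2016-01-29 22:48:52,214,24
0a1d796327284ebb443f71d85cb37db9,"yandex.ru",2016-01-29 22:14:30,4,2
0a1d796327284ebb443f71d85cb37db9,"worldoftanks.ru",2016-01-29 22:10:30,41,2
'''
visits = pd.read_csv(io.StringIO(csv_text), parse_dates=["used_at"])

# One row per (week, ID), one column per address, summed active_seconds.
report = visits.pivot_table(
    index=[pd.Grouper(key="used_at", freq="W"), "ID"],
    columns="address",
    values="active_seconds",
    aggfunc="sum",
    fill_value=0,
)


def week_filename(week):
    """Hypothetical helper: build a file name from the week's date only."""
    return "{}.xlsx".format(week.date())


# Show which file each week would go to, and how large that week's table is.
for week, data in report.groupby(level=0):
    print(week_filename(week), data.shape)
```

To actually write the files, replace the print call with data.to_excel(week_filename(week)); that requires an Excel writer engine such as openpyxl to be installed. With the real data, pass the path to desktop-visits-dnp.csv to read_csv instead of the inlined text.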