excel中的pandas整洁数据框,一张纸上有很多表

时间:2019-02-14 00:38:04

标签: python excel pandas tidy

我一直在努力处理一张纸上有多张桌子的讨厌的excel纸。我读了一遍,数据框看起来像这样:

CSV HERE

# (see csv file)

import pandas as pd
df = pd.read_csv('SO_csv.csv')
df

Out[]:
Unnamed: 0               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday     Total
0    0th Week  2019-01-07 00:00:00  2019-01-08 00:00:00  2019-01-09 00:00:00  2019-01-10 00:00:00  2019-01-11 00:00:00  2019-01-12 00:00:00  2019-01-13 00:00:00       NaN
1       item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
2       item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
3       item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
4       item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
5       Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
6         NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
7         NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
8    1st Week  2019-01-14 00:00:00  2019-01-15 00:00:00  2019-01-16 00:00:00  2019-01-17 00:00:00  2019-01-18 00:00:00  2019-01-19 00:00:00  2019-01-20 00:00:00       NaN
9       item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
10      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
11      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
12      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
13      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
14      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
15      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
16        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
17        NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
18   2nd Week  2019-01-21 00:00:00  2019-01-22 00:00:00  2019-01-23 00:00:00  2019-01-24 00:00:00  2019-01-25 00:00:00  2019-01-26 00:00:00  2019-01-27 00:00:00       NaN
19      item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
20      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
21      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
22      item6             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
23      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
24      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
25      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
26      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
27        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
28        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
29        NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
30   3rd Week           28/01/2019           29/01/2019           30/01/2019           31/01/2019  2019-01-02 00:00:00  2019-02-02 00:00:00  2019-03-02 00:00:00       NaN
31      item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
32      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
33      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
34      item6             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
35      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
36      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
37      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
38      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
39        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN

我想创建一个``整理''表,其中每一行是该项目的每日支出(可以是NaN)。

注意:代码只是在这里演示输出,而不解释我尝试过的内容。

# THIS IS ALL JUST TO CREATE A DATAFRAME TO SHOW YOU WHAT IT SHOULD LOOK LIKE
items = ['item0', 'item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'item7']
weeks = [1,2,3]
days = list(calendar.day_name)
arrays_len = len(weeks) * len(days) * len(items)

dates = df_.loc[weeks_bool,item_cols].melt()
dates['value'] = pd.to_datetime(dates['value'])
dates = dates.sort_values(by='value')
dates = list(dates['value'].astype(str)) # len 28

# length 168 arrays
items = np.tile(items, int(len_arrays / len(items) ))
weeks = np.repeat(weeks, int(len_arrays / len(weeks)))
days = np.repeat(days, int(len_arrays / len(days)))
vals = np.repeat(0.510017, int(len_arrays))
dates = np.repeat(dates, int(len_arrays / len(dates)))

pd.DataFrame({'weeks': weeks,'days': days, 'dates':dates,'items': items,'value': vals}).head()

Out[]:
    weeks       days       dates  items     value
0      1     Monday  2019-01-02  item0  0.510017
1      1    Tuesday  2019-01-02  item1  0.510017
2      1  Wednesday  2019-01-02  item2  0.510017
3      1   Thursday  2019-01-02  item3  0.510017
4      1     Friday  2019-01-02  item4  0.510017

有人可以帮我带些熊猫熊猫吗?

到目前为止我的想法/尝试

  • 我能找到每个表的起始索引
  • 但是我遇到的一个关键问题是如何使用该索引选择新表
  • 因为WHITESPACE(所有NAN)行的数量是可变的(1或2)
  • 而且项目数是可变的

第一步,我想将True值分组为(但不包括)下一个真实值。

weeks_bool = df['Unnamed: 0'].str.contains('Week').fillna(False)

0 个答案:

没有答案