我一直在努力处理一张纸上有多张桌子的讨厌的excel纸。我读了一遍,数据框看起来像这样:
# (see csv file)
import pandas as pd
df = pd.read_csv('SO_csv.csv')
df
Out[]:
Unnamed: 0 Monday Tuesday Wednesday Thursday Friday Saturday Sunday Total
0 0th Week 2019-01-07 00:00:00 2019-01-08 00:00:00 2019-01-09 00:00:00 2019-01-10 00:00:00 2019-01-11 00:00:00 2019-01-12 00:00:00 2019-01-13 00:00:00 NaN
1 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
2 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
3 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
4 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
5 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
6 NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
8 1st Week 2019-01-14 00:00:00 2019-01-15 00:00:00 2019-01-16 00:00:00 2019-01-17 00:00:00 2019-01-18 00:00:00 2019-01-19 00:00:00 2019-01-20 00:00:00 NaN
9 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
10 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
11 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
12 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
13 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
14 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
15 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
16 NaN NaN NaN NaN NaN NaN NaN NaN NaN
17 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
18 2nd Week 2019-01-21 00:00:00 2019-01-22 00:00:00 2019-01-23 00:00:00 2019-01-24 00:00:00 2019-01-25 00:00:00 2019-01-26 00:00:00 2019-01-27 00:00:00 NaN
19 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
20 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
21 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
22 item6 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
23 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
24 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
25 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
26 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
27 NaN NaN NaN NaN NaN NaN NaN NaN NaN
28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
29 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
30 3rd Week 28/01/2019 29/01/2019 30/01/2019 31/01/2019 2019-01-02 00:00:00 2019-02-02 00:00:00 2019-03-02 00:00:00 NaN
31 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
32 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
33 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
34 item6 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
35 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
36 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
37 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
38 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
39 NaN NaN NaN NaN NaN NaN NaN NaN NaN
注意:代码只是在这里演示输出,而不解释我尝试过的内容。
# THIS IS ALL JUST TO CREATE A DATAFRAME TO SHOW YOU WHAT IT SHOULD LOOK LIKE
items = ['item0', 'item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'item7']
weeks = [1,2,3]
days = list(calendar.day_name)
arrays_len = len(weeks) * len(days) * len(items)
dates = df_.loc[weeks_bool,item_cols].melt()
dates['value'] = pd.to_datetime(dates['value'])
dates = dates.sort_values(by='value')
dates = list(dates['value'].astype(str)) # len 28
# length 168 arrays
items = np.tile(items, int(len_arrays / len(items) ))
weeks = np.repeat(weeks, int(len_arrays / len(weeks)))
days = np.repeat(days, int(len_arrays / len(days)))
vals = np.repeat(0.510017, int(len_arrays))
dates = np.repeat(dates, int(len_arrays / len(dates)))
pd.DataFrame({'weeks': weeks,'days': days, 'dates':dates,'items': items,'value': vals}).head()
Out[]:
weeks days dates items value
0 1 Monday 2019-01-02 item0 0.510017
1 1 Tuesday 2019-01-02 item1 0.510017
2 1 Wednesday 2019-01-02 item2 0.510017
3 1 Thursday 2019-01-02 item3 0.510017
4 1 Friday 2019-01-02 item4 0.510017
有人可以帮我带些熊猫熊猫吗?
第一步,我想将True
值分组为(但不包括)下一个真实值。
weeks_bool = df['Unnamed: 0'].str.contains('Week').fillna(False)