Reshape a DataFrame and modify one column based on the other 24 columns

Time: 2017-07-19 08:20:20

Tags: python pandas pivot-table

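I have an Excel file with 26 columns:

Date, Unique ID, H01, H02, H03 ... H24

Here H{n} denotes the hour. For example, for the UID some_code the value at 19/7/2017 01.00.00 is 199, the value at 19/7/2017 02.00.00 is 7, and so on.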

+--------------------+---------------+----------+---------------+
|       Date         | UID           | H01      | H02           |
+--------------------+---------------+----------+---------------+
| 19/7/2017 00.00.00 | some_code     |      199 |             7 |
| 19/7/2017 00.00.00 | another_code  |      164 |            18 |
| 19/7/2017 00.00.00 | new_code      |      209 |             1 |
| 19/7/2017 00.00.00 | code_5        |       85 |             4 |
| 19/7/2017 00.00.00 | what          |       45 |             6 |

I read the Excel file and create a DataFrame like the one above.
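For anyone who wants to reproduce this without the Excel file, a minimal sketch of the same frame built inline follows. Only the H01 and H02 columns shown in the table are included, and the `pd.read_excel` file name mentioned in the comment is a placeholder, not the real file.

```python
import pandas as pd

# Sample rows copied from the table above; the real sheet has H01..H24.
df = pd.DataFrame({
    "Date": ["19/7/2017 00.00.00"] * 5,
    "UID": ["some_code", "another_code", "new_code", "code_5", "what"],
    "H01": [199, 164, 209, 85, 45],
    "H02": [7, 18, 1, 4, 6],
})

# With the actual workbook this would be something like
# df = pd.read_excel("data.xlsx")  # hypothetical file name
print(df)
```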

I want to reshape this DataFrame so that I get the following:

+--------------------+---------------+----------+
|       Date         | UID           | Value    |
+--------------------+---------------+----------+
| 19/7/2017 01.00.00 | some_code     |      199 |
| 19/7/2017 02.00.00 | some_code     |        7 |
| 19/7/2017 03.00.00 | some_code     |      ... |
.................................................
.................................................
| 19/7/2017 00.00.00 | some_code     |      ... |
| 19/7/2017 01.00.00 | another_code  |      164 |
| 19/7/2017 02.00.00 | another_code  |       18 |
| 19/7/2017 03.00.00 | another_code  |       ...|
.................................................
.................................................
| 19/7/2017 00.00.00 | another_code  |       ...|

I am new to Python and pandas and cannot get my head around stack/unstack/pivot.

1 answer:

Answer 0: (score: 1)

You can use:

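import pandas as pd

# Parse the date strings (day/month/year with a dot-separated time)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y %H.%M.%S')
df = df.set_index(['Date','UID'])
# Turn the column labels H01..H24 into hour offsets (Timedeltas)
df.columns = pd.to_timedelta(df.columns.str.extract(r'(\d+)', expand=False).astype(int), unit='h')
# Stack the hour columns into rows; the unnamed level becomes 'level_2'
df = df.stack().reset_index(name='Value')
df['Date'] = df['Date'] + df['level_2']
df = df.drop('level_2', axis=1)
print (df)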
                 Date           UID  Value
0 2017-07-19 01:00:00     some_code    199
1 2017-07-19 02:00:00     some_code      7
2 2017-07-19 01:00:00  another_code    164
3 2017-07-19 02:00:00  another_code     18
4 2017-07-19 01:00:00      new_code    209
5 2017-07-19 02:00:00      new_code      1
6 2017-07-19 01:00:00        code_5     85
7 2017-07-19 02:00:00        code_5      4
8 2017-07-19 01:00:00          what     45
9 2017-07-19 02:00:00          what      6
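The key step is turning the column labels into hour offsets that can be added to the date. A small standalone sketch of just that step, using three example labels (the `cols`, `hours` and `offsets` names are only for illustration):

```python
import pandas as pd

# Example labels in the same H{n} pattern as the Excel columns.
cols = pd.Index(["H01", "H02", "H24"])

# Pull out the digits and convert them to integers: 1, 2, 24.
hours = cols.str.extract(r"(\d+)", expand=False).astype(int)

# Interpret the integers as hours; H24 becomes a 24-hour offset (one full day).
offsets = pd.to_timedelta(hours, unit="h")
print(offsets)
```

Note that H24 is a full-day offset, so added to 19/7/2017 00.00.00 it lands on midnight of 20/7/2017 rather than on the 19th as written in the question's expected table.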

To keep the dates in the same format as the input, add dt.strftime:

...
df['Date'] = (df['Date'] + df['level_2']).dt.strftime('%d/%m/%Y %H.%M.%S')
df = df.drop('level_2', axis=1)
print (df)
                  Date           UID  Value
0  19/07/2017 01.00.00     some_code    199
1  19/07/2017 02.00.00     some_code      7
2  19/07/2017 01.00.00  another_code    164
3  19/07/2017 02.00.00  another_code     18
4  19/07/2017 01.00.00      new_code    209
5  19/07/2017 02.00.00      new_code      1
6  19/07/2017 01.00.00        code_5     85
7  19/07/2017 02.00.00        code_5      4
8  19/07/2017 01.00.00          what     45
9  19/07/2017 02.00.00          what      6
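As an alternative to set_index plus stack, the same reshape can probably be done with melt. The following is a self-contained sketch, assuming the sample data from the question (only H01 and H02) and a helper column name `hour` chosen here for illustration:

```python
import pandas as pd

# Sample data from the question; the real sheet has H01..H24.
df = pd.DataFrame({
    "Date": ["19/7/2017 00.00.00"] * 5,
    "UID": ["some_code", "another_code", "new_code", "code_5", "what"],
    "H01": [199, 164, 209, 85, 45],
    "H02": [7, 18, 1, 4, 6],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H.%M.%S")

# Wide to long: one row per (Date, UID, hour column).
tidy = df.melt(id_vars=["Date", "UID"], var_name="hour", value_name="Value")

# Strip the leading "H", convert to an hour offset and add it to the date.
tidy["Date"] = tidy["Date"] + pd.to_timedelta(tidy["hour"].str[1:].astype(int), unit="h")
tidy = tidy.drop(columns="hour")
print(tidy)
```

Unlike stack, melt lists all H01 rows first and then all H02 rows, so a sort on UID and Date may be needed if the row order matters.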