Is it possible to reshape a CSV file so that it is indexed by date?

Time: 2019-06-02 20:43:25

Tags: python pandas dataframe

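I'm currently trying to convert a CSV file into a new format with Python 3. My later goal is to use pandas to add some information to the file, for example "Is the date a weekday or a weekend?".

To get there, though, I have to clear the first hurdle.

I need to convert the CSV file from this: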

date,hour,price
2018-10-01,0-1,59.53
2018-10-01,1-2,56.10
2018-10-01,2-3,51.41
2018-10-01,3-4,47.38
2018-10-01,4-5,47.59
2018-10-01,5-6,51.61
2018-10-01,6-7,69.13
2018-10-01,7-8,77.32
2018-10-01,8-9,84.97
2018-10-01,9-10,79.56
2018-10-01,10-11,73.70
2018-10-01,11-12,71.63
2018-10-01,12-13,63.15
2018-10-01,13-14,60.24
2018-10-01,14-15,56.18
2018-10-01,15-16,53.00
2018-10-01,16-17,53.37
2018-10-01,17-18,60.42
2018-10-01,18-19,69.93
2018-10-01,19-20,75.00
2018-10-01,20-21,65.83
2018-10-01,21-22,53.86
2018-10-01,22-23,46.46
2018-10-01,23-24,42.50
2018-10-02,0-1,45.10
2018-10-02,1-2,44.10
2018-10-02,2-3,44.06
2018-10-02,3-4,43.70
2018-10-02,4-5,44.29
2018-10-02,5-6,48.13
2018-10-02,6-7,57.70
2018-10-02,7-8,68.21
2018-10-02,8-9,70.36
2018-10-02,9-10,54.53
2018-10-02,10-11,48.49
2018-10-02,11-12,46.19
2018-10-02,12-13,44.15
2018-10-02,13-14,30.79
2018-10-02,14-15,27.75
2018-10-02,15-16,30.74
2018-10-02,16-17,26.77
2018-10-02,17-18,38.68
2018-10-02,18-19,48.52
2018-10-02,19-20,49.03
2018-10-02,20-21,45.43
2018-10-02,21-22,32.04
2018-10-02,22-23,26.22
2018-10-02,23-24,1.08
2018-10-03,0-1,2.13
2018-10-03,1-2,0.10
...

to this:

date,0-1,1-2,2-3,3-4,4-5,5-6,6-7,7-8,8-9,...,23-24
2018-10-01,59.53,56.10,51.41,47.38,47.59,51.61,69.13,77.32,84.97,...,42.50
2018-10-02,45.10,44.10,44.06,43.70,44.29,....
2018-10-03,2.13,0.10,....
...

I have tried a lot with pandas DataFrames, but I couldn't come up with a solution.

import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
df

            date   hour  price
0     2018-10-01    0-1  59.53
1     2018-10-01    1-2  56.10
2     2018-10-01    2-3  51.41
3     2018-10-01    3-4  47.38
4     2018-10-01    4-5  47.59
5     2018-10-01    5-6  51.61
6     2018-10-01    6-7  69.13
7     2018-10-01    7-8  77.32
8     2018-10-01    8-9  84.97

The DataFrame should look like the one below, but I can't get it populated:

df = pd.DataFrame(df, index=['date'], columns=['date','0-1','1-2','2-3', '3-4', '4-5', '5-6', '6-7', '7-8', '8-9', '9-10', '10-11', '11-12', '12-13', '13-14', '14-15', '15-16', '16-17', '17-18', '18-19', '19-20', '20-21', '21-22', '22-23', '23-24'])

How would you solve this?

2 answers:

Answer 0 (score: 1)

You can use pandas.DataFrame.unstack():

# pivot the dataframe with hour to the columns
df1 = df.set_index(['date','hour']).unstack(1)

# drop level-0 on columns
df1.columns = [ c[1] for c in df1.columns ]

# sort the column names by numeric order of hours (the number before '-')
df1 = df1.reindex(columns=sorted(df1.columns, key=lambda x: int(x.split('-')[0]))).reset_index()
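
For anyone who wants to try the unstack approach without the original file, here is a minimal self-contained sketch. It assumes a small in-memory sample (a few rows copied from the question) in place of file.csv, and the variable names csv_text and wide are only illustrative. It selects the price column before unstacking, which avoids the extra column level, and sorts the hour labels numerically because a plain string sort would put '10-11' before '2-3'.

```python
import io

import pandas as pd

# A few rows from the question, used here instead of reading file.csv
csv_text = """date,hour,price
2018-10-01,0-1,59.53
2018-10-01,2-3,51.41
2018-10-01,10-11,73.70
2018-10-02,0-1,45.10
2018-10-02,2-3,44.06
2018-10-02,10-11,48.49
"""

df = pd.read_csv(io.StringIO(csv_text))

# Pick the price column first so unstack produces flat column labels
wide = df.set_index(['date', 'hour'])['price'].unstack('hour')

# Order hour columns by the number before '-', not alphabetically
wide = wide[sorted(wide.columns, key=lambda h: int(h.split('-')[0]))]

# Drop the leftover 'hour' label on the columns and make date a regular column
wide.columns.name = None
wide = wide.reset_index()

# Serialize back to CSV text; pass a path instead to write a file
csv_out = wide.to_csv(index=False)
print(csv_out)
```

If every (date, hour) pair is unique, df.pivot(index='date', columns='hour', values='price') should give the same wide table before the column sort.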

Answer 1 (score: 0)

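If I understand correctly, try the index_col parameter of pd.read_csv(), using an integer label for the column in the file:

df = pd.read_csv('file.csv', index_col=0)

See the pd.read_csv() documentation; don't be put off by the staggering number of keyword arguments, one of them often does exactly what you need!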

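You may need to parse the first two columns as dates and then add a weekend column based on a condition on the result. See the parse_dates keyword argument of read_csv.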
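
As a rough sketch of the weekend column mentioned above: the example below assumes only the date column needs parsing, and the two weekend rows (2018-10-06 and 2018-10-07) with their prices are made up for illustration, since the excerpt in the question only covers weekdays. The column name is_weekend is also just an assumption.

```python
import io

import pandas as pd

# First row is from the question; the two weekend rows are invented for illustration
csv_text = """date,hour,price
2018-10-01,0-1,59.53
2018-10-06,0-1,40.00
2018-10-07,0-1,35.00
"""

# parse_dates turns the date column into datetime64 values
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday
df['is_weekend'] = df['date'].dt.dayofweek >= 5

print(df)
```

The same condition can be applied after reshaping, as long as the date column (or index) has been parsed as datetimes.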