Question

I have a CSV file with the following data:

Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
2004-01-18 - 2004-01-24,9
2004-01-25 - 2004-01-31,11
2004-02-01 - 2004-02-07,9
2004-02-08 - 2004-02-14,8
2004-02-15 - 2004-02-21,10

I want to create a DataFrame with one row per day, like this:

Day,rossmann
2004-01-04, 8
2004-01-05, 8
...
2004-01-11, 10
...

What is the simplest way to do this?

Answer 1

You can parse the CSV as usual:

import pandas as pd

df = pd.read_csv('data', sep=',')

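Then use Series.str.extract to pull the first date out of the Week column with a regular expression pattern:

# expand=False returns a Series (rather than a one-column DataFrame) for a single capture group
df['Day'] = df['Week'].str.extract(r'^(\d{4}-\d{2}-\d{2})', expand=False)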
df = df[['Day', 'rossmann']]
print(df)

which yields

          Day  rossmann
0  2004-01-04         8
1  2004-01-11        10
2  2004-01-18         9
3  2004-01-25        11
4  2004-02-01         9
5  2004-02-08         8
6  2004-02-15        10
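For a self-contained check of the extraction step, here is a minimal sketch. It assumes the sample rows are held in an in-memory string (so no file named 'data' is needed) and adds an optional pd.to_datetime conversion that is not part of the answer above.

```python
import io
import pandas as pd

# Sample rows from the question, kept in memory so the snippet runs without a file (assumption).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
2004-01-18 - 2004-01-24,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# Capture the first YYYY-MM-DD date at the start of each Week string.
df['Day'] = df['Week'].str.extract(r'^(\d{4}-\d{2}-\d{2})', expand=False)

# Optional extra step: convert the extracted strings to real datetimes.
df['Day'] = pd.to_datetime(df['Day'])

result = df[['Day', 'rossmann']]
print(result)
```

Converting to datetimes is only worthwhile if the column will later be used for date arithmetic or resampling; for display alone the strings are enough.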

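Alternatively, parse the CSV with the regex separator r',| - '. This splits each line on either a comma or the literal string made up of a space, a dash, and a space: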

# regex separators are only supported by the python parsing engine
df = pd.read_csv('data', sep=r',| - ', engine='python', skiprows=1, header=None,
                 names=['Day','rossmann'], usecols=[0,2])

This yields the same result as above.
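To see what the regex separator does to each line, the sketch below names all three resulting fields instead of dropping the middle one. The column name 'End' and the in-memory sample data are assumptions made for illustration only.

```python
import io
import pandas as pd

# Two sample rows from the question, kept in memory (assumption).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
"""

# Each data line splits into three fields: start date, end date, value.
# The header line has only two fields, so it is skipped and names are given explicitly.
df_sep = pd.read_csv(io.StringIO(csv_text), sep=r',| - ', engine='python',
                     skiprows=1, header=None,
                     names=['Day', 'End', 'rossmann'])
print(df_sep)
```

Keeping the 'End' column is useful if the weekly rows are later expanded into one row per day, since both endpoints of each range are then already separated.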

Answer 2

import pandas as pd

# return the first day of the week range, e.g. '2004-01-04'
def week_starts(week_dates):
    return str(week_dates)[:10]

# return the last day of the week range, e.g. '2004-01-10'
# (the separator ' - ' occupies positions 10-12, so the end date starts at 13)
def week_ends(week_dates):
    return str(week_dates)[13:]

# read the CSV into a DataFrame (DataFrame.from_csv was removed in pandas 1.0)
df = pd.read_csv('d.csv')

# create two new columns holding the first and last day of each week
df['w_start'] = df['Week'].apply(week_starts)
df['w_end'] = df['Week'].apply(week_ends)

# build one small frame per week, covering every day in that week
frames = []
for i in range(len(df)):
    ross = df.iloc[i]['rossmann']
    days = pd.date_range(str(df.iloc[i]['w_start']), str(df.iloc[i]['w_end']), freq='D')
    frames.append(pd.DataFrame({'Days': days, 'Rossmann': ross}))

# concatenate once at the end (DataFrame.append was removed in pandas 2.0)
df2 = pd.concat(frames, ignore_index=True)

print(df2)
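The same expansion can also be written without indexing rows by position. The following is a sketch of one alternative rather than the answerer's method; it assumes pandas 0.25 or later (for DataFrame.explode) and uses an in-memory copy of the sample data.

```python
import io
import pandas as pd

# Two sample rows from the question, kept in memory (assumption).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
"""

df = pd.read_csv(io.StringIO(csv_text))

# Split "start - end" into two columns and parse both as datetimes.
bounds = df['Week'].str.split(' - ', expand=True)
start = pd.to_datetime(bounds[0])
end = pd.to_datetime(bounds[1])

# Store a list of daily timestamps in each row, then explode to one row per day.
df['Day'] = [list(pd.date_range(s, e, freq='D')) for s, e in zip(start, end)]
daily = df.explode('Day')[['Day', 'rossmann']].reset_index(drop=True)

# explode leaves an object column, so convert it back to datetimes.
daily['Day'] = pd.to_datetime(daily['Day'])
print(daily)
```

Each weekly value is repeated for all seven days of its range, which matches the layout requested in the question.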
