I have a CSV file containing the following data:
Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
2004-01-18 - 2004-01-24,9
2004-01-25 - 2004-01-31,11
2004-02-01 - 2004-02-07,9
2004-02-08 - 2004-02-14,8
2004-02-15 - 2004-02-21,10
I would like to create a DataFrame with the following data:
Day,rossmann
2004-01-04, 8
2004-01-05, 8
...
2004-01-11, 10
...
What is the simplest way to do this?
Answer 0 (score: 1)
You can parse the CSV as usual:
import pandas as pd
df = pd.read_csv('data', sep=r',')
Then use Series.str.extract to pull out part of the Week column based on a regular-expression pattern:
df['Day'] = df['Week'].str.extract(r'^(\d{4}-\d{2}-\d{2})')
df = df[['Day', 'rossmann']]
print(df)
This yields:
          Day  rossmann
0  2004-01-04         8
1  2004-01-11        10
2  2004-01-18         9
3  2004-01-25        11
4  2004-02-01         9
5  2004-02-08         8
6  2004-02-15        10
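For anyone who wants to try the extraction without a file on disk, here is a minimal sketch of the same idea. The `io.StringIO` stand-in, the `expand=False` argument, and the `pd.to_datetime` conversion are my additions and not part of the original answer; `expand=False` just makes `str.extract` return a Series for a single capture group.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question (first three rows only).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
2004-01-18 - 2004-01-24,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# Capture the leading YYYY-MM-DD of each "start - end" string.
# expand=False returns a Series instead of a one-column DataFrame.
first_day = df['Week'].str.extract(r'^(\d{4}-\d{2}-\d{2})', expand=False)

# Optional: convert the extracted strings to real timestamps.
df['Day'] = pd.to_datetime(first_day)
df = df[['Day', 'rossmann']]
print(df)
```

Note that this keeps one row per week (the first day only); it does not expand each week into seven daily rows.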
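Another approach is to parse the CSV with the regex separator r',| - '. This splits each line on either a comma or the literal string made of a space, followed by a dash, followed by a space: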
# A regex separator is only supported by the Python parsing engine.
df = pd.read_csv('data', sep=r',| - ', engine='python', skiprows=1, header=None,
                 names=['Day','rossmann'], usecols=[0,2])
This produces the same result as above.
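A self-contained sketch of the regex-separator idea, in case it helps to see what the split produces. This is a variation, not the exact call above: instead of `usecols`, it names all three fields and then selects two of them. The column name `week_end` and the `io.StringIO` stand-in are hypothetical choices of mine.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question (first two rows only).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
"""

# Each data line splits into three fields: week start, week end, value.
# The header row is skipped because it only has two fields.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=r',| - ',
    engine='python',   # regex separators need the Python engine
    skiprows=1,
    header=None,
    names=['Day', 'week_end', 'rossmann'],  # 'week_end' is a made-up name
)

# Keep only the first day of each week and the value.
df = df[['Day', 'rossmann']]
print(df)
```

Keeping the `week_end` column around instead of dropping it can be handy if you later want to expand each week into individual days.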
Answer 1 (score: 0)
import pandas as pd

# Return the first day of the week from a "start - end" string.
def week_starts(week_dates):
    return str(week_dates)[:10]

# Return the last day of the week (skip the 10-character start date and the ' - ' separator).
def week_ends(week_dates):
    return str(week_dates)[13:]

# Read the CSV into a DataFrame (DataFrame.from_csv has been removed from pandas).
df = pd.read_csv('d.csv')

# Add two columns: the first and the last day of each week.
df['w_start'] = df['Week'].apply(week_starts)
df['w_end'] = df['Week'].apply(week_ends)

# Build one small DataFrame per week, with one row per day.
frames = []
for i in range(len(df)):
    ross = df.iloc[i]['rossmann']
    days = pd.date_range(df.iloc[i]['w_start'], df.iloc[i]['w_end'], freq='D')
    frames.append(pd.DataFrame({'Days': days, 'Rossmann': ross}))

# Concatenate the pieces (DataFrame.append was removed in pandas 2.0).
df2 = pd.concat(frames, ignore_index=True)
print(df2)
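If the weeks are contiguous and non-overlapping, as they are in the sample data, the same daily expansion can probably be done without an explicit loop. The following is only a sketch under that assumption, and it is a different technique from the one above: it indexes the weekly values by their start date, reindexes onto a daily range, and forward-fills. The variable names and the `io.StringIO` stand-in are mine.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question (first two rows only).
csv_text = """Week,rossmann
2004-01-04 - 2004-01-10,8
2004-01-11 - 2004-01-17,10
"""

df = pd.read_csv(io.StringIO(csv_text))

# First and last 10 characters of "start - end" are the two dates.
starts = pd.to_datetime(df['Week'].str[:10])
ends = pd.to_datetime(df['Week'].str[-10:])

# Weekly values indexed by the first day of each week.
weekly = pd.Series(df['rossmann'].to_numpy(), index=starts)

# Every calendar day from the first week's start to the last week's end.
all_days = pd.date_range(starts.min(), ends.max(), freq='D')

# Reindex onto the daily range and carry each weekly value forward.
# Assumption: weeks are contiguous, so forward-fill never crosses a gap.
daily = (
    weekly.reindex(all_days)
    .ffill()
    .astype(int)
    .rename_axis('Day')
    .reset_index(name='rossmann')
)
print(daily)
```

If the file has gaps between weeks, forward-filling would assign values to days that belong to no week, so the loop-based version is the safer choice in that case.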