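I have a wide-format DataFrame whose column names are date ranges or empty strings, while the first row holds the intended column headers. I need code that infers the week from each date-range header, takes the column names from the first row, and renames the columns accordingly (e.g. week1_quantity, week1_sales, week1_profit).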
import pandas as pd
df = pd.DataFrame([
{'Related Fields':'Description', 'Unnamed 1':'barcode',
'Unnamed 2':'department', 'Unnamed 3':'section',
'Unnamed 4':'reference', 'Sales: (06/07/2020,12/07/2020)':'Quantity',
'Unnamed 6':'amount', 'Unnamed 7':'cost',
'Unnamed 8':'% M/S', 'Unnamed 9': 'profit',
'Sales: (29/06/2020,05/07/2020)': 'Quantity',
'Unnamed 11':'amount', 'Unnamed 12':'cost',
'Unnamed 13':'% M/S', 'Unnamed 14':'profit'},
{'Related Fields':'cornflakes', 'Unnamed 1':'0001198',
'Unnamed 2':'grocery', 'Unnamed 3':'breakefast',
'Unnamed 4': '0001198', 'Sales: (06/07/2020,12/07/2020)': 60,
'Unnamed 6': 6000, 'Unnamed 7':3000, 'Unnamed 8':50,
'Unnamed 9':3000, 'Sales: (29/06/2020,05/07/2020)': 120,
'Unnamed 11':12000, 'Unnamed 12':6000, 'Unnamed 13':50,
'Unnamed 14':6000}
])
Expected result
df2 = pd.DataFrame([
{'Description':'cornflakes', 'barcode':'0001198',
'department':'grocery', 'section':'breakefast',
'reference':'0001198', 'week28_quantity':60,
'week28_amount':6000, 'week28_cost':3000,
'week28_% M/S':50, 'week28_profit':3000,
'week27_quantity':120, 'week27_amount':12000,
'week27_cost':6000, 'week27_% M/S':50,
'week27_profit':6000}
])
I tried renaming the columns manually, but I would like an automated solution.
Answer 0 (score: 0)
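You can solve this by parsing the start date of each range with datetime.strptime and getting the week number with datetime.isocalendar. The week prefix found on each 'Sales' column is forward-filled across the unnamed columns that follow it, then joined with the text in the first row.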
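As a quick check of the idea on a single header, here is a minimal sketch. The names `header`, `start_text` and `prefix` are illustrative only, and it assumes every range header has the form 'Sales: (dd/mm/yyyy,dd/mm/yyyy)'.

```python
from datetime import datetime

# One of the range headers from the question's sample frame.
header = 'Sales: (06/07/2020,12/07/2020)'

# Take the text between '(' and the first ',' as the start date.
start_text = header.split('(')[1].split(',')[0]   # '06/07/2020'
start_date = datetime.strptime(start_text, '%d/%m/%Y')

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
week = start_date.isocalendar()[1]
prefix = f'week{week}_'
print(prefix)  # week28_
```

The full solution applies the same step to every column: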
from datetime import datetime
# get week numbers
wknums = [
'week' + str(
datetime.strptime(colname.split()[1][1:11], '%d/%m/%Y')
.isocalendar()[1]
) + '_'
if colname.startswith('Sales')
else None
for colname in df.columns
]
wknums = (
pd.Series(wknums).ffill().fillna('') # forward fill week numbers
+ df.loc[0].to_numpy() # add text from first row
).str.lower() # change to lower case, use it only if it helps
df.columns = wknums # replace df column labels
df = df.iloc[1:].reset_index(drop=True) # drop first row
Output
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 description 1 non-null object
1 barcode 1 non-null object
2 department 1 non-null object
3 section 1 non-null object
4 reference 1 non-null object
5 week28_quantity 1 non-null object
6 week28_amount 1 non-null object
7 week28_cost 1 non-null object
8 week28_% m/s 1 non-null object
9 week28_profit 1 non-null object
10 week27_quantity 1 non-null object
11 week27_amount 1 non-null object
12 week27_cost 1 non-null object
13 week27_% m/s 1 non-null object
14 week27_profit 1 non-null object
dtypes: object(15)
memory usage: 248.0+ bytes
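If the same cleanup has to run on several files, the steps above could be wrapped in a function. The following is only a sketch under assumptions: `rename_week_columns` and `DATE_RE` are hypothetical names, a regular expression replaces the string slicing used above, and the sample frame is a cut-down stand-in for the one in the question.

```python
import re
from datetime import datetime

import pandas as pd

# Matches the first dd/mm/yyyy date inside a header such as
# 'Sales: (06/07/2020,12/07/2020)'.
DATE_RE = re.compile(r'\((\d{2}/\d{2}/\d{4})')


def rename_week_columns(frame):
    """Return a new frame labelled by its first row, prefixed with the ISO week."""
    prefix = ''   # columns before the first date range get no prefix
    labels = []
    for colname, sub_header in zip(frame.columns, frame.iloc[0]):
        match = DATE_RE.search(str(colname))
        if match:  # a new date range starts here and applies until the next one
            start = datetime.strptime(match.group(1), '%d/%m/%Y')
            prefix = f'week{start.isocalendar()[1]}_'
        labels.append(prefix + str(sub_header).lower())
    out = frame.iloc[1:].reset_index(drop=True)  # drop the header row
    out.columns = labels
    return out


# Cut-down stand-in for the question's frame (fewer columns, same layout).
raw = pd.DataFrame([
    {'Related Fields': 'Description', 'Unnamed 1': 'barcode',
     'Sales: (06/07/2020,12/07/2020)': 'Quantity', 'Unnamed 3': 'profit',
     'Sales: (29/06/2020,05/07/2020)': 'Quantity', 'Unnamed 5': 'profit'},
    {'Related Fields': 'cornflakes', 'Unnamed 1': '0001198',
     'Sales: (06/07/2020,12/07/2020)': 60, 'Unnamed 3': 3000,
     'Sales: (29/06/2020,05/07/2020)': 120, 'Unnamed 5': 6000},
])
tidy = rename_week_columns(raw)
print(list(tidy.columns))
```

Carrying the prefix forward in a plain loop plays the same role as the `ffill()` call in the answer; which of the two reads better is a matter of taste.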