I'm trying to reformat a CSV so that each month column becomes a separate row for each record (basically pivoting it), i.e.:
into:
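To do this, I thought the best approach would be to duplicate each row once per month column (Jan-17, Feb-17, etc.), fill in the Date and Totals columns on each copy, and then remove the month columns (Jan-17, Feb-17, etc.). It works for the first data row (i.e. brand1), but after the first outer loop completes, it breaks with:
KeyError: 'the label [5] is not in the [index]'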
df['date'] = ''
df['totals'] = 0
months = ['Jan-17', 'Feb-17', 'Mar-17', 'Apr-17', 'May-17']
dropRowIndex = 0
nextDuplicateRowStartIndex = 0
totalRows = df.shape[0]
for i in range(0, totalRows):
    print('--------------')
    print(df)
    for col in df:
        if col in months:
            # Insert a row above 0th index with 0th row's values
            # Duplicate the row at this index for each month
            # Then move on to the next "row", which would be the latest index count
            df.loc[nextDuplicateRowStartIndex-1] = df.loc[nextDuplicateRowStartIndex].values
            df.loc[nextDuplicateRowStartIndex-1, 'date'] = col
            df.loc[nextDuplicateRowStartIndex-1, 'totals'] = df.loc[nextDuplicateRowStartIndex-1][col]
            df.index = df.index + 1
            df = df.sort_index()
            dropRowIndex += 1
    # Drop duplicated row by index
    df.drop(dropRowIndex, inplace=True)
    nextDuplicateRowStartIndex = dropRowIndex
# Remove months columns
for col in df:
    if col in months:
        df = df.drop(col, 1)
Terminal output:
-------------- INITIAL DATA FRAME:
    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17    date  totals
0  brand1     222     333     444     555     666               0
1  brand2    7777    8888    9999    1010    1111               0
2  brand3   12121   13131   14141   15151   16161               0
-------------- DATA FRAME AFTER FIRST OUTER LOOP (ROW) ITERATION:
    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17    date  totals
0  brand1     222     333     444     555     666  May-17     666
1  brand1     222     333     444     555     666  Apr-17     555
2  brand1     222     333     444     555     666  Mar-17     444
3  brand1     222     333     444     555     666  Feb-17     333
4  brand1     222     333     444     555     666  Jan-17     222
6  brand2    7777    8888    9999    1010    1111               0
7  brand3   12121   13131   14141   15151   16161               0
Traceback (most recent call last):
File "/Users/danielturcotte/Sites/project/env/lib/python3.6/site-packages/pandas/core/indexing.py", line 1506, in _has_valid_type
error()
File "/Users/danielturcotte/Sites/project/env/lib/python3.6/site-packages/pandas/core/indexing.py", line 1501, in error
axis=self.obj._get_axis_name(axis)))
KeyError: 'the label [5] is not in the [index]'
ERROR
KeyError: 'the label [5] is not in the [index]'
My guess is that this happens because I'm using .loc[index], where index is an integer, and maybe .loc doesn't work with integers whereas .iloc[] does. If I do
df.iloc[nextDuplicateRowStartIndex-1] = df.iloc[nextDuplicateRowStartIndex].values
I get the error:
ValueError: labels [10] not contained in axis
and the terminal output is full of NaNs:
    brand  Jan-17  Feb-17  Mar-17  Apr-17  May-17    date  totals
0     NaN     NaN     NaN     NaN     NaN     NaN  May-17     NaN
1     NaN     NaN     NaN     NaN     NaN     NaN  Apr-17     NaN
2     NaN     NaN     NaN     NaN     NaN     NaN  Mar-17     NaN
3     NaN     NaN     NaN     NaN     NaN     NaN  Feb-17     NaN
4     NaN     NaN     NaN     NaN     NaN     NaN  Jan-17     NaN
6  brand2  7777.0  8888.0  9999.0  1010.0  1111.0             0.0
7     NaN     NaN     NaN     NaN     NaN     NaN  Apr-17     NaN
That said, I don't believe this is the problem, because print(df.iloc[0]) and print(df.loc[0]) produce the same result (even though I'm accessing loc[0] with an integer).
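For what it's worth, the difference between the two accessors can be reproduced in isolation. The sketch below is only an illustration: it builds a hypothetical frame whose integer labels skip 5, which is the state shown in the output after the first outer loop iteration. .loc looks rows up by label, so label 5 is simply missing, while .iloc looks rows up by position, so position 5 still exists.

```python
import pandas as pd

# Hypothetical frame: integer labels 0-4, 6, 7 (label 5 was dropped),
# mimicking the frame printed after the first outer loop iteration.
df = pd.DataFrame(
    {"brand": ["brand1"] * 5 + ["brand2", "brand3"]},
    index=[0, 1, 2, 3, 4, 6, 7],
)

# .loc is label-based: no row is labelled 5, so the lookup raises KeyError.
try:
    df.loc[5]
    loc_failed = False
except KeyError:
    loc_failed = True

# .iloc is position-based: the sixth row exists and belongs to brand2.
sixth_by_position = df.iloc[5]["brand"]

# Renumbering the labels makes label and position agree again.
relabelled = df.reset_index(drop=True)
sixth_by_label = relabelled.loc[5]["brand"]

print(loc_failed)         # True
print(sixth_by_position)  # brand2
print(sixth_by_label)     # brand2
```

This would also explain why df.loc[0] and df.iloc[0] agree at the start: with the default 0, 1, 2 index, labels and positions happen to coincide.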
Running melt:
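Answer 0: (score: 2)
You can use melt. It lets you choose which columns are ID columns and which are value columns. In your case the value columns are everything except "brand", so we can leave that parameter (value_vars) out. That means you can do all of it in one line: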
For example, this prints the rearranged frame:
import pandas as pd
df = pd.DataFrame({
    'brand': ['brand1', 'brand2', 'brand3'],
    'Jan-17': [22, 232, 324],
    'Feb-17': [333, 424, 999]
    # ...
})
rearranged = pd.melt(df, id_vars=['brand'], var_name='Date',
                     value_name='Total')
print(rearranged)
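As a rough sketch of how this might look on the full data from the question, the same melt call is applied below to all five month columns, using the question's own column names (date, totals). The extra sorting step is an assumption on my part, not something the question asks for: it parses labels like "Jan-17" with the format "%b-%y" (which assumes English month abbreviations) so that each brand's rows come out in calendar order.

```python
import pandas as pd

# Values copied from the initial data frame printed in the question.
df = pd.DataFrame({
    "brand": ["brand1", "brand2", "brand3"],
    "Jan-17": [222, 7777, 12121],
    "Feb-17": [333, 8888, 13131],
    "Mar-17": [444, 9999, 14141],
    "Apr-17": [555, 1010, 15151],
    "May-17": [666, 1111, 16161],
})

# One row per (brand, month) pair; every non-id column is treated as a value.
long_df = pd.melt(df, id_vars=["brand"], var_name="date", value_name="totals")

# Optional: order rows by brand, then chronologically by month.
# "month" is a temporary helper column and is dropped again afterwards.
long_df["month"] = pd.to_datetime(long_df["date"], format="%b-%y")
long_df = (
    long_df.sort_values(["brand", "month"])
           .drop(columns="month")
           .reset_index(drop=True)
)

print(long_df)
```

Without the sorting step, melt returns the rows grouped by month (all Jan-17 rows first, then Feb-17, and so on), which may already be good enough.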
Answer 1: (score: 1)
Using asongtoruin's data and stack:
df.set_index('brand').stack().reset_index(name='Total').rename(columns={'level_1':'Date'})
Out[1043]:
    brand    Date  Total
0  brand1  Feb-17    333
1  brand1  Jan-17     22
2  brand2  Feb-17    424
3  brand2  Jan-17    232
4  brand3  Feb-17    999
5  brand3  Jan-17    324
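To check that the two answers agree, here is a small sketch comparing stack and melt on the same sample data. One variation is mine rather than the answer's: instead of renaming level_1 afterwards, the column axis is named with rename_axis before stacking, which should give the same Date column. The row order differs between the two approaches (stack groups by brand, melt groups by month), so the comparison ignores order.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["brand1", "brand2", "brand3"],
    "Jan-17": [22, 232, 324],
    "Feb-17": [333, 424, 999],
})

# stack: months move from the column axis into an inner index level.
stacked = (
    df.set_index("brand")
      .rename_axis(columns="Date")  # name the column axis so the level is "Date"
      .stack()
      .reset_index(name="Total")
)

# melt: the same reshaping expressed with id/value columns.
melted = pd.melt(df, id_vars=["brand"], var_name="Date", value_name="Total")

# Compare the rows as plain tuples, ignoring row order.
same = (
    sorted(stacked.itertuples(index=False, name=None))
    == sorted(melted.itertuples(index=False, name=None))
)
print(same)
```

Which one to use is mostly a matter of taste; melt keeps everything in columns, while stack is handy if brand is already the index.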