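I keep using pandas.DataFrame.append, but the appends inside my for loop keep getting overwritten and the rows end up duplicated. The output of the for loop should look exactly like the original DataFrame (I've left out my functions to remove any complexity). Any help would be appreciated.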
import pandas as pd
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01',
             '2019-02-01', '2019-02-01', '2019-02-01',
             '2019-03-01', '2019-03-01', '2019-03-01'],
    'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B', 'Asset B',
              'Asset C', 'Asset C', 'Asset C'],
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100]
})
print(df.sort_values(by=['Asset']))
date Asset Monthly Value
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
This for loop appends to newdf several times and duplicates the rows:
assetlist = list(df['Asset'].unique())
for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)
print(newdf)
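A minimal sketch of how this kind of duplication can arise, assuming (as the answer below suggests) that newdf survived from an earlier run, for example in a notebook session. The helper run_loop is hypothetical: it wraps the loop above, takes the pre-existing newdf as an argument, and swaps DataFrame.append for pd.concat so it also runs on pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01'] * 3 + ['2019-02-01'] * 3 + ['2019-03-01'] * 3,
    'Asset': ['Asset A'] * 3 + ['Asset B'] * 3 + ['Asset C'] * 3,
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100],
})

def run_loop(df, newdf):
    # Hypothetical wrapper around the question's loop; newdf is whatever the
    # caller already had, mimicking a variable left over from a prior run.
    for asset in df['Asset'].unique():
        df_subset = df[df['Asset'] == asset]
        # pd.concat silently drops None, so newdf=None behaves like "nothing yet".
        newdf = pd.concat([newdf, df_subset.copy()])
    return newdf

first = run_loop(df, None)      # fresh start: 9 rows, same as df
second = run_loop(df, first)    # re-run without resetting newdf
print(len(first), len(second))  # 9 18
print(second.index.duplicated().any())  # index labels 0-8 now appear twice
```

Each re-run without resetting newdf adds another full copy of df, which matches the repeated blocks in the output below.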
This output is wrong; it should look exactly like the original DataFrame.
date Asset Monthly Value
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
Answer 0 (score: 2)
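I think you're missing a line. newdf is never initialized before the loop, so each run appends onto whatever newdf already held from an earlier run (in a fresh session the loop would fail with a NameError instead):
assetlist = list(df['Asset'].unique())
newdf = pd.DataFrame()  # <-- initialize it as an empty DataFrame before the loop
for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)  # DataFrame.append was removed in pandas 2.0; use pd.concat there
print(newdf)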
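Because DataFrame.append no longer exists in pandas 2.x, here is a sketch of the same fix for newer versions (the list name pieces is illustrative): collect the subsets in a plain Python list that is recreated on every run, then call pd.concat once at the end. Concatenating once also avoids copying the ever-growing frame on every iteration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01'] * 3 + ['2019-02-01'] * 3 + ['2019-03-01'] * 3,
    'Asset': ['Asset A'] * 3 + ['Asset B'] * 3 + ['Asset C'] * 3,
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100],
})

assetlist = list(df['Asset'].unique())
pieces = []  # a fresh list on each run, so nothing carries over between runs
for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    pieces.append(df_subset.copy())  # list.append, not DataFrame.append
newdf = pd.concat(pieces)  # a single concatenation at the end

print(newdf)
```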
date Asset Monthly Value
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
However, a simpler approach is:
newdf = pd.concat([df.query("Asset == @asset") for asset in assetlist])
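If you would rather avoid query() string expressions, the same one-liner can be sketched with a plain boolean mask. As an alternative the answer does not mention, groupby(..., sort=False) yields the groups in the order they first appear, which matches the order of unique().

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01'] * 3 + ['2019-02-01'] * 3 + ['2019-03-01'] * 3,
    'Asset': ['Asset A'] * 3 + ['Asset B'] * 3 + ['Asset C'] * 3,
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100],
})
assetlist = list(df['Asset'].unique())

# Boolean-mask form of the answer's one-liner
via_mask = pd.concat([df[df['Asset'] == asset] for asset in assetlist])

# groupby alternative: sort=False keeps groups in first-appearance order
via_groupby = pd.concat([group for _, group in df.groupby('Asset', sort=False)])

print(via_mask)
print(via_groupby)
```

Since df here is already ordered by Asset, both reproduce it exactly; on unsorted data they regroup the rows by asset while keeping each row's original index label.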