I have a simple dataframe, df, with a column of lists named lists. I would like to generate an additional column based on lists.
df looks like this:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df
          lists
1           [1]
2     [1, 2, 3]
3  [2, 9, 7, 9]
4  [2, 7, 3, 5]
I would like df to look like this:
df
Out[9]:
          lists                 rolllists
1           [1]                       [1]
2     [1, 2, 3]              [1, 1, 2, 3]
3  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
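Basically, I want to "sum" / append the lists over a rolling window of 2 rows. Note that in row 1 there is only one list, so rolllists is just that list. In row 2 there are two lists to append. For row 3, append df[2].lists and df[3].lists (that is, the lists from rows 2 and 3), and so on. I have worked on something similar before; for reference, see: Pandas Dataframe, Column of lists, Create column of sets of cumulative lists, and record by record differences.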
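In addition, once the part above works, I want to do the same thing within a groupby. The example above would then be a single group, and with the groupby df might look like this: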
  Group         lists                 rolllists
1     A           [1]                       [1]
2     A     [1, 2, 3]              [1, 1, 2, 3]
3     A  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4     A  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
5     B           [1]                       [1]
6     B     [1, 2, 3]              [1, 1, 2, 3]
7     B  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
8     B  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
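I have tried various things such as df.lists.rolling(2).sum(), but I get this error:
TypeError: cannot handle this type -> object
That is in pandas 0.24.1. Unfortunately, in pandas 0.22.0 the command does not raise an error but instead returns exactly the same values as in lists. So it looks like newer versions of pandas cannot sum lists? That is a secondary issue.
I would love any help! Have fun!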
Answer 0 (score: 3)
You can start with
import pandas as pd

mylists = {1: [[1]], 2: [[1, 2, 3]], 3: [[2, 9, 7, 9]], 4: [[2, 7, 3, 5]]}
mydf = pd.DataFrame.from_dict(mylists, orient='index')
mydf = mydf.rename(columns={0: 'lists'})
# duplicate the rows to get two groups of four rows each
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['group'] = ['A'] * 4 + ['B'] * 4

# initialize your new series
mydf['newseries'] = mydf['lists']

# define the function that appends lists over rows
def append_row_lists(data):
    data = data.copy()  # work on a copy of the group's slice
    for i in data.index:
        # the last row of a group has no successor, so skip it
        if i + 1 in data.index:
            # .at sets a single cell, so a list can be stored as the cell value
            data.at[i + 1, 'newseries'] = data.at[i, 'lists'] + data.at[i + 1, 'lists']
    return data

# loop over your groups
for gp in mydf.group.unique():
    condition = mydf.group == gp
    mydf[condition] = append_row_lists(mydf[condition])
Output
          lists group                 newseries
0           [1]     A                       [1]
1     [1, 2, 3]     A              [1, 1, 2, 3]
2  [2, 9, 7, 9]     A     [1, 2, 3, 2, 9, 7, 9]
3  [2, 7, 3, 5]     A  [2, 9, 7, 9, 2, 7, 3, 5]
4           [1]     B                       [1]
5     [1, 2, 3]     B              [1, 1, 2, 3]
6  [2, 9, 7, 9]     B     [1, 2, 3, 2, 9, 7, 9]
7  [2, 7, 3, 5]     B  [2, 9, 7, 9, 2, 7, 3, 5]
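As a possible alternative to masking the frame once per group, here is a minimal sketch that iterates over the groups with groupby and collects the results in a plain dict keyed by row label. The name rolled is only illustrative and is not part of the code above; the frame is rebuilt here so that the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame (same data as in the answer above).
mylists = {1: [[1]], 2: [[1, 2, 3]], 3: [[2, 9, 7, 9]], 4: [[2, 7, 3, 5]]}
mydf = pd.DataFrame.from_dict(mylists, orient='index').rename(columns={0: 'lists'})
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['group'] = ['A'] * 4 + ['B'] * 4

# rolled is a hypothetical helper dict: row label -> concatenated list.
rolled = {}
for _, sub in mydf.groupby('group'):
    prev = None  # reset at the start of every group
    for idx, cur in sub['lists'].items():
        # First row of a group: copy its own list; otherwise previous + current.
        rolled[idx] = list(cur) if prev is None else prev + cur
        prev = cur

# Assigning a Series aligns on the index, so row order does not matter.
mydf['newseries'] = pd.Series(rolled)
print(mydf)
```

Because the previous list is tracked by position inside each group, this sketch does not assume that the index labels within a group are consecutive integers.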
Answer 1 (score: 1)
How about this?
rolllists = [df.lists[1].copy()]
for index, values in df.iterrows():
    if index > 1:  # or > 0 if zero-indexed
        rolllists.append(df.loc[index - 1, 'lists'] + values['lists'])
df['rolllists'] = rolllists
Or as a slightly more extensible function:
import pandas as pd

lists = {1: [[1]], 2: [[1, 2, 3]], 3: [[2, 9, 7, 9]], 4: [[2, 7, 3, 5]]}
df = pd.DataFrame.from_dict(lists, orient='index')
df = df.rename(columns={0: 'lists'})

def rolling_lists(df, roll_period=2):
    # the first roll_period - 1 rows have no full window, so they get the first list
    new_roll, rolllists = [], [df.lists[1].copy()] * (roll_period - 1)
    for index, values in df.iterrows():
        if index > roll_period - 1:  # or -2 if zero-indexed
            res = []
            for i in range(index - roll_period, index):
                res.append(df.loc[i + 1, 'lists'])  # or i if 0-indexed
            rolllists.append(res)
    for li in rolllists:
        while isinstance(li[0], list):
            li = [item for sublist in li for item in sublist]  # flatten nested list
        new_roll.append(li)
    df['rolllists'] = new_roll
    return df
This is also easy to extend to groupby: just wrap it in a function and use df.apply(rolling_lists). You can supply any number of rolling rows as roll_period. Hope this helps!
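To illustrate the groupby extension mentioned above, here is a rough sketch of a position-based variant. The helper roll_concat and the dict rolled are hypothetical names, not part of the answer's code. It assumes that the first rows of each group, which have fewer than roll_period predecessors, should simply use whatever lists are available; for roll_period greater than 2 this differs from rolling_lists, which fills those rows with the first list.

```python
import pandas as pd

def roll_concat(lists, roll_period=2):
    """Concatenate each list with up to roll_period - 1 preceding lists."""
    items = list(lists)
    out = []
    for pos in range(len(items)):
        # Window of the current list plus its predecessors, clipped at the start.
        window = items[max(0, pos - roll_period + 1): pos + 1]
        out.append([x for sub in window for x in sub])
    return out

# Grouped example frame, labelled 1..8 as in the question.
df = pd.DataFrame(
    {
        'Group': list('AAAABBBB'),
        'lists': [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]] * 2,
    },
    index=range(1, 9),
)

# Apply the helper group by group and collect results by row label.
rolled = {}
for _, sub in df.groupby('Group'):
    rolled.update(zip(sub.index, roll_concat(sub['lists'], roll_period=2)))

df['rolllists'] = pd.Series(rolled)
print(df)
```

Working on positions within each group, rather than on index labels, means the helper does not depend on the index starting at 1 or being consecutive.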