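Consider a df with these two columns. I want to write an apply function that compares each item in the list in the 'other_yrs' column with the single integer in the 'cur' column, and keeps a count of the items in the 'other_yrs' list that are less than or equal to the value in 'cur'. I can't figure out how to get pandas to do this with apply. I use apply functions for other purposes and they work fine. Any ideas would be much appreciated.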
       cur  other_yrs
    1   11  [11, 11]
    2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
    4   16  [15, 85]
    5   17  [17, 17, 16]
    6   13  [8, 8]
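Below is the function I use to extract the values into the 'other_yrs' column. I was thinking I could somehow compare each successive value in the list with the 'cur' column value and keep a count. I really only need to store the count of how many list items are <= the value in the 'cur' column.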
    def col_check(col_string):
        cs_yr_lst = []
        count = 0
        if len(col_string) < 1:  # skips empty values (length 0), meaning no other cases
            pass
        else:
            case_lst = col_string.split(", ")  # splits the string of cases into a list
            for i in case_lst:
                cs_yr = int(i[3:5])  # gets the case year from each individual case number
                cs_yr_lst.append(cs_yr)  # collects the years; the list goes into a new column via apply
        return cs_yr_lst
The expected output is:

       cur  other_yrs                                   count
    1   11  [11, 11]                                        2
    2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
    4   16  [15, 85]                                        1
    5   17  [17, 17, 16]                                    3
    6   13  [8, 8]                                          2
Answer 0 (score: 3)
Use zip inside a list comprehension to pair up the cur and other_yrs columns, then call np.sum on the boolean mask:

    import numpy as np
    df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
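Another idea: build a DataFrame from the lists (shorter lists are padded with NaN, which never compares as less than or equal), compare it row-wise with cur, and sum across each row:

    df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(axis=1)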
Result:

       cur  other_yrs                                   count
    1   11  [11, 11]                                        2
    2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
    4   16  [15, 85]                                        1
    5   17  [17, 17, 16]                                    3
    6   13  [8, 8]                                          2
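Since the question asks specifically about apply, here is a minimal sketch of a row-wise version. The helper name count_le is made up for illustration, and the frame is rebuilt from the sample data in the question so the snippet runs on its own. It will generally be slower than the list comprehension on large frames, but it may be easier to read.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)


def count_le(row):
    """Hypothetical helper: count the items in row['other_yrs'] that are <= row['cur']."""
    return sum(year <= row["cur"] for year in row["other_yrs"])


# axis=1 passes each row to the function as a Series.
df["count"] = df.apply(count_le, axis=1)
print(df["count"].tolist())  # [2, 11, 1, 3, 2]
```

An empty list simply yields a count of 0, so no special case is needed for rows with no other cases.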
Answer 1 (score: 2)
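You could also explode the lists, do the comparison, then group on level=0 and sum:

    u = df.explode('other_yrs')
    # Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
    df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)
    print(df)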
       cur  other_yrs                                   Count
    1   11  [11, 11]                                        2
    2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
    4   16  [15, 85]                                        1
    5   17  [17, 17, 16]                                    3
    6   13  [8, 8]                                          2
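For anyone who wants to run the explode approach end to end, here is a self-contained sketch. Two things in it are assumptions rather than part of the question: the extra row with index 7 and an empty list is hypothetical, added only to show what happens when there are no other cases, and pd.to_numeric is used so the exploded column is numeric rather than object dtype before comparing.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row (index 7)
# with an empty list to show the "no other cases" situation.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13, 10],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
            [],
        ],
    },
    index=[1, 2, 4, 5, 6, 7],
)

# One row per list item; the original index label is repeated for each item.
# An empty list becomes a single row holding NaN.
u = df.explode("other_yrs")
years = pd.to_numeric(u["other_yrs"])

# cur >= year is False wherever year is NaN, so empty lists count as 0.
mask = u["cur"].ge(years)

# Summing the booleans per original index label gives the count for each row.
df["Count"] = mask.groupby(level=0).sum().astype(int)
print(df["Count"].tolist())  # [2, 11, 1, 3, 2, 0]
```

Grouping on the index level works here because explode keeps the original index labels, so the per-group sums line up with df when assigned back.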