I have been trying to code this for a while. Here is a sample DataFrame:
capacity = 500
s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])
df = pd.DataFrame(np.c_[s,p,d],columns = ['School Name','Population', 'Distance'])
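What I want is a loop that keeps subtracting each school's "Population" from the capacity for as long as the capacity is not exceeded. The schools must be checked in order of "Distance".
Example: "School 1" is the closest school, so 132 is subtracted from 500, leaving 368. "School 2" is the second closest, but its population is larger than what remains (458 > 368), so the loop stops there and does not go on to check the next closest school, "School 3".
After that, the school name should be written to another column.
The final result would be: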
s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])
sn = pd.Series(['School 1', 0, 0 ,0 ,0])
df2 = pd.DataFrame(np.c_[s,p,d,sn],columns = ['School Name','Population', 'Distance','Included'])
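I have been trying to do this since yesterday and still cannot see how to do it other than by hand. I am still a beginner in Python.
Thanks for your help!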
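Answer 0 (score: 2)
Based on your question, I assume you want only a single school name, the last one reached before the capacity is exceeded. That can be done like this: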
import pandas as pd
import numpy as np
capacity = 500
s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])
df = pd.DataFrame(np.c_[s,p,d],columns = ['School Name','Population', 'Distance'])
# converting population to integer values
p = p.astype('int')
# placeholder to store school name
school_name = None
for idx, val in enumerate(p):
    # keep assigning school name until capacity is exceeded
    capacity -= val
    if capacity < 0:
        break
    school_name = s[idx]
# add included column
df['included'] = np.where(df['School Name'] == school_name, df['School Name'], 0)
You can then print df to check that it works:
>>> df
  School Name Population Distance  included
0    School 1        132        1  School 1
1    School 2        458        2         0
2    School 3        333        3         0
3    School 4        300        4         0
4    School 5        258        5         0
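The loop above walks the rows in the order they appear, which only matches the "order of Distance" requirement because the sample data happens to be sorted already. Here is a minimal sketch of the same idea that sorts by Distance first. The helper name `last_school_within_capacity` and the shuffled sample frame are hypothetical, introduced only for illustration:

```python
import pandas as pd

def last_school_within_capacity(df, capacity):
    """Return the last school that still fits, walking the rows by Distance."""
    # Hypothetical helper: the columns hold strings, so cast them to integers
    # before sorting and comparing.
    ordered = df.assign(
        Population=df["Population"].astype(int),
        Distance=df["Distance"].astype(int),
    ).sort_values("Distance")

    remaining = capacity  # local copy, so the caller's capacity is not changed
    chosen = None
    for name, population in zip(ordered["School Name"], ordered["Population"]):
        if population > remaining:
            break  # stop at the first school that does not fit
        remaining -= population
        chosen = name
    return chosen

# Same kind of data as in the question, with the rows deliberately shuffled
df_shuffled = pd.DataFrame({
    "School Name": ["School 3", "School 1", "School 2"],
    "Population": ["333", "132", "458"],
    "Distance": ["3", "1", "2"],
})
result = last_school_within_capacity(df_shuffled, 500)
```

Because the function works on a local copy of the capacity, it can be called repeatedly without resetting anything.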
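However, if you want to keep every school up to the point where the capacity is exceeded, only a small change to the program above is needed. Replace the placeholder and the loop as shown below: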
capacity = 500  # reset: the earlier loop modifies capacity in place
school_names = []  # placeholder will be a list now
for idx, val in enumerate(p):
    capacity -= val
    if capacity < 0:
        break
    school_names.append(s[idx])  # keep adding schools that do not exceed capacity to the list
# Instead of equality, check if school name is in your list
df['included'] = np.where(df['School Name'].isin(school_names), df['School Name'], 0)
Now, with capacity = 500 and the second population changed so that p = pd.Series(['132', '128', '333', '300', '258']), both School 1 and School 2 will be included.
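As an alternative to the explicit loop, the same "keep every school that fits" result can be sketched with a running total. This is not part of the original answer; it assumes populations are never negative, so that once the running total passes the capacity it stays above it, which mirrors the `break` in the loop:

```python
import numpy as np
import pandas as pd

capacity = 500
df = pd.DataFrame({
    "School Name": ["School 1", "School 2", "School 3", "School 4", "School 5"],
    "Population": ["132", "128", "333", "300", "258"],
    "Distance": ["1", "2", "3", "4", "5"],
})

# Row labels in order of increasing distance (columns hold strings, so cast first)
order = df["Distance"].astype(int).sort_values().index

# Running total of population, accumulated nearest school first
running_total = df.loc[order, "Population"].astype(int).cumsum()

# A school is included while the running total still fits within the capacity;
# reindex puts the boolean mask back in the frame's original row order
fits = (running_total <= capacity).reindex(df.index)
df["included"] = np.where(fits, df["School Name"], 0)
```

With these populations the running total is 132, 260, 593, and so on, so only the first two rows keep their school name and the rest are set to 0.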