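I wrote a function that finds rows sharing the same values across a set of columns. But it is ugly: it relies on nested apply calls. Is there a better way to test whether n columns have the same values and, if so, apply a function to those rows or add them to a dictionary?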
import numpy as np
import pandas as pd

# df3, valid_polls and valid_elections are assumed to be defined earlier (not shown)
appendc = []  # collects one dict per valid poll result

# function to add valid polls to the list of results
def j(x, cand_name):
    if x['_pollname'] in valid_polls:
        if x['_poll'] in valid_elections:
            if np.isfinite(x[cand_name]):
                toappenddf = {}
                toappenddf['candidate'] = str(cand_name)
                toappenddf['date'] = x['_date']
                toappenddf['poll'] = x['_pollname']
                toappenddf['polled'] = x[cand_name]
                appendc.append(toappenddf)

invalid_cols = ['_poll', '_pollname', '_date']
# run the function on every candidate column
for col in df3.columns:
    if col not in invalid_cols:
        df3.apply(lambda row: j(row, col), axis=1)
# create dataframe from results
df0 = pd.DataFrame(appendc)
pollaggregator = []
# nested apply calls: compare every row of df0 against every other row
def k(x):
    def h(w, xx):
        # see if two rows share candidate and date but come from different polls
        if (w.candidate == xx.candidate) and (w.date == xx.date) and (xx.poll != w.poll):
            pollaggregator.append(w.tolist())
    df0.apply(lambda row: h(row, x), axis=1)
    if len(pollaggregator) > 0:
        # NOTE: this list is built but never stored anywhere
        appendpoll2 = x.tolist()
df0.apply(k, axis=1)
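For what it's worth, the first step (turning one column per candidate into one row per candidate and poll) might be expressible without apply at all. The following is only a sketch under assumptions: the sample `df3`, the contents of `valid_polls` and `valid_elections`, and the names `id_cols`, `mask` and `long_df` are all made up for illustration, since the real inputs are not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format input: metadata columns plus one column per candidate.
df3 = pd.DataFrame({
    "_poll": ["2016 primary", "2016 primary", "other"],
    "_pollname": ["CNN", "FOX News", "CNN"],
    "_date": pd.to_datetime(["2014-12-21", "2015-01-27", "2015-02-15"]),
    "Biden": [8.0, 17.0, 14.0],
    "Clinton": [60.0, np.nan, 55.0],
})
valid_polls = ["CNN", "FOX News"]    # assumed contents
valid_elections = ["2016 primary"]   # assumed contents

id_cols = ["_poll", "_pollname", "_date"]

# Keep only rows from valid polls and valid elections (vectorised, no apply).
mask = df3["_pollname"].isin(valid_polls) & df3["_poll"].isin(valid_elections)

# Wide -> long: one row per (poll row, candidate) pair.
long_df = df3[mask].melt(id_vars=id_cols, var_name="candidate", value_name="polled")

# Drop missing or infinite results, mirroring the np.isfinite check.
long_df = long_df[np.isfinite(long_df["polled"])]

# Rename and reorder to match the columns of df0.
df0 = (
    long_df.rename(columns={"_date": "date", "_pollname": "poll"})
    [["candidate", "date", "poll", "polled"]]
    .reset_index(drop=True)
)
print(df0)
```

The idea is that `isin` replaces the two membership tests and `melt` replaces the loop over candidate columns, so each row is visited once rather than once per candidate.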
The input is polling data in a DataFrame, as shown below:
,candidate,date,poll,polled
0,Biden,2014-12-21 00:00:00,CNN,8.0
1,Biden,2015-01-27 00:00:00,FOX News,17.0
2,Biden,2015-02-15 00:00:00,CNN,14.0
3,Biden,2015-03-02 00:00:00,Quinnipiac,10.0
4,Biden,2015-03-13 00:00:00,CNN/ORC,15.0
5,Biden,2015-03-31 00:00:00,FOX News,12.0
6,Biden,2015-04-24 00:00:00,FOX News,9.0
7,Biden,2015-04-20 00:00:00,CNN/Opinion Research,11.0
8,Biden,2015-04-23 00:00:00,Quinnipiac,10.0
9,Biden,2015-04-24 00:00:00,FOX News,9.0
For my purposes, I want to aggregate rows that have the same date value but different poll name values.
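To make that goal concrete, here is a rough sketch of one way it could be done with `groupby` instead of nested apply calls. It uses rows 6 to 9 from the sample above plus one hypothetical extra row (a CNN poll on 2015-04-23), because the sample itself contains no date shared by two different polls. The names `n_polls`, `matches` and `aggregated` are made up for illustration, and taking the mean is just one possible aggregation.

```python
import pandas as pd

# Rows 6-9 from the sample, plus one HYPOTHETICAL row (the last one)
# so that at least one date is covered by two different polls.
df0 = pd.DataFrame({
    "candidate": ["Biden"] * 5,
    "date": pd.to_datetime(
        ["2015-04-24", "2015-04-20", "2015-04-23", "2015-04-24", "2015-04-23"]
    ),
    "poll": ["FOX News", "CNN/Opinion Research", "Quinnipiac", "FOX News", "CNN"],
    "polled": [9.0, 11.0, 10.0, 9.0, 12.0],
})

# For every row, count the distinct poll names in its (candidate, date) group.
n_polls = df0.groupby(["candidate", "date"])["poll"].transform("nunique")

# Keep rows whose candidate and date are shared by more than one poll.
matches = df0[n_polls > 1]

# Aggregate the matching rows, here by averaging the polled value.
aggregated = matches.groupby(["candidate", "date"], as_index=False).agg(
    mean_polled=("polled", "mean"),
    n_polls=("poll", "nunique"),
)
print(aggregated)
```

Note that the two identical FOX News rows on 2015-04-24 are not treated as a match, since they share the poll name as well as the date.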