Best way to use Pandas to find rows with duplicate values in selected columns and aggregate them

Asked: 2015-07-01 21:06:09

Tags: python pandas

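I wrote a function that finds rows sharing the same values in a set of columns. But it is ugly: nested apply functions. Is there a better way to test whether n columns have the same values and, if so, apply a function to those rows or add them to a dictionary?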

import numpy as np
import pandas as pd

# df3, valid_polls and valid_elections are defined elsewhere (not shown)
appendc = []  # collects one dict per valid (row, candidate) pair

# function to add valid polls to the list
def j(x, cand_name):
    if x['_pollname'] in valid_polls:
        if x['_poll'] in valid_elections:
            if np.isfinite(x[cand_name]):
                toappenddf = {}
                toappenddf['candidate'] = str(cand_name)
                toappenddf['date'] = x['_date']
                toappenddf['poll'] = x['_pollname']
                toappenddf['polled'] = x[cand_name]
                appendc.append(toappenddf)

invalid_cols = ['_poll', '_pollname', '_date']

# run the function once per candidate column
for col in df3.columns:
    if col not in invalid_cols:
        df3.apply(lambda row: j(row, col), axis=1)

# create dataframe from results
df0 = pd.DataFrame(appendc)
pollaggregator = []

# nested apply: compare every row of df0 against every other row
def k(x):
    def h(w, xx):
        # match when two rows share candidate and date but come from different polls
        if (w.candidate == xx.candidate) and (w.date == xx.date) and (xx.poll != w.poll):
            pollaggregator.append(w.tolist())
    df0.apply(lambda row: h(row, x), axis=1)

    if len(pollaggregator) > 0:
        appendpoll2 = x.tolist()  # computed here but not stored anywhere yet

df0.apply(k, axis=1)
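
The first half of the code above (turning one column per candidate into one row per candidate and poll) could probably be done without `apply`, using `DataFrame.melt`. The following is only a sketch: the contents of `df3`, `valid_polls` and `valid_elections` are not shown in the question, so the values below are made up for illustration, and `long_df` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format input: metadata columns plus one column per candidate.
df3 = pd.DataFrame({
    "_poll": ["2016-dem", "2016-dem", "2016-gop"],
    "_pollname": ["CNN", "FOX News", "CNN"],
    "_date": pd.to_datetime(["2014-12-21", "2015-01-27", "2015-02-15"]),
    "Biden": [8.0, 17.0, np.nan],
    "Clinton": [60.0, np.nan, 55.0],
})
valid_polls = ["CNN", "FOX News"]   # assumed contents
valid_elections = ["2016-dem"]      # assumed contents

id_cols = ["_poll", "_pollname", "_date"]

# Keep only rows from valid polls and valid elections.
mask = df3["_pollname"].isin(valid_polls) & df3["_poll"].isin(valid_elections)

# Reshape wide -> long: every candidate column becomes (candidate, polled) rows.
melted = df3[mask].melt(id_vars=id_cols, var_name="candidate", value_name="polled")

# Mirror the np.isfinite check from the original function (drops NaN and inf).
long_df = (
    melted[np.isfinite(melted["polled"])]
    .rename(columns={"_date": "date", "_pollname": "poll"})
    [["candidate", "date", "poll", "polled"]]
    .reset_index(drop=True)
)
print(long_df)
```

With this shape, `long_df` plays the role of `df0` in the code above.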

The input is poll data in a DataFrame, which looks like this:

,candidate,date,poll,polled
0,Biden,2014-12-21 00:00:00,CNN,8.0
1,Biden,2015-01-27 00:00:00,FOX News,17.0
2,Biden,2015-02-15 00:00:00,CNN,14.0
3,Biden,2015-03-02 00:00:00,Quinnipiac,10.0
4,Biden,2015-03-13 00:00:00,CNN/ORC,15.0
5,Biden,2015-03-31 00:00:00,FOX News,12.0
6,Biden,2015-04-24 00:00:00,FOX News,9.0
7,Biden,2015-04-20 00:00:00,CNN/Opinion Research,11.0
8,Biden,2015-04-23 00:00:00,Quinnipiac,10.0
9,Biden,2015-04-24 00:00:00,FOX News,9.0

For my purposes, I want to aggregate rows that have the same date value but different poll name values.
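
One way this might be expressed without nested `apply` calls is a `groupby` on the shared columns, counting distinct poll names per group. This is a minimal sketch, not a tested answer: it uses a few rows from the sample above plus one hypothetical Quinnipiac row (the sample itself contains no date covered by two different polls), and it assumes that exact duplicate rows should be counted once and that the mean is the aggregate of interest.

```python
import pandas as pd

df0 = pd.DataFrame({
    "candidate": ["Biden"] * 5,
    "date": pd.to_datetime([
        "2015-03-31", "2015-04-24", "2015-04-23", "2015-04-24",
        "2015-04-24",  # hypothetical row, added for illustration only
    ]),
    "poll": ["FOX News", "FOX News", "Quinnipiac", "FOX News", "Quinnipiac"],
    "polled": [12.0, 9.0, 10.0, 9.0, 11.0],
})

keys = ["candidate", "date"]

# For every row, count the distinct poll names within its (candidate, date) group.
poll_counts = df0.groupby(keys)["poll"].transform("nunique")

# Rows whose (candidate, date) group is covered by more than one poll.
matched = df0[poll_counts > 1]

# Assumption: identical (candidate, date, poll) rows are duplicates, counted once.
aggregated = (
    matched.drop_duplicates(keys + ["poll"])
    .groupby(keys, as_index=False)
    .agg(n_polls=("poll", "nunique"), mean_polled=("polled", "mean"))
)
print(aggregated)
```

The named-aggregation form of `agg` used here needs pandas 0.25 or later; on older versions a dictionary passed to `agg` would be the usual substitute.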

0 answers:

No answers yet.