How to calculate column frequencies in a Python 3 DataFrame

Time: 2018-12-19 18:16:52

Tags: python python-3.x pandas

Hi all, I have a DataFrame with the following columns:

  • WhiteRating (int)
  • BlackRating (int)
  • NewGame
  • NinePtLead (str, whether the position is "missedMate", "lostBigLead", or "useless")
  • AverageRating
  • Rating_Group: the grouped rating; this is my x
  • length_of_checkmate (int, number of moves until checkmate): this is my y

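Within this DataFrame, each row is a single observation with all of the column attributes above. My task is to compute a variable P, then regress P against x, regress P against y, and finally regress P against both x and y, where:

P = (number of moves with value y that were lost) / (total number of moves with value y)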
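The definition of P is a conditional proportion: among the rows with a given y, the share that were lost. A minimal pandas sketch follows. It assumes, hypothetically, that a row counts as "lost" when NinePtLead is anything other than "useless"; the question does not say which column marks a loss, so substitute the real condition.

```python
import pandas as pd

# Toy rows shaped like the question's sample (other columns omitted)
df = pd.DataFrame({
    'NinePtLead': ['useless', 'useless', 'useless', 'useless', 'useless', 'bigLeadLost'],
    'length_of_checkmate': [0, 0, 0, 0, 0, 2],
})

# ASSUMPTION (hypothetical): a row is "lost" when NinePtLead is not 'useless'
df['lost'] = df['NinePtLead'] != 'useless'

# The mean of a boolean column is (number of True) / (number of rows),
# i.e. P = lost moves with value y / total moves with value y
p_by_y = df.groupby('length_of_checkmate')['lost'].mean()
print(p_by_y)
```

Taking the mean of a boolean avoids counting the numerator and denominator separately.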

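My problem is finding P for each of my groups. I'm not sure how to approach this in a Pythonic way. I could loop over the rows manually and tally all the counts, but even then I'm not sure how to structure it, and given the size of the DataFrame it could take a very long time.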
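Manual loops are unnecessary here because a single `groupby` computes the counts for every (group, y) pair in vectorised code. A sketch under the same hypothetical rule that a row is "lost" when NinePtLead is not "useless". The data below is made up so that two rating groups are visible, and the output column names `lost_moves`, `total_moves` and `P` are illustrative choices.

```python
import pandas as pd

# Hypothetical rows with two rating groups so the grouping is visible
df = pd.DataFrame({
    'Rating_Group': [1800, 1800, 1800, 1900, 1900, 1900],
    'length_of_checkmate': [2, 2, 3, 2, 2, 2],
    'NinePtLead': ['bigLeadLost', 'useless', 'missedMate', 'useless', 'useless', 'missedMate'],
})

# ASSUMPTION (hypothetical): "lost" means NinePtLead is not 'useless';
# stored as float so sum/mean/transform all stay numeric
df['lost'] = (df['NinePtLead'] != 'useless').astype(float)

keys = ['Rating_Group', 'length_of_checkmate']

# One pass over the data: numerator, denominator and P for every (x, y) pair
summary = (df.groupby(keys)['lost']
             .agg(lost_moves='sum', total_moves='size', P='mean')
             .reset_index())

# transform('mean') broadcasts each pair's P back onto its original rows
df['P'] = df.groupby(keys)['lost'].transform('mean')
print(summary)
```

The `summary` table (one row per pair) is the natural input for the regressions. The per-row `P` column is only useful if the model has to be fitted on individual observations.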

WhiteR,BlackR,EMV,MovePlayedValue,NewGame,NinePtLead,AverageRating,Rating_Group,length_of_checkmate
1880.0,1865.0,27.0,27.0,1,useless,1875,1800,0
1880.0,1865.0,22.0,21.0,1,useless,1875,1800,0
1865.0,1880.0,25.0,25.0,1,useless,1875,1800,0
1880.0,1865.0,24.0,19.0,1,useless,1875,1800,0
1865.0,1880.0,22.0,22.0,1,useless,1875,1800,0
1880.0,1865.0,27.0,27.0,1,bigLeadLost,1875,1800,2
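To experiment with the approaches in this thread, you can load the sample rows above directly into a DataFrame. A small sketch using `pandas.read_csv` on an in-memory string:

```python
import io
import pandas as pd

# The sample rows from the question, pasted verbatim
sample = """WhiteR,BlackR,EMV,MovePlayedValue,NewGame,NinePtLead,AverageRating,Rating_Group,length_of_checkmate
1880.0,1865.0,27.0,27.0,1,useless,1875,1800,0
1880.0,1865.0,22.0,21.0,1,useless,1875,1800,0
1865.0,1880.0,25.0,25.0,1,useless,1875,1800,0
1880.0,1865.0,24.0,19.0,1,useless,1875,1800,0
1865.0,1880.0,22.0,22.0,1,useless,1875,1800,0
1880.0,1865.0,27.0,27.0,1,bigLeadLost,1875,1800,2
"""

# read_csv infers float, int and string (object) column types from the text
df = pd.read_csv(io.StringIO(sample))
print(df.dtypes)
```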

1 answer:

Answer 0 (score: 0)

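If I understand your question correctly, you want the frequency of each y type that led to a loss (the non-zero types), divided by the total number of moves of that y type: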

import pandas as pd
import numpy as np

# Rebuild the sample data from the question
df = pd.DataFrame({
    'WhiteR': [1880.0, 1880.0, 1865.0, 1880.0, 1865.0, 1880.0],
    'BlackR': [1865.0, 1865.0, 1880.0, 1865.0, 1880.0, 1865.0],
    'EMV': [27.0, 22.0, 25.0, 24.0, 22.0, 27.0],
    'MovePlayedValue': [27.0, 21.0, 25.0, 19.0, 22.0, 27.0],
    'NewGame': [1, 1, 1, 1, 1, 1],
    'NinePtLead': ['useless', 'useless', 'useless', 'useless', 'useless', 'bigLeadLost'],
    'AverageRating': [1875, 1875, 1875, 1875, 1875, 1875],
    'Rating_Group': [1800, 1800, 1800, 1800, 1800, 1800],
    'length_of_checkmate': [0, 0, 0, 0, 0, 2],
})

# Count how often each length_of_checkmate value occurs.
# rename_axis + reset_index(name=...) yields the columns
# ['length_of_checkmate', 'Freq.'] on both older and newer (2.x) pandas.
status = (df['length_of_checkmate'].value_counts()
          .rename_axis('length_of_checkmate')
          .reset_index(name='Freq.'))

# Attach each value's frequency back onto the original rows
df1 = pd.merge(df, status, on='length_of_checkmate')
# Rows with length_of_checkmate == 0 divide by zero (inf), so set them to 0
df1['P'] = (df1['Freq.'] / df1['length_of_checkmate']).replace(np.inf, 0)

# then proceed to 'Regress p against x, regress p against y and finally p against (x and y)'
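For the final step, here is a minimal sketch of the three regressions using plain ordinary least squares via `numpy.linalg.lstsq`. It assumes x is Rating_Group and y is length_of_checkmate, as in the question, and fits them to per-group P values. The helper name `ols` and the P numbers below are hypothetical. A statistics package such as statsmodels would also report standard errors and p-values, which this sketch does not.

```python
import numpy as np
import pandas as pd

def ols(data, target, features):
    """Hypothetical helper: OLS with an intercept, returns {name: coefficient}."""
    # Design matrix: a column of ones for the intercept, then each feature
    X = np.column_stack([np.ones(len(data))] +
                        [data[f].to_numpy(dtype=float) for f in features])
    y = data[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(['intercept'] + list(features), coef))

# Hypothetical per-group P values, e.g. from a groupby summary
summary = pd.DataFrame({
    'Rating_Group': [1800, 1800, 1900, 1900],
    'length_of_checkmate': [2, 3, 2, 3],
    'P': [0.50, 0.70, 0.30, 0.50],
})

fit_x = ols(summary, 'P', ['Rating_Group'])                           # P against x
fit_y = ols(summary, 'P', ['length_of_checkmate'])                    # P against y
fit_xy = ols(summary, 'P', ['Rating_Group', 'length_of_checkmate'])  # P against x and y
print(fit_x, fit_y, fit_xy, sep='\n')
```

Fitting on the grouped table weights every (x, y) cell equally. Fitting on the per-row data instead weights cells by how many moves they contain.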