Sum of df columns for a weighted average

Time: 2019-08-01 15:32:17

Tags: python pandas

Backstory: I have a pandas DataFrame, scaledData, which is just a standard df of information, as shown here:

                  COL NAME0 COL NAME1  ...    COL NAME3    COL NAME4
0                Alabama     4.099099  ...    2.042345      1.392755
1                 Alaska     1.396396  ...    1.000000      1.000000
2                Arizona     4.189189  ...    2.003257      1.537777
3               Arkansas     2.927928  ...    2.208723      1.007370
4             California     3.378378  ...    1.754930      2.012395
5               Colorado     3.378378  ...    3.282196      2.843435
6            Connecticut     5.000000  ...    1.452587      4.277286
7               Delaware     4.409692  ...    2.134501      1.970434
8   District of Columbia     5.000000  ...    1.000000      1.000000
9                Florida     4.628118  ...    1.806412      2.213038
10               Georgia     4.628118  ...    1.513896      2.748559
11                Hawaii     3.902494  ...    2.891694      3.872309
12                 Idaho     1.090703  ...    2.978469      4.127419
13              Illinois     4.537415  ...    1.242970      1.888353
14               Indiana     4.537415  ...    2.368881      2.307914
15                  Iowa     2.088435  ...    3.298368      3.421122
16                Kansas     2.723356  ...    2.791375      2.160330
17              Kentucky     3.902494  ...    1.692890      4.133744
18             Louisiana     2.451247  ...    1.000000      1.000000
19                 Maine     3.448980  ...    2.535328      5.000000
20              Maryland     5.000000  ...    1.632194      1.046567

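I want to create another column in this df, Total, which is the result of adding up all of the column values for each state (COL NAME0) and dividing by the sum of the weights dictionary. In addition, a column E should perform the same totaling operation, but only over the columns that carry that specific marker. The keys of the weights dictionary are the df's column names, and each value is a tuple containing the column's weight (used earlier, but not relevant to this question) and the category the column belongs to. This is my current implementation: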

weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'), 'COL NAME3': (10, 'G'), 'COL NAME4' : (5, 'E')}

eWeights = { key: value for key, value in weights.items() if value[1] == 'E'}
gWeights = { key: value for key, value in weights.items() if value[1] == 'G'}

#Total should be the result of adding each of the columns per COL NAME0 row 
#and dividing by the sum of the weight values. 

scaledData['Total'] = scaledData.sum(axis = 1, skipna = True)/ sum(list(weights.values())[0])

#Same calculation on only columns marked 'E'

for key in eWeights:
    scaledData['E'] = scaledData['E'] + scaledData[key]
    scaledData['E'] = scaledData['E'] / sum(list(eWeights.values())[0])

Unfortunately, the code above results in the following error (raised by the line that creates the Total column in scaledData):

TypeError: unsupported operand type(s) for +: 'float' and 'str'

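I have simplified scaledData and weights here, but any solution or suggestion should carry over to my actual df, which has many more rows and columns. Thanks for your help, and let me know if more information is needed.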
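For reference, the exact error message can be reproduced without pandas. In the line that creates Total, `list(weights.values())[0]` is the first tuple, `(2.14, 'E')`, so the built-in `sum` ends up adding a float to a string. The sketch below uses the weights dictionary from the question; the names `first_value`, `error_message` and `weight_total` are only illustrative.

```python
weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'),
           'COL NAME3': (10, 'G'), 'COL NAME4': (5, 'E')}

# This is the first tuple in the dict, not a list of the numeric weights.
first_value = list(weights.values())[0]

try:
    # sum() starts at 0: 0 + 2.14 works, then 2.14 + 'E' fails.
    sum(first_value)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Summing only the numeric part of every tuple avoids the error.
weight_total = sum(weight for weight, _category in weights.values())
```

If the intent is "the sum of all weight values", the generator expression on the last line is one way to express it.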

1 Answer:

Answer 0: (score: 0)

It looks like your df is not entirely stored as float. Try:

for key in eWeights:
    scaledData['E'] = scaledData['E'].astype(float) + scaledData[key].astype(float)

    scaledData['E'] / sum(list(eWeights.values())[0])
    # should this be a print? Are you trying to set any values?
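Putting the pieces together, one possible way to build both columns is sketched below. It is not a definitive implementation. It reads "Total" as the row-wise sum of the weighted columns divided by the sum of their weights, as described in the question. The two-row frame is a toy stand-in for scaledData, its values (including those for COL NAME2, which the question elides) are invented, and `weighted_total` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for scaledData; all values here are invented for illustration.
scaledData = pd.DataFrame({
    'COL NAME0': ['Alabama', 'Alaska'],
    'COL NAME1': [4.0, 1.0],
    'COL NAME2': [2.0, 3.0],
    'COL NAME3': [2.0, 1.0],
    'COL NAME4': [1.0, 1.0],
})

weights = {'COL NAME1': (2.14, 'E'), 'COL NAME2': (5.14, 'E'),
           'COL NAME3': (10, 'G'), 'COL NAME4': (5, 'E')}
eWeights = {key: value for key, value in weights.items() if value[1] == 'E'}


def weighted_total(df, selected):
    """Row-wise sum of the selected columns divided by the sum of their weights."""
    cols = list(selected)                      # column names are the dict keys
    weight_sum = sum(weight for weight, _category in selected.values())
    # Coerce to numeric in case some columns were read in as strings.
    values = df[cols].apply(pd.to_numeric, errors='coerce')
    return values.sum(axis=1, skipna=True) / weight_sum


# Only the columns named in the dict are summed, so COL NAME0 is never touched.
scaledData['Total'] = weighted_total(scaledData, weights)
scaledData['E'] = weighted_total(scaledData, eWeights)
```

Selecting columns by the dictionary keys keeps the string column COL NAME0 out of the sum, and it means the E column does not need to exist before it is assigned.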