Concatenating DataFrames into a pandas DataFrame with preset row names in Python

Time: 2016-07-15 10:37:13

Tags: python pandas dataframe row concat

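I am trying to refactor some code that used to be very manual. It involved setting the index on every new DataFrame I created, essentially to build this desired output: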

    f1          precision   recall
A   0.600315956 0.72243346  0.513513514
B   0.096692112 0.826086957 0.051351351
C   0.085642317 0.62962963  0.045945946
D   0.108641975 0.628571429 0.059459459

Here is my current code:

summaryDF = pd.DataFrame().set_index(['A','B','C','D'])

def evaluation(trueLabels, evalLabels):

    precision = precision_score(trueLabels, evalLabels)
    recall = precision_score(trueLabels, evalLabels)
    f1 = precision_score(trueLabels, evalLabels)
    accuracy = accuracy_score(trueLabels, evalLabels)

    data = {'precision': precision,
               'recall': recall,
               'f1': f1}

    DF = pd.DataFrame(data)

    summaryDF.concat(DF,ignore_index=True)


results = [y_randpred,y_cat_random_to_binary,y_cat_random_to_binary_threshold,y_closed_random_to_binary]

for result in results:
    evaluation(y_true_claim, result)

Here is my error traceback:

Traceback (most recent call last):
  File "/Users/dhruv/Documents/bla/bla/src/main/bla.py", line 419, in <module>
    summaryDF = pd.DataFrame().set_index(['A','B','C','D'])
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'A'

Any idea what I am doing wrong?

1 Answer:

Answer 0: (score: 0)

I solved my problem.

Using this answer, my code became:

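import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score

# Start with an empty DataFrame that only defines the metric columns.
summaryDF = pd.DataFrame(columns=('precision','recall','f1'))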

def evaluation(trueLabels, evalLabels):

    global summaryDF

    precision = precision_score(trueLabels, evalLabels)
    recall = recall_score(trueLabels, evalLabels)
    f1 = f1_score(trueLabels, evalLabels)

    data = {'precision': [precision],
               'recall': [recall],
               'f1': [f1]
            }

    DF = pd.DataFrame(data)

    summaryDF = pd.concat([summaryDF,DF])

results = [y_randpred,
           y_cat_random_to_binary,
           y_cat_random_to_binary_threshold,
           y_closed_random_to_binary,
           y_closedCat_random_to_binary_threshold]

for result in results:
    evaluation(y_true_claim, result)

# Assign the row labels once all rows have been appended.
summaryDF.index = ['A', 'B', 'C', 'D', 'E']

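There were two key points. First, I needed to wrap the precision, recall and F1 values in square brackets, so that each one is a one-element list: pd.DataFrame cannot build a frame from a dict of bare scalars unless it is also given an index. Second, I set the row labels by assigning to summaryDF.index instead of calling set_index, because set_index expects the names of existing columns, which is why it raised KeyError: 'A'.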
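
The two points above can be reproduced in isolation. The following is a minimal sketch, assuming a reasonably recent pandas; the metric values are illustrative numbers rounded from row A of the desired output, and the variable names are made up for this example.

```python
import pandas as pd

# A dict of bare scalars gives pandas no row count, so this raises ValueError.
try:
    pd.DataFrame({'precision': 0.72, 'recall': 0.51, 'f1': 0.60})
    scalar_error = None
except ValueError as exc:
    scalar_error = exc

# Wrapping each value in a list yields a one-row DataFrame.
row = pd.DataFrame({'precision': [0.72], 'recall': [0.51], 'f1': [0.60]})

# set_index looks for a column named 'A'; there is none, so it raises KeyError.
try:
    row.set_index(['A'])
    set_index_error = None
except KeyError as exc:
    set_index_error = exc

# Assigning to .index replaces the row labels directly.
row.index = ['A']
print(row)
```

Both `scalar_error` and `set_index_error` end up holding an exception, while the final assignment relabels the single row as `A`.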

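So I simply append the rows first and set the index afterwards, rather than setting it at the start on the empty DataFrame I append to, since a newly created DataFrame always starts out with some index of its own.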
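
As a possible alternative to appending inside the function and relabelling at the end, the rows can be collected in a plain list and turned into a DataFrame once, with the labels passed as the index. This is only a sketch and not the approach used in the answer above; `build_summary` is a hypothetical helper, and the scores are illustrative numbers rounded from the desired output rather than computed results.

```python
import pandas as pd

def build_summary(labels, rows):
    """Build the summary table in one step from per-run metric dicts.

    labels: row names, one per run (e.g. 'A', 'B', ...)
    rows:   list of dicts with 'f1', 'precision' and 'recall' keys
    """
    return pd.DataFrame(rows, index=labels, columns=['f1', 'precision', 'recall'])

# Illustrative scores only; in the real code each dict would come from
# f1_score, precision_score and recall_score for one set of predictions.
rows = [
    {'f1': 0.60, 'precision': 0.72, 'recall': 0.51},
    {'f1': 0.10, 'precision': 0.83, 'recall': 0.05},
]
summary = build_summary(['A', 'B'], rows)
print(summary)
```

Building the frame once avoids the global variable and the repeated `pd.concat` calls, and the row labels stay next to the data they describe.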