Question

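I am creating 4 different DataFrames for 4 different prediction models. The first three columns are the same in all of them, but the last column differs. I want to merge the 4 so that the first three columns are kept once and the last column of each DataFrame is added as a new column.

Each DataFrame looks like this: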

index   features    ground_true predictionSVM
0      (0, 70)            1                  0
1      (0, 149)           1                  0
2      (0, 265)           1                  1
3      (0, 3001)          1                  0


index   features    ground_true predictionNB
0      (0, 70)            1                  0
1      (0, 149)           1                  1
2      (0, 265)           1                  1
3      (0, 3001)          1                  0


index   features    ground_true predictionLR
0      (0, 70)            1                  1
1      (0, 149)           1                  0
2      (0, 265)           1                  1
3      (0, 3001)          1                  0

index   features    ground_true predictionRF
0      (0, 70)            1                  0
1      (0, 149)           1                  1
2      (0, 265)           1                  0
3      (0, 3001)          1                  0

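I want to merge only the last column of each DataFrame into a new DataFrame: the three shared columns stay as they are, and each prediction column is added alongside them.

Expected output:

index features  ground_true predictionSVM predictionNB predictionLR predictionRF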
0      (0, 70)            1                  0              1            0
1      (0, 149)           1                  0              0            1
2      (0, 265)           1                  1              1            0
3      (0, 3001)          1                  0              0            0

I tried:


common = ['index', 'features', 'ground_true']
dfs = (df1, df2, df3, df4)

df0 = pd.concat([df.set_index(common) for df in dfs], axis=1).reset_index()
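For reference, here is a minimal sketch of what this attempt does on toy data. It assumes the `features` values are plain strings, which are hashable; that is an assumption made for illustration and is not the case in the real data. The toy values are copied from the tables above.

```python
import pandas as pd

common = ["index", "features", "ground_true"]

# Assumption: features are stored as plain strings here, so they are hashable.
base = {
    "index": [0, 1, 2, 3],
    "features": ["(0, 70)", "(0, 149)", "(0, 265)", "(0, 3001)"],
    "ground_true": [1, 1, 1, 1],
}
df1 = pd.DataFrame({**base, "predictionSVM": [0, 0, 1, 0]})
df2 = pd.DataFrame({**base, "predictionNB": [0, 1, 1, 0]})
df3 = pd.DataFrame({**base, "predictionLR": [1, 0, 1, 0]})
df4 = pd.DataFrame({**base, "predictionRF": [0, 1, 0, 0]})
dfs = (df1, df2, df3, df4)

# Move the shared columns into the index, align the frames side by side,
# then turn the shared columns back into ordinary columns.
df0 = pd.concat([df.set_index(common) for df in dfs], axis=1).reset_index()
print(df0)
```

With hashable values in the shared columns, each shared column should appear once, followed by one prediction column per model.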

My code

This is how I create the 4 DataFrames:

df1 = pd.concat([df1, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionSVM": result1})])
df2 = pd.concat([df2, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionNB": result2})])
df3 = pd.concat([df3, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionLR": result3})])
df4 = pd.concat([df4, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionRF": result4})])
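Since all four frames are built from the same `test_index` and `y_test`, one option would be to build a single frame per fold that already holds every model's predictions, which would make the later merge unnecessary. This is only a rough sketch: the values below are hypothetical stand-ins for one fold, and `features` is left out to keep it short.

```python
import pandas as pd

# Hypothetical stand-ins for one fold's variables (not the real data).
test_index = [0, 1, 2, 3]
y_test = [1, 1, 1, 1]
predictions = {
    "predictionSVM": [0, 0, 1, 0],  # stands in for result1
    "predictionNB": [0, 1, 1, 0],   # stands in for result2
    "predictionLR": [1, 0, 1, 0],   # stands in for result3
    "predictionRF": [0, 1, 0, 0],   # stands in for result4
}

# One frame per fold: shared columns first, then one column per model.
fold_df = pd.DataFrame({"index": test_index, "ground_true": y_test, **predictions})

# Collect the per-fold frames in a list and concatenate them once at the end.
all_folds = pd.concat([fold_df], ignore_index=True)
print(all_folds)
```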

common = ['index', 'features', 'ground_true']
dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]

The error occurs here:

Traceback (most recent call last):
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\util\_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 630, in factorize
    na_value=na_value)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'csr_matrix'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "verbatim_ameliore2.py", line 169, in <module>
    dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]
  File "verbatim_ameliore2.py", line 169, in <listcomp>
    dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 3915, in set_index
    index = _ensure_index_from_sequences(arrays, names)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexes\base.py", line 4911, in _ensure_index_from_sequences
    return MultiIndex.from_arrays(sequences, names=names)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexes\multi.py", line 1274, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2543, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2543, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2515, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 347, in __init__
    codes, categories = factorize(values, sort=False)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\util\_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 630, in factorize
    na_value=na_value)
  File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'csr_matrix'
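The last line of the traceback suggests that each cell of `features` holds a scipy `csr_matrix`, which cannot be hashed, so `set_index` cannot use that column as an index level. One possible way around this, sketched on toy data, is to merge on `index` alone and take only the prediction column from the other frames. This assumes `index` uniquely identifies a row in every frame, which may not hold for the real data. Plain lists stand in for the sparse rows because they are also unhashable.

```python
import pandas as pd

# Toy stand-ins: lists are unhashable, like the csr_matrix objects in the traceback.
shared = {
    "index": [0, 1, 2, 3],
    "features": [[0, 70], [0, 149], [0, 265], [0, 3001]],
    "ground_true": [1, 1, 1, 1],
}
df1 = pd.DataFrame({**shared, "predictionSVM": [0, 0, 1, 0]})
df2 = pd.DataFrame({**shared, "predictionNB": [0, 1, 1, 0]})
df3 = pd.DataFrame({**shared, "predictionLR": [1, 0, 1, 0]})
df4 = pd.DataFrame({**shared, "predictionRF": [0, 1, 0, 0]})

# Keep the shared columns from the first frame only.
merged = df1.copy()
for df in (df2, df3, df4):
    pred_col = df.columns[-1]  # the model-specific prediction column
    # Assumption: "index" alone is a unique key, so the unhashable
    # "features" column never has to be used as a merge key or index level.
    merged = merged.merge(df[["index", pred_col]], on="index", how="left")

print(merged)
```

Because `features` is only carried along and never hashed, the merge does not depend on what kind of object that column holds.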

Error when merging multiple DataFrames that share several common columns (avoiding duplicate columns in the result)

0 answers: