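I create 4 different data frames for 4 different prediction models. The first three columns are the same in all of them, but the last column differs. I want to merge the 4 so that the first three columns are kept and the last column of each data frame is added as a new column.
Each data frame looks like this: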
index features ground_true predictionSVM
0 (0, 70) 1 0
1 (0, 149) 1 0
2 (0, 265) 1 1
3 (0, 3001) 1 0
index features ground_true predictionNB
0 (0, 70) 1 0
1 (0, 149) 1 1
2 (0, 265) 1 1
3 (0, 3001) 1 0
index features ground_true predictionLR
0 (0, 70) 1 1
1 (0, 149) 1 0
2 (0, 265) 1 1
3 (0, 3001) 1 0
index features ground_true predictionRF
0 (0, 70) 1 0
1 (0, 149) 1 1
2 (0, 265) 1 0
3 (0, 3001) 1 0
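I want to merge only the last column of each data frame into a new data frame, so that every prediction column sits next to the same three shared columns.
Desired output:
index features ground_true predictionSVM predictionNB predictionLR predictionRF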
0 (0, 70) 1 0 0 1 0
1 (0, 149) 1 0 1 0 1
2 (0, 265) 1 1 1 1 0
3 (0, 3001) 1 0 0 0 0
I tried:
common = ['index', 'features', 'ground_true']
dfs = (df1, df2, df3, df4)
df0 = pd.concat([df.set_index(common) for df in dfs], axis=1).reset_index()
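For reference, here is a minimal, self-contained sketch of that attempt run on the sample rows shown earlier. It assumes the `features` values are plain strings such as "(0, 70)", which are hashable; that is a stand-in chosen for illustration, not how the real frames are built.

```python
import pandas as pd

# Sample rows copied from the tables above.
# Assumption: features are stored as strings so that they are hashable.
shared = {
    "index": [0, 1, 2, 3],
    "features": ["(0, 70)", "(0, 149)", "(0, 265)", "(0, 3001)"],
    "ground_true": [1, 1, 1, 1],
}
df1 = pd.DataFrame({**shared, "predictionSVM": [0, 0, 1, 0]})
df2 = pd.DataFrame({**shared, "predictionNB": [0, 1, 1, 0]})
df3 = pd.DataFrame({**shared, "predictionLR": [1, 0, 1, 0]})
df4 = pd.DataFrame({**shared, "predictionRF": [0, 1, 0, 0]})

# Move the three shared columns into the index, line the frames up
# side by side on that index, then turn the index back into columns.
common = ["index", "features", "ground_true"]
dfs = (df1, df2, df3, df4)
df0 = pd.concat([df.set_index(common) for df in dfs], axis=1).reset_index()

print(df0)
```

With hashable key columns this should give one row per sample and one column per model, so the approach itself looks sound as long as every shared column can be hashed.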
My code
This is how I create the 4 data frames:
df1 = pd.concat([df1, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionSVM": result1})])
df2 = pd.concat([df2, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionNB": result2})])
df3 = pd.concat([df3, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionLR": result3})])
df4 = pd.concat([df4, pd.DataFrame(data={"index": test_index, "features": X_test, "ground_true": y_test, "predictionRF": result4})])
common = ['index', 'features', 'ground_true']
dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]
The error occurs here:
Traceback (most recent call last):
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 345, in __init__
codes, categories = factorize(values, sort=True)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\util\_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 630, in factorize
na_value=na_value)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 476, in _factorize_array
na_value=na_value)
File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'csr_matrix'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "verbatim_ameliore2.py", line 169, in <module>
dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]
File "verbatim_ameliore2.py", line 169, in <listcomp>
dfs = [df.set_index(common) for df in (df1, df2, df3, df4)]
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 3915, in set_index
index = _ensure_index_from_sequences(arrays, names)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexes\base.py", line 4911, in _ensure_index_from_sequences
return MultiIndex.from_arrays(sequences, names=names)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexes\multi.py", line 1274, in from_arrays
labels, levels = _factorize_from_iterables(arrays)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2543, in _factorize_from_iterables
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2543, in <listcomp>
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 2515, in _factorize_from_iterable
cat = Categorical(values, ordered=True)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\arrays\categorical.py", line 347, in __init__
codes, categories = factorize(values, sort=False)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\util\_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 630, in factorize
na_value=na_value)
File "C:\Users\Emmanuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 476, in _factorize_array
na_value=na_value)
File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'csr_matrix'
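Reading the traceback, `set_index(common)` builds a `MultiIndex`, which has to hash every value in the key columns, and the objects in `features` cannot be hashed. One possible workaround, sketched below, is to use only the hashable `index` column as the key and leave `features` as an ordinary column. In this sketch plain Python lists stand in for the `csr_matrix` values (an assumption, to keep the example free of SciPy), and `make_frame` is a hypothetical helper that exists only for the illustration.

```python
import pandas as pd

def make_frame(pred_col, preds):
    # Hypothetical helper: builds one per-model frame from the sample rows.
    # Lists are unhashable, like csr_matrix, so they stand in for the features.
    return pd.DataFrame({
        "index": [0, 1, 2, 3],
        "features": [[0, 70], [0, 149], [0, 265], [0, 3001]],
        "ground_true": [1, 1, 1, 1],
        pred_col: preds,
    })

df1 = make_frame("predictionSVM", [0, 0, 1, 0])
df2 = make_frame("predictionNB", [0, 1, 1, 0])
df3 = make_frame("predictionLR", [1, 0, 1, 0])
df4 = make_frame("predictionRF", [0, 1, 0, 0])

key = "index"  # only this hashable column is used for alignment

# Keep all columns of the first frame, and only the last (prediction)
# column of the others, each indexed by the key.
parts = [df1.set_index(key)]
parts += [df.set_index(key).iloc[:, -1] for df in (df2, df3, df4)]

# Align on the key and restore it as a regular column.
combined = pd.concat(parts, axis=1).reset_index()

print(combined)
```

This sketch assumes the `index` values are unique within each frame and identify the same sample in all four. If they repeat, for example because the frames were grown by repeated `pd.concat` calls, the alignment would need a different key.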