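I am trying to concatenate two DataFrames in pandas. One of the dataframes is just some columns that I took from the other dataframe and transformed, so I never re-sort them. But when I try to concatenate them, I get a warning (shown below) saying they cannot be sorted, and they end up concatenated almost diagonally: the number of rows doubles (even though each frame has the same rows), and the number of columns becomes the columns of one frame plus the columns of the other.
Ideally, I would like the number of rows to stay the same, while the number of columns should be the columns of one frame plus the columns of the other. Here is my code: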
import pandas as pd
from sklearn import preprocessing

## In the below code I create new names for the scaled fields by adding SC_ to
## their existing names
SC_ExplanVars = []
for var in ExplanVars:
    sc_var = "SC_" + var
    SC_ExplanVars.append(sc_var)
## Scale the columns from my dataframe that will be used as explanatory
## variables
X_Scale = preprocessing.scale(data[ExplanVars])
## Put my newly scaled explanatory variables into a DataFrame with same headers
## but with SC_ in front
X_Scale = pd.DataFrame(X_Scale, columns = SC_ExplanVars)
## Concatenate scaled variables onto original dataset
datat = pd.concat([data, X_Scale], axis=1)
I get this warning:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\api.py:77: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
result = result.union(other)
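Edit
Below are the tables I described. This is only the first 10 rows, and I reduced it to a single column, but it still seems to give me the same problem.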
Data=
Col1
297
297
297
297
275
275
275
400
400
400
X_Scale =
SC_Col1
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.8849343767010354
-0.8849343767010354
-0.8849343767010354
1.5041973098568349
1.5041973098568349
1.5041973098568349
After concatenation:
datat =
Col1 SC_Col1
297.0 NaN
297.0 NaN
297.0 NaN
297.0 NaN
275.0 NaN
275.0 NaN
275.0 NaN
400.0 NaN
400.0 NaN
400.0 NaN
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN 1.5041973098568349
NaN 1.5041973098568349
NaN 1.5041973098568349
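Answer 0 (score: 1)
The two dataframes probably have different index labels. Try calling reset_index() on each dataframe before concatenating.
For example, I have these two dataframes with different index labels and try to concat them: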
import pandas as pd

d1={'Col1':[297,297,297,297,275,275,275,400,400,400]}
d2={'SC_Col1': [-0.4644471998668502,-0.4644471998668502,-0.4644471998668502,-0.4644471998668502,-0.8849343767010354,-0.8849343767010354,-0.8849343767010354,1.5041973098568349,1.5041973098568349,1.5041973098568349]}
df1=pd.DataFrame(d1, index=[10,11,12,13,14,15,16,17,18,19])
df2=pd.DataFrame(d2)
print(pd.concat([df1, df2], axis=1))
Output:
Col1 SC_Col1
0 NaN -0.464447
1 NaN -0.464447
2 NaN -0.464447
3 NaN -0.464447
4 NaN -0.884934
5 NaN -0.884934
6 NaN -0.884934
7 NaN 1.504197
8 NaN 1.504197
9 NaN 1.504197
10 297.0 NaN
11 297.0 NaN
12 297.0 NaN
13 297.0 NaN
14 275.0 NaN
15 275.0 NaN
16 275.0 NaN
17 400.0 NaN
18 400.0 NaN
19 400.0 NaN
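Before resetting anything, it can help to confirm that the row labels really are the cause, since concat with axis=1 aligns rows by index label rather than by position. A minimal sketch, using shortened versions of df1 and df2 from above; the names same_labels, shared and combined are only for illustration:

```python
import pandas as pd

# Shortened versions of the two frames above (three rows instead of ten)
df1 = pd.DataFrame({'Col1': [297, 275, 400]}, index=[10, 11, 12])
df2 = pd.DataFrame({'SC_Col1': [-0.46, -0.88, 1.50]})

# concat(axis=1) matches rows by index label, not by position
same_labels = df1.index.equals(df2.index)
shared = df1.index.intersection(df2.index)

print(df1.index.tolist())  # row labels of the first frame
print(df2.index.tolist())  # row labels of the second frame
print(same_labels)         # False means the rows will not line up
print(len(shared))         # how many labels the two frames have in common

combined = pd.concat([df1, df2], axis=1)
print(combined.shape)      # more rows than either input when labels differ
```

If the two indexes hold different types (for example strings in one and integers in the other), that would also be consistent with the sort-order warning in the question, although that is a guess about the asker's data.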
After calling reset_index() with the argument drop=True on each dataframe before the concat() operation, the result looks like this:
df1=df1.reset_index(drop=True)
df2=df2.reset_index(drop=True)
print(pd.concat([df1, df2], axis=1))
Output:
Col1 SC_Col1
0 297 -0.464447
1 297 -0.464447
2 297 -0.464447
3 297 -0.464447
4 275 -0.884934
5 275 -0.884934
6 275 -0.884934
7 400 1.504197
8 400 1.504197
9 400 1.504197
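If you would rather keep the original row labels of data instead of resetting them, another option is to give the scaled frame the same index when you build it. This is only a sketch under some assumptions: the data below is a made-up stand-in with string labels, and the scaling is done with plain NumPy (zero mean, unit variance) in place of preprocessing.scale so that the example runs without scikit-learn. It also assumes the scaled array keeps the rows in the same order as data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data, with a non-default index
data = pd.DataFrame({'Col1': [297, 297, 275, 400]}, index=['a', 'b', 'c', 'd'])
ExplanVars = ['Col1']
SC_ExplanVars = ['SC_' + var for var in ExplanVars]

# Plain NumPy standardisation, used here instead of preprocessing.scale
values = data[ExplanVars].to_numpy(dtype=float)
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

# index=data.index gives the new frame the same row labels as data,
# so concat(axis=1) pairs the rows up instead of stacking them
X_Scale = pd.DataFrame(scaled, columns=SC_ExplanVars, index=data.index)

datat = pd.concat([data, X_Scale], axis=1)
print(datat)
```

With matching labels the result keeps the original number of rows and contains no NaN values.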
Hope this helps :)