Concatenating pandas DataFrames doubles the number of rows

Date: 2019-06-02 12:06:07

Tags: python-3.x pandas dataframe scikit-learn

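I am trying to concatenate two DataFrames in pandas. One of them is just some columns that I took from the other DataFrame and transformed, so I never re-sort them. But when I try to concatenate them, I get a message saying it cannot be done cleanly, and they end up concatenated almost diagonally: the number of rows doubles (even though both frames have the same rows), and the number of columns becomes the columns of one plus the columns of the other.

Ideally, I would like the number of rows to stay the same, and the number of columns to be the columns of one plus the columns of the other. Below is my code: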

import pandas as pd
from sklearn import preprocessing

## In the code below I create new names for the scaled fields by adding SC_ to
## their existing names
SC_ExplanVars = []

for var in ExplanVars:
    sc_var = "SC_" + var
    SC_ExplanVars.append(sc_var)

## Scale the columns from my dataframe that will be used as explanatory 
## variables
X_Scale = preprocessing.scale(data[ExplanVars])

## Put my newly scaled explanatory variables into a DataFrame with the same
## headers but with SC_ in front
X_Scale = pd.DataFrame(X_Scale, columns = SC_ExplanVars)

## Concatenate scaled variables onto original dataset
datat = pd.concat([data, X_Scale], axis=1)

I get this warning:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\api.py:77: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
  result = result.union(other)

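Edit

Below are the tables I was describing. These are only the first 10 rows, and I cut the data down to a single column, but it still seems to give me the same problem.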

Data=
    Col1
    297
    297
    297
    297
    275
    275
    275
    400
    400
    400

X_Scale = 
SC_Col1
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.8849343767010354
-0.8849343767010354
-0.8849343767010354
1.5041973098568349
1.5041973098568349
1.5041973098568349

After concatenation:

datat = 
Col1    SC_Col1
297.0   NaN
297.0   NaN
297.0   NaN
297.0   NaN
275.0   NaN
275.0   NaN
275.0   NaN
400.0   NaN
400.0   NaN
400.0   NaN
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN 1.5041973098568349
NaN 1.5041973098568349
NaN 1.5041973098568349

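1 answer:

Answer 0 (score: 1)

The two DataFrames probably have different index labels. Try calling reset_index() on each DataFrame before concatenating.

For example, here are two DataFrames with different index labels, which I then try to concat: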
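Before resetting anything, it can help to confirm that the index labels really differ. The snippet below is a minimal sketch with hypothetical frames named `left` and `right` (not the asker's data); it only assumes that one frame has a non-default index and the other has the default one.

```python
import pandas as pd

# Hypothetical frames: same length, but different index labels.
left = pd.DataFrame({"Col1": [297, 275, 400]}, index=[10, 11, 12])
right = pd.DataFrame({"SC_Col1": [-0.46, -0.88, 1.50]})  # default index 0, 1, 2

# concat(axis=1) aligns rows by index label, so compare the labels first.
same_index = left.index.equals(right.index)
shared_labels = left.index.intersection(right.index)

print("indexes identical:", same_index)           # False
print("labels in common:", len(shared_labels))    # 0
```

If the indexes are not identical, concat with axis=1 cannot line the rows up one to one.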

import pandas as pd

d1={'Col1':[297,297,297,297,275,275,275,400,400,400]}
d2={'SC_Col1': [-0.4644471998668502,-0.4644471998668502,-0.4644471998668502,-0.4644471998668502,-0.8849343767010354,-0.8849343767010354,-0.8849343767010354,1.5041973098568349,1.5041973098568349,1.5041973098568349]}

df1=pd.DataFrame(d1, index=[10,11,12,13,14,15,16,17,18,19])
df2=pd.DataFrame(d2)
print(pd.concat([df1, df2], axis=1))

Output:

     Col1   SC_Col1
0     NaN -0.464447
1     NaN -0.464447
2     NaN -0.464447
3     NaN -0.464447
4     NaN -0.884934
5     NaN -0.884934
6     NaN -0.884934
7     NaN  1.504197
8     NaN  1.504197
9     NaN  1.504197
10  297.0       NaN
11  297.0       NaN
12  297.0       NaN
13  297.0       NaN
14  275.0       NaN
15  275.0       NaN
16  275.0       NaN
17  400.0       NaN
18  400.0       NaN
19  400.0       NaN
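The doubling follows from how concat aligns rows: with axis=1 it matches rows by index label, and the default join="outer" keeps every label found in either frame. The sketch below uses small hypothetical frames (three rows instead of ten) to show the row counts for both join modes.

```python
import pandas as pd

# Hypothetical frames with no index labels in common.
df_a = pd.DataFrame({"Col1": [297, 275, 400]}, index=[10, 11, 12])
df_b = pd.DataFrame({"SC_Col1": [-0.46, -0.88, 1.50]})  # index 0, 1, 2

# Outer join (the default): union of the labels, so 3 + 3 rows.
outer = pd.concat([df_a, df_b], axis=1)

# Inner join: intersection of the labels, which is empty here.
inner = pd.concat([df_a, df_b], axis=1, join="inner")

print(outer.shape)  # (6, 2)
print(inner.shape)  # (0, 2)
```

An empty inner join is a quick sign that the two frames share no row labels at all.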

After calling reset_index() with drop=True on each DataFrame before the concat() operation, the result looks like this:

df1=df1.reset_index(drop=True)
df2=df2.reset_index(drop=True)
print(pd.concat([df1, df2], axis=1))

Output:

   Col1   SC_Col1
0   297 -0.464447
1   297 -0.464447
2   297 -0.464447
3   297 -0.464447
4   275 -0.884934
5   275 -0.884934
6   275 -0.884934
7   400  1.504197
8   400  1.504197
9   400  1.504197
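An alternative that keeps the original row labels is to give the scaled frame the same index when it is built. This is a sketch under assumptions: `data` here is a hypothetical stand-in with string labels, and the scaling is done with plain NumPy (zero mean, unit population variance) in place of preprocessing.scale so that the example runs without scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `data`, with a non-default index.
data = pd.DataFrame(
    {"Col1": [297, 297, 275, 400]},
    index=["a", "b", "c", "d"],
)
explan_vars = ["Col1"]
sc_explan_vars = ["SC_" + var for var in explan_vars]

# Stand-in for preprocessing.scale (assumption: population std, ddof=0).
values = data[explan_vars].to_numpy(dtype=float)
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

# Reuse the original index so both frames carry the same row labels.
x_scale = pd.DataFrame(scaled, columns=sc_explan_vars, index=data.index)

datat = pd.concat([data, x_scale], axis=1)
print(datat.shape)  # (4, 2)
```

With matching labels the rows line up one to one, so no NaN padding appears and the original index is preserved.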

Hope this helps :)