I can't add a column to all rows of a pandas DataFrame

Time: 2019-08-07 22:45:09

Tags: python-3.x pandas dataframe

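I'm new to Python/pandas, so this may be a very simple question, but I can't figure it out. I have two DataFrames loaded from Oracle SQL. The first has 380 rows and 2 columns. The second has 1 row and 1 column. I want to add the second DataFrame's column to every row of the first as a new column. However, the value only appears in the first row, and all the other rows get NaN.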

import cx_Oracle
import pandas as pd
import numpy as np
import xgboost as xgb
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package

dsn_tns = cx_Oracle.makedsn('127.0.1.1', '1521', 'orcl')
conn = cx_Oracle.connect(user='MyName', password='MyPass', dsn=dsn_tns)



d_score = pd.read_sql_query(
'''
SELECT
 ID
,RESULT
,RATIO_A
,RATIO_B
from ORCL_DATA
''', conn)  # returns 380 rows


d_score['ID'] = d_score['ID'].astype(int)
d_score['RESULT'] = d_score['RESULT'].astype(int)
d_score['RATIO_A'] = d_score['RATIO_A'].astype(float)
d_score['RATIO_B'] = d_score['RATIO_B'].astype(float)


d_score_features = d_score.iloc[:, 2:4]  # columns RATIO_A and RATIO_B
#d_train_target = d_score.iloc[:,1:2] #target is RESULT

DM_train = xgb.DMatrix(data= d_score_features)


loaded_model = joblib.load("bst.dat")
pred = loaded_model.predict(DM_train)


i = pd.DataFrame({'ID':d_score['ID'],'Probability':pred})
print(i)


s = pd.read_sql_query('''select max(id_process) as MAX_ID_PROCESS from PROCESS''', conn)  # returns only 1 row

m = pd.DataFrame(data=s, dtype=np.int64, columns=['MAX_ID_PROCESS'])
print(m)

i['new'] = m  # trying to add MAX_ID_PROCESS to all rows

print(i)



i = 

          ID     Probability
0       20101     0.663083  
1       20105     0.486774 
2       20106     0.441300 
3       20278     0.703176 
4       20221     0.539185 
....
379     20480     0.671976


m = 
     MAX_ID_PROCESS
0       274



i = 

  ID_MATCH  Probability    new
0       20101     0.663083  274.0
1       20105     0.486774    NaN
2       20106     0.441300    NaN
3       20278     0.703176    NaN
4       20221     0.539185    NaN


I need the 'new' value in every row...
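The following is a minimal reproduction of this behaviour, using a few made-up rows copied from the output above. When you assign a pandas object to a column, pandas aligns it on the index. Only the row whose label also exists in m (label 0) gets a value, and every other row becomes NaN. The question assigns the whole one-column DataFrame m. This sketch assigns its single column as a Series instead, which pandas aligns the same way.

```python
import pandas as pd

# Made-up sample data standing in for the question's `i` and `m`
i = pd.DataFrame({"ID": [20101, 20105, 20106],
                  "Probability": [0.663083, 0.486774, 0.441300]})
m = pd.DataFrame({"MAX_ID_PROCESS": [274]})

# The right-hand side is aligned on the index: m only has label 0,
# so labels 1 and 2 of `i` find no match and receive NaN
i["new"] = m["MAX_ID_PROCESS"]
print(i)
```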

2 Answers:

Answer 0 (score: 1)

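Since the second DataFrame contains only a single value, you can assign that scalar directly, and pandas will broadcast it to every row: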

df1['new'] = df2.MAX_ID_PROCESS[0]

# Or using .loc
df1['new'] = df2.MAX_ID_PROCESS.loc[0]

In your case, that would be:

i['new'] = m.MAX_ID_PROCESS[0]

You should now see:

      ID  Probability  new
0  20101     0.663083  274
1  20105     0.486774  274
2  20106     0.441300  274
3  20278     0.703176  274
4  20221     0.539185  274
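The [0] and .loc[0] lookups in this answer are label-based. They assume the single row has index label 0, which is true for a fresh read_sql_query result. If the one-row frame came out of filtering or slicing, its label could be different. The sketch below uses a made-up index label of 5 to show position-based lookups that work either way:

```python
import pandas as pd

# Hypothetical one-row frame whose index label is not 0 (e.g. after filtering).
# Label-based lookups like m.MAX_ID_PROCESS[0] or .loc[0] would raise KeyError here.
m = pd.DataFrame({"MAX_ID_PROCESS": [274]}, index=[5])

# Position-based lookups ignore the index labels
value_by_position = m["MAX_ID_PROCESS"].iloc[0]
value_by_iat = m.iat[0, 0]
# .item() also raises ValueError unless there is exactly one value
value_by_item = m["MAX_ID_PROCESS"].item()

print(value_by_position, value_by_iat, value_by_item)
```

One advantage of .item() is that it fails loudly if the query unexpectedly returns more than one row.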

Answer 1 (score: 0)

We know that dataframe2["new_column_name"] = dataframe1["column_to_copy"] appends a column of dataframe1 to dataframe2 as a new column (pandas matches the rows by index label).
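This assignment matches rows by index label, not by position. It copies values row for row only when both frames share the same index, as the default 0..n-1 index does in this answer. The sketch below uses made-up frames to show both cases. It also shows .to_numpy(), which copies strictly by position when the lengths match:

```python
import pandas as pd

source = pd.DataFrame({"col": [1, 2, 3]})  # default index 0, 1, 2

# Same index labels: values are copied row for row
same_index = pd.DataFrame({"other": [10, 20, 30]})
same_index["copied"] = source["col"]

# Shifted index labels: pandas aligns on labels, so label 3 has no match -> NaN
shifted = pd.DataFrame({"other": [10, 20, 30]}, index=[1, 2, 3])
shifted["copied"] = source["col"]
# .to_numpy() drops the labels, so values are copied strictly by position
shifted["copied_by_position"] = source["col"].to_numpy()

print(same_index)
print(shifted)
```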

We can extend this approach to solve your problem.

import pandas as pd
import numpy as np

df1 = pd.DataFrame()

df1["ColA"] = [1, 12, 32, 24, 12]
df1["ColB"] = [23, 11, 6, 45, 25]
df1["ColC"] = [10, 25, 3, 23, 15]

print(df1)

Output:

   ColA  ColB  ColC
0     1    23    10
1    12    11    25
2    32     6     3
3    24    45    23
4    12    25    15

Next, we create a new DataFrame containing a single row.

df3 = pd.DataFrame()
df3["ColTest"] = [1]

Now we store the first (and only) row of the second DataFrame, since we want to add its value to every row of dataframe1 as a new column:

val = df3.iloc[0]
print(val)

Output:

ColTest    1
Name: 0, dtype: int64

Next, we copy this row into df3 once for every row of dataframe1, so that both frames end up with the same number of rows:

rows = len(df1)
for row in range(rows):
    df3.loc[row]=val
print(df3)

Output:

   ColTest
0        1
1        1
2        1
3        1
4        1
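The loop above grows df3 one row at a time with .loc. A loop-free alternative, which is not part of the original answer, repeats the single row with Index.repeat. In this sketch df1 is trimmed to ColA for brevity, because only its length matters:

```python
import pandas as pd

df1 = pd.DataFrame({"ColA": [1, 12, 32, 24, 12]})  # trimmed copy; only len(df1) is used
df3 = pd.DataFrame({"ColTest": [1]})

# Repeat the single index label len(df1) times and select those rows,
# then reset the index to 0..len(df1)-1 so it lines up with df1's index
repeated = df3.loc[df3.index.repeat(len(df1))].reset_index(drop=True)
print(repeated)
```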

Finally, we append this column to the first DataFrame, which solves your problem:

df1["ColTest"] = df3["ColTest"]
print(df1)

Output:

   ColA  ColB  ColC  ColTest
0     1    23    10        1
1    12    11    25        1
2    32     6     3        1
3    24    45    23        1
4    12    25    15        1
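On pandas 1.2 or later, a cross join does the same thing in one step, without building the intermediate column. This is an alternative to the approach above, shown here with the same sample data. The result keeps df1's row count only because df3 has exactly one row. If df3 had k rows, you would get k copies of each row of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ColA": [1, 12, 32, 24, 12],
    "ColB": [23, 11, 6, 45, 25],
    "ColC": [10, 25, 3, 23, 15],
})
df3 = pd.DataFrame({"ColTest": [1]})

# how="cross" (pandas >= 1.2) pairs every row of df1 with every row of df3;
# with a one-row df3 this just appends df3's columns to each row of df1
result = df1.merge(df3, how="cross")
print(result)
```

For the question's data this would be i.merge(m, how="cross"), and the new column would be named MAX_ID_PROCESS rather than new.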