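I'm new to Python / pandas, so this may be a very simple question... but I can't get it to work: I have two DataFrames loaded from Oracle SQL. The first has 300 rows / 2 columns, the second has 1 row / 1 column. I want to add the second DataFrame's column to every row of the first as a new column. But I only get the value in the first row; all the other rows are NaN.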
import cx_Oracle
import pandas as pd
import numpy as np
import xgboost as xgb
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
dsn_tns = cx_Oracle.makedsn('127.0.1.1', '1521', 'orcl')
conn = cx_Oracle.connect(user='MyName', password='MyPass', dsn=dsn_tns)
d_score = pd.read_sql_query(
'''
SELECT
ID
,RESULT
,RATIO_A
,RATIO_B
from ORCL_DATA
''', conn)  # returns 380 rows
d_score['ID'] = d_score['ID'].astype(int)
d_score['RESULT'] = d_score['RESULT'].astype(int)
d_score['RATIO_A'] = d_score['RATIO_A'].astype(float)
d_score['RATIO_B'] = d_score['RATIO_B'].astype(float)
d_score_features = d_score.iloc[:, 2:4]  # RATIO_A and RATIO_B
# d_train_target = d_score.iloc[:, 1:2]  # target is RESULT
DM_train = xgb.DMatrix(data=d_score_features)
loaded_model = joblib.load("bst.dat")
pred = loaded_model.predict(DM_train)
i = pd.DataFrame({'ID':d_score['ID'],'Probability':pred})
print(i)
s = pd.read_sql_query('''select max(id_process) as MAX_ID_PROCESS from PROCESS''', conn)  # returns only 1 row
m = pd.DataFrame(data=s, dtype=np.int64, columns=['MAX_ID_PROCESS'])
print(m)
i['new'] = m  # trying to add MAX_ID_PROCESS to all rows
print(i)
i =
        ID  Probability
0    20101     0.663083
1    20105     0.486774
2    20106     0.441300
3    20278     0.703176
4    20221     0.539185
....
379  20480     0.671976
m =
   MAX_ID_PROCESS
0             274
i =
      ID  Probability    new
0  20101     0.663083  274.0
1  20105     0.486774    NaN
2  20106     0.441300    NaN
3  20278     0.703176    NaN
4  20221     0.539185    NaN
I need value 'new' for all rows...
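Answer 0 (score: 1)
Since the second DataFrame holds only a single value, you can extract that value and assign it as a scalar, which pandas broadcasts to every row: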
df1['new'] = df2.MAX_ID_PROCESS[0]
# Or using .loc
df1['new'] = df2.MAX_ID_PROCESS.loc[0]
In your case that would be:
i['new'] = m.MAX_ID_PROCESS[0]
You should now see:
      ID  Probability  new
0  20101     0.663083  274
1  20105     0.486774  274
2  20106     0.441300  274
3  20278     0.703176  274
4  20221     0.539185  274
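The NaN values in the question come from index alignment: when a Series is assigned to a column, pandas matches rows by index label, and `m` only has label 0. A scalar, by contrast, is broadcast to every row. Here is a minimal sketch of both behaviours; the frames are small stand-ins with made-up values, not the real Oracle data, and the names `aligned` and `broadcast` are introduced only for this illustration.

```python
import pandas as pd

# Stand-ins for the question's frames; the values are illustrative only.
i = pd.DataFrame({"ID": [20101, 20105, 20106],
                  "Probability": [0.66, 0.49, 0.44]})
m = pd.DataFrame({"MAX_ID_PROCESS": [274]})

# Assigning a Series aligns on the index: only label 0 exists in m,
# so every other row becomes NaN (and the column is upcast to float).
aligned = i.copy()
aligned["new"] = m["MAX_ID_PROCESS"]

# Assigning a scalar broadcasts it to all rows and keeps the integer dtype.
broadcast = i.copy()
broadcast["new"] = m["MAX_ID_PROCESS"].iloc[0]

print(aligned)
print(broadcast)
```

Using `.iloc[0]` rather than `[0]` selects by position, so it should still work if the one-row frame happens to have an index that does not start at 0.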
Answer 1 (score: 0)
We know that dataframe2["new_column_name"] = dataframe1["column_to_copy"]
appends a column of dataframe1 to dataframe2 as a new column, matching rows by index label.
We can extend this approach to solve your problem.
import pandas as pd
import numpy as np
df1 = pd.DataFrame()
df1["ColA"] = [1, 12, 32, 24,12]
df1["ColB"] = [23, 11, 6, 45,25]
df1["ColC"] = [10, 25, 3, 23,15]
print(df1)
Output:
   ColA  ColB  ColC
0     1    23    10
1    12    11    25
2    32     6     3
3    24    45    23
4    12    25    15
Now we create a second DataFrame and add a single row to it.
df3 = pd.DataFrame()
df3["ColTest"] = [1]
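Next, we store the first row of the second DataFrame, because that is what we want to add to every row of dataframe1 as a new column:
val = df3.iloc[0]  # the first row, returned as a Series
print(val)
Output:
ColTest    1
Name: 0, dtype: int64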
Now we repeat this value in df3 once for every row of dataframe1.
rows = len(df1)
for row in range(rows):
    df3.loc[row] = val  # creates the row if the label does not exist yet
print(df3)
Output:
   ColTest
0        1
1        1
2        1
3        1
4        1
Finally, we append this column to the first DataFrame, which solves your problem.
df1["ColTest"] = df3["ColTest"]
print(df1)
Output:
   ColA  ColB  ColC  ColTest
0     1    23    10        1
1    12    11    25        1
2    32     6     3        1
3    24    45    23        1
4    12    25    15        1
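The loop above can usually be avoided. As a hedged sketch using the same `df1` and `df3` as in this answer: pull the single cell out as a scalar with `.iloc[0, 0]` and let pandas broadcast it. `DataFrame.assign` does this while returning a new frame instead of modifying `df1` in place. The names `value` and `result` are introduced here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"ColA": [1, 12, 32, 24, 12],
                    "ColB": [23, 11, 6, 45, 25],
                    "ColC": [10, 25, 3, 23, 15]})
df3 = pd.DataFrame({"ColTest": [1]})

# .iloc[0, 0] returns the single cell as a scalar
# (df3.iloc[0] would return the whole row as a Series).
value = df3.iloc[0, 0]

# assign() returns a new frame with the scalar repeated on every row.
result = df1.assign(ColTest=value)
print(result)
```

This avoids growing `df3` row by row, which tends to be slow on larger frames, and it does not depend on the two frames sharing index labels.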