Let's say I have a table PRODUCTS with many columns, and I want to insert/update a row using a MERGE statement. It looks something like this:

    MERGE INTO PRODUCTS AS Target
    USING (VALUES (42, 'Foo', 'Bar', 0, 14, 200, NULL)) AS Source (ID, Name, Description, IsSpecialPrice, CategoryID, Price, SomeOtherField)
    ON Target.ID = Source.ID
    WHEN MATCHED THEN
        -- update
    WHEN NOT MATCHED BY TARGET THEN
        -- insert

Writing the UPDATE and INSERT "sub-statements", it seems I have to specify each and every column once again.
So -- update would be replaced by

    UPDATE SET ID = Source.ID, Name = Source.Name, Description = Source.Description...

and -- insert by a matching INSERT that again lists every column and every Source value.
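This is very error-prone, hard to maintain, and apparently unnecessary in the simple case where I just want to merge two "field sets", each representing a complete table row. I appreciate that the update and insert statements could actually be anything (I have used that in an unusual case in the past), but it would be great if there were a more concise way to express the case where I just want "Target = Source" or "insert Source".

Is there a better way to write the update and insert statements, or do I really need to specify the full column list every time?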
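Answer 0 (score: 2)

You have to write out the complete column list.

You can check the documentation for MERGE. Most SQL Server statement documentation starts with a syntax definition that shows you exactly what is allowed. For example, the part covering UPDATE is defined as: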
<merge_matched>::=
    { UPDATE SET <set_clause> | DELETE }

<set_clause>::=
    SET
    { column_name = { expression | DEFAULT | NULL }
    | { udt_column_name.{ { property_name = expression
                          | field_name = expression }
                          | method_name ( argument [ ,...n ] ) }
      }
    | column_name { .WRITE ( expression , @Offset , @Length ) }
    | @variable = expression
    | @variable = column = expression
    | column_name { += | -= | *= | /= | %= | &= | ^= | |= } expression
    | @variable { += | -= | *= | /= | %= | &= | ^= | |= } expression
    | @variable = column { += | -= | *= | /= | %= | &= | ^= | |= } expression
    } [ ,...n ]
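
As you can see, the only options in <set_clause> are individual column assignments. There is no "bulk" assignment option. Further down in the documentation, you will find that the options for INSERT also require individual expressions (at least in the VALUES clause; you can omit the column names after INSERT, but that is generally frowned upon).

SQL tends to favor verbose, explicit syntax.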
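
For illustration, here is a sketch of what the fully spelled-out statement from the question might look like. The CREATE TABLE definition and its column types are assumptions made only so the example is self-contained; the question does not show the real schema. ID is left out of the UPDATE SET list here because it is the join key and already equal on both sides, and the sketch assumes ID is not an IDENTITY column.

```sql
-- Hypothetical table definition; the column types are assumptions.
CREATE TABLE PRODUCTS (
    ID             INT            NOT NULL PRIMARY KEY,
    Name           NVARCHAR(100)  NOT NULL,
    Description    NVARCHAR(400)  NULL,
    IsSpecialPrice BIT            NOT NULL,
    CategoryID     INT            NOT NULL,
    Price          DECIMAL(10, 2) NOT NULL,
    SomeOtherField NVARCHAR(100)  NULL
);

MERGE INTO PRODUCTS AS Target
USING (VALUES (42, N'Foo', N'Bar', 0, 14, 200, NULL))
    AS Source (ID, Name, Description, IsSpecialPrice, CategoryID, Price, SomeOtherField)
ON Target.ID = Source.ID
WHEN MATCHED THEN
    -- Every column to be updated is assigned individually.
    UPDATE SET
        Name           = Source.Name,
        Description    = Source.Description,
        IsSpecialPrice = Source.IsSpecialPrice,
        CategoryID     = Source.CategoryID,
        Price          = Source.Price,
        SomeOtherField = Source.SomeOtherField
WHEN NOT MATCHED BY TARGET THEN
    -- Both the column list and the VALUES list are written out in full.
    INSERT (ID, Name, Description, IsSpecialPrice, CategoryID, Price, SomeOtherField)
    VALUES (Source.ID, Source.Name, Source.Description, Source.IsSpecialPrice,
            Source.CategoryID, Source.Price, Source.SomeOtherField);
```

Note that a MERGE statement has to be terminated with a semicolon. Dropping the column list after INSERT would shorten the last clause, but the VALUES list would then depend on the physical column order of the table, which is one reason that form is usually discouraged.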