I have two DataFrames, each with a single column (300 rows per column):
df_realite.take(1)
[Row(realite=1.0)]

df_proba_classe_1.take(1)
[Row(probabilite=0.6196931600570679)]

I want to build one DataFrame out of these two columns. I tried:
_ = spark.createDataFrame([df_realite.rdd, df_proba_classe_1.rdd],
                          schema=StructType([StructField('realite', FloatType()),
                                             StructField('probabilite', FloatType())]))

but it gives me null values:
_.take(10)
Answer 0 (score: 0)
There may be a cleaner way (or a way without a join), but you can always give both of them an id and join them on it:
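For example, a minimal sketch of that id-and-join approach, generating a sequential id on each side with row_number (the window, the variable names, and the id column name are assumptions for illustration, not this answer's original code):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# A window with no partitioning pulls all rows through a single
# partition, which is acceptable here since each column has only 300 rows.
w = Window.orderBy(F.monotonically_increasing_id())
df_r = df_realite.withColumn('id', F.row_number().over(w))
df_p = df_proba_classe_1.withColumn('id', F.row_number().over(w))

# Join on the generated id, then drop it.
df = df_r.join(df_p, on='id').drop('id')
df.show()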
Answer 1 (score: 0)
I think this is what you are looking for, but I would only recommend it when your data is very small, as in your case (300 rows), because collect() is not a good way to handle large amounts of data. Otherwise, take the join route with dummy id columns and use a broadcast join so that no shuffle occurs (see the sketch after the example below).
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, FloatType

df1 = spark.range(10).select(col("id").cast("float"))
df2 = spark.range(10).select(col("id").cast("float"))

# Pull each column down to the driver as a plain Python list.
l1 = df1.rdd.flatMap(lambda x: x).collect()
l2 = df2.rdd.flatMap(lambda x: x).collect()

# Pair the values row by row and rebuild a two-column DataFrame.
list_df = list(zip(l1, l2))
schema = StructType([StructField('realite', FloatType()),
                     StructField('probabilite', FloatType())])

df = spark.createDataFrame(list_df, schema=schema)
df.show()
+-------+-----------+
|realite|probabilite|
+-------+-----------+
| 0.0| 0.0|
| 1.0| 1.0|
| 2.0| 2.0|
| 3.0| 3.0|
| 4.0| 4.0|
| 5.0| 5.0|
| 6.0| 6.0|
| 7.0| 7.0|
| 8.0| 8.0|
| 9.0| 9.0|
+-------+-----------+
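For reference, a minimal sketch of that broadcast-join route, under the assumption that each DataFrame is first coalesced to a single partition so the generated ids line up (df1 and df2 are the DataFrames from the example above; the row_id column name is an assumption):

from pyspark.sql import functions as F

# With a single partition, monotonically_increasing_id() yields
# 0, 1, 2, ... in row order, so the ids on both sides match.
a = df1.coalesce(1).withColumn('row_id', F.monotonically_increasing_id())
b = df2.coalesce(1).withColumn('row_id', F.monotonically_increasing_id())

# Broadcasting the small side ships it to every executor,
# so the join needs no shuffle.
df = a.join(F.broadcast(b), on='row_id').drop('row_id')
df.show()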