I'm new to UDFs in Spark. I have also read the answer here.
Problem statement: I'm trying to find pattern matches in a DataFrame column.
Example DataFrame:
val df = Seq((1, Some("z")), (2, Some("abs,abc,dfg")),
(3,Some("a,b,c,d,e,f,abs,abc,dfg"))).toDF("id", "text")
df.show()
+---+--------------------+
| id| text|
+---+--------------------+
| 1| z|
| 2| abs,abc,dfg|
| 3|a,b,c,d,e,f,abs,a...|
+---+--------------------+
df.filter($"text".contains("abs,abc,dfg")).count()
// returns 2 because "abs,abc,dfg" occurs in the 2nd and 3rd rows
Now I want to perform this pattern match for every row of the text column and add a new column named count.
Result:
+---+--------------------+-----+
| id| text|count|
+---+--------------------+-----+
| 1| z| 1|
| 2| abs,abc,dfg| 2|
| 3|a,b,c,d,e,f,abs,a...| 1|
+---+--------------------+-----+
I tried defining a UDF that takes the text column as an Array[Seq[String]], but I could not get it to do what I intended.
What I have tried so far:
val txt = df.select("text").collect.map(_.toSeq.map(_.toString)) //convert column to Array[Seq[String]
val valsum = udf((txt:Array[Seq[String],pattern:String)=> {txt.count(_ == pattern) } )
df.withColumn("newCol", valsum( lit(txt) ,df(text)) )).show()
Any help would be appreciated.
Answer 0 (score: 1)
You need to know all the elements of the text column, which you can get by using collect_list to group all the rows of the dataframe into one. Then, for each row, just check which elements of the collected array contain that row's text value and count them, as in the following code.
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import scala.collection.mutable
val df = Seq((1, Some("z")), (2, Some("abs,abc,dfg")),(3,Some("a,b,c,d,e,f,abs,abc,dfg"))).toDF("id", "text")
// UDF: count how many elements of the collected array contain txt as a substring
val valsum = udf((txt: String, array: mutable.WrappedArray[String]) => array.count(_.contains(txt)))
// Put every row in a single group so that collect_list gathers the whole text column
df.withColumn("grouping", lit("g"))
  .withColumn("array", collect_list("text").over(Window.partitionBy("grouping")))
  .withColumn("count", valsum($"text", $"array"))
  .drop("grouping", "array")
  .show(false)
You should have the following output:
+---+-----------------------+-----+
|id |text |count|
+---+-----------------------+-----+
|1 |z |1 |
|2 |abs,abc,dfg |2 |
|3 |a,b,c,d,e,f,abs,abc,dfg|1 |
+---+-----------------------+-----+
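To make the counting logic concrete, here is a minimal plain-Scala sketch (no Spark needed) of what the UDF computes for each row, using the same three sample values. The names texts and countContaining are illustrative only and do not appear in the code above.

```scala
// Same sample values as the text column in the question.
val texts: Seq[String] = Seq("z", "abs,abc,dfg", "a,b,c,d,e,f,abs,abc,dfg")

// Hypothetical helper mirroring the UDF body:
// how many values in `all` contain `txt` as a substring.
def countContaining(txt: String, all: Seq[String]): Int =
  all.count(_.contains(txt))

// One count per row, in the same order as the rows.
val counts: Seq[Int] = texts.map(t => countContaining(t, texts))
counts.foreach(println)
```

Note that contains is a plain substring test, so a value such as "a" would also be counted as matching "abs,abc,dfg". If whole comma-separated tokens should be matched instead, the comparison would need to change.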
I hope this helps.