Suppose I have a data frame like this:
machine_id | value
1          | 5
1          | 3
1          | 4
I want to produce a final data frame like this:
machine_id | value | sum
1          | 5     | null
1          | 3     | 8
1          | 4     | 7
Basically I have to compute a sum over a window of size 2, but for the first row I don't want it summed with zero; it should just be filled with null. This is what I tried:
var winSpec = Window.orderBy("machine_id").partitionBy("machine_id").rangeBetween(-1, 0)
df.withColumn("sum", sum("value").over(winSpec))
Answer 0 (score: 1)
You can use the lag function, adding the value column to lag(value, 1):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, monotonically_increasing_id}

val df = Seq((1, 5), (1, 3), (1, 4)).toDF("machine_id", "value")
val window = Window.partitionBy("machine_id").orderBy("id")

df.withColumn("id", monotonically_increasing_id)
  .withColumn("sum", $"value" + lag($"value", 1).over(window))
  .drop("id")
  .show()
+----------+-----+----+
|machine_id|value| sum|
+----------+-----+----+
| 1| 5|null|
| 1| 3| 8|
| 1| 4| 7|
+----------+-----+----+
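To see what lag contributes here, the same pairwise logic can be sketched in plain Scala, without Spark: each value is paired with the previous value in its group, and the first row, which has no predecessor, yields None (null in the data frame). This is only an illustrative sketch of the semantics, not the Spark API itself.

```scala
// Plain-Scala sketch of the lag-based sum: pair each value with the
// previous one in the group; the first row has no predecessor, hence None.
val values = Seq(5, 3, 4)
val lagged = None +: values.init.map(Option(_)) // values shifted down by one row
val sums   = values.zip(lagged).map { case (v, prev) => prev.map(_ + v) }
// sums == Seq(None, Some(8), Some(7)), matching the null/8/7 column above
```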
Answer 1 (score: 0)
You should use the rowsBetween API instead of rangeBetween, as shown below, which should give you the expected result:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val winSpec = Window.partitionBy("machine_id").orderBy("machine_id").rowsBetween(-1, 0)
df.withColumn("sum", sum("value").over(winSpec))
  // the first row's frame contains only itself, so sum == value; replace it with null
  .withColumn("sum", when($"sum" === $"value", null).otherwise($"sum"))
  .show(false)
I hope this answer helps.
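The effect of this size-2 rowsBetween frame can also be checked with a small plain-Scala sliding-window sketch (no Spark involved). Note the sketch produces None for the first row directly, whereas the Spark code above first sums the single-row frame and then nulls it out with when/otherwise.

```scala
// Plain-Scala sketch of a 2-row sliding sum over the group:
// sliding(2) yields the full windows (5,3) and (3,4); the first row,
// which has no complete window behind it, is filled with None.
val values = Seq(5, 3, 4)
val sums   = None +: values.sliding(2).map(w => Option(w.sum)).toSeq
// sums == Seq(None, Some(8), Some(7))
```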