Question

Consider an example dataframe:

df = 
+-------+-----+
|   tech|state|
+-------+-----+
|     70|wa   |
|     50|mn   |
|     20|fl   |
|     50|mo   |
|     10|ar   |
|     90|wi   |
|     30|al   |
|     50|ca   |
+-------+-----+

I want to change the "tech" column so that every value of 50 becomes 1 and every other value becomes 0.

The output should look like this:

df = 
+-------+-----+
|   tech|state|
+-------+-----+
|      0|wa   |
|      1|mn   |
|      0|fl   |
|      1|mo   |
|      0|ar   |
|      0|wi   |
|      0|al   |
|      1|ca   |
+-------+-----+

Here is what I have so far:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType


changing_column = 'tech'
# One UDF that maps each value: 50 -> 1, anything else -> 0
to_flag = udf(lambda x: 1 if x == 50 else 0, IntegerType())
# Overwrite the 'tech' column with the mapped values
result_df = df.withColumn(changing_column, to_flag(changing_column))
result_df.show()

Answer 1

Hope this helps.

' Connect to active directory
Set objDSE = GetObject("LDAP://rootDSE")
Set objConnection = CreateObject("ADODB.Connection")
objConnection.Provider = "ADsDSOObject"
objConnection.Open
Set objCommand = CreateObject("ADODB.Command")
Set objCommand.ActiveConnection = objConnection
SearchString = "Max Mustermann"

' Contact lookup using an SQL query
objCommand.CommandText = _
    "SELECT givenname, sn, mail, telephoneNumber, mobile, mailNickName, c, l, postalCode, department, company, streetAddress " & _
    "FROM 'LDAP://" & objDSE.Get("defaultNamingContext") & "' " & _
    "WHERE objectCategory='person' AND (mail = '" & SearchString & "' OR givenname & sn = '" & SearchString & "')"
Set objRecordset = objCommand.Execute

If Not objRecordset.EOF Then
    ' Further processing which is not relevant to the question
    ' ...
End If

How to change values in a PySpark DataFrame based on a condition on the same column?
