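I'm using substring to get the first and last values. But how can I find a specific character in a string and get the values before and after it?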
Answer:
Try these; they sound like what you're looking for.
Reference docs:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.substring_index
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.split
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring_index

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('hello-there',)], ['text'])
df.select(substring_index(df.text, '-', 1).alias('left')).show()    # everything left of the first '-'
df.select(substring_index(df.text, '-', -1).alias('right')).show()  # everything right of the last '-'
+-----+
| left|
+-----+
|hello|
+-----+
+-----+
|right|
+-----+
|there|
+-----+
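To make the `count` argument concrete, here is a rough pure-Python emulation of how `substring_index` treats it. The helper name `substring_index_py` is hypothetical and only for illustration; it is not part of PySpark, and it is a sketch of the documented behaviour rather than Spark's actual implementation.

```python
def substring_index_py(s: str, delim: str, count: int) -> str:
    """Hypothetical pure-Python emulation of Spark's substring_index.

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    count == 0 (or an empty delimiter): empty string.
    If the delimiter occurs fewer times than |count|, the whole string comes back.
    """
    if not delim or count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


print(substring_index_py("hello-there", "-", 1))   # hello
print(substring_index_py("hello-there", "-", -1))  # there
print(substring_index_py("a-b-c", "-", 2))         # a-b
print(substring_index_py("a-b-c", "-", -2))        # b-c
```

So `1` and `-1` are simply the "first occurrence from the left" and "first occurrence from the right" cases of the same function.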
from pyspark.sql.functions import split
split_df = df.select(split(df.text, '-').alias('split_text'))
split_df.selectExpr("split_text[0] as left").show()   # piece before the first '-'
split_df.selectExpr("split_text[1] as right").show()  # piece after the first '-' (up to the next '-', if any)
+-----+
| left|
+-----+
|hello|
+-----+
+-----+
|right|
+-----+
|there|
+-----+
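The two approaches agree here only because `hello-there` contains a single `-`. With several delimiters, `split_text[1]` is just the second piece, whereas `substring_index(..., -1)` is everything after the last `-`. A small plain-Python analogy (`str.split`, `str.partition` and `str.rpartition` standing in for the Spark functions; this is not PySpark code):

```python
text = "a-b-c"

# Analogue of split(col, '-'): a list of every piece between delimiters
pieces = text.split("-")
print(pieces[0])  # a  (like split_text[0])
print(pieces[1])  # b  (like split_text[1]: only the second piece)

# Analogues of substring_index(col, '-', 1) and substring_index(col, '-', -1)
left_of_first = text.partition("-")[0]    # everything before the first '-'
right_of_last = text.rpartition("-")[2]   # everything after the last '-'
print(left_of_first)  # a
print(right_of_last)  # c
```

Like `substring_index`, `partition`/`rpartition` return the whole string when the delimiter is absent. Also note that the second argument of PySpark's `split` is a regular expression, so a delimiter such as `.` or `|` has to be escaped.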
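from pyspark.sql.functions import substring_index, substring, concat, col, lit

df = spark.createDataFrame([('will-smith',)], ['text'])
# Parts before and after the '-'
df = (df
      .withColumn("left", substring_index(col("text"), '-', 1))
      .withColumn("right", substring_index(col("text"), '-', -1)))
# Last two characters of the left part, first two of the right part
# (positions are 1-based; a negative position counts from the end)
df = (df
      .withColumn("left_sub", substring(col("left"), -2, 2))
      .withColumn("right_sub", substring(col("right"), 1, 2)))
# Join the two pieces back together around a literal '-'
df = df.withColumn("concat_sub", concat(col("left_sub"), lit("-"), col("right_sub")))
df.show()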
+----------+----+-----+--------+---------+----------+
|      text|left|right|left_sub|right_sub|concat_sub|
+----------+----+-----+--------+---------+----------+
|will-smith|will|smith|      ll|       sm|     ll-sm|
+----------+----+-----+--------+---------+----------+
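The `substring(col, -2, 2)` call is the least obvious part, so here is a hypothetical pure-Python emulation of how Spark's `substring` interprets its position argument (1-based, negative counts back from the end, 0 behaves like 1). The name `spark_substring_py` is made up for illustration; it is a sketch, not PySpark's code.

```python
def spark_substring_py(s: str, pos: int, length: int) -> str:
    """Hypothetical emulation of Spark's substring(str, pos, len)."""
    if pos > 0:
        start = pos - 1          # 1-based position
    elif pos < 0:
        start = len(s) + pos     # count back from the end
    else:
        start = 0                # pos == 0 behaves like pos == 1
    end = start + length         # end is fixed before start is clamped
    start = max(start, 0)
    if start >= end:
        return ""
    return s[start:end]


left, right = "will-smith".split("-", 1)
left_sub = spark_substring_py(left, -2, 2)    # last two characters of "will"
right_sub = spark_substring_py(right, 1, 2)   # first two characters of "smith"
print(left_sub + "-" + right_sub)             # ll-sm
```

One design note on the PySpark version: `concat` returns null if any of its inputs is null, while `concat_ws("-", col("left_sub"), col("right_sub"))` would produce the same `ll-sm` here and simply skip null inputs.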