I have a PySpark DataFrame with 4 columns. One of the columns contains text (the data is unstructured). Below is a sample of the data in this column:
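from pyspark.sql import SparkSession

# Reuse the active session (already available as `spark` in the pyspark shell or a notebook).
spark = SparkSession.builder.getOrCreate()

# Sample rows for the free-text column; each row is a one-element tuple.
data = [
    ('Ambitioni dedisse scripsisse iudicaretur',),
    ('Cras mattisiudicium',),
    ('purus sit amet fermentum',),
    ('Donec sed odio operae- NORMAL',),
    ('eu vulputate felis - A300B4-61 - MP 13219',),
    ('Praeterea iter est - quasdam res - MP 28180',),
    ('quas ex communi - ',),
    ('At nos hinc posthat CONTROL - FADEC',),
    ('sitientis piros Afros. Petierunt',),
    ('uti sibi concilium totius Galliae-2 - GENERATION',),
    ('in dim - V105X ',),
    ('Cras mattis iudicium',),
]
df = spark.createDataFrame(data, ["text"])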
Example of the expected output:
Column of interest (example data)               | new_column
------------------------------------------------|----------------------------
Cras mattis iudicium -INTRODCE A NEW STANDARD   |
Praeterea iter est                              |
Cras mattis iudicium purus sit amet fermentum.  |
class to truncate the text                      |
Ambitioni dedisse -                             |
For left, right,                                |
TCAS II - Praeterea iter est                    |
Donec sed odio operae                           |
Ambitioni dedisse                               |
My question:
Thank you