Question

I wrote the following Python code to "clean" my strings:

 df['TextCleaning'] = df['Text'].apply(lambda x: re.findall('[äöüßÖÄa-zA-Z].*[öäüßÖÄÜa-zA-Z0-9]', x)[0])

This turns "1.2.1 Hello" (Text) into "Hello" (TextCleaning). What I want to do now is save the "1.2.1" in a column of its own. Can you help me?

Answer 1

This should work for you:

import re
output = "2.1.3 Hello world"
word1 = re.findall(r"\d+\.\d+\.\d", output)  # raw string keeps the backslashes literal

Output

['2.1.3']

output = "2.45.6 Hello 22.3.9 world"
word = re.findall(r"\d+\.\d+\.\d", output)

Output

['2.45.6', '22.3.9']

output = "2.6 Hello 3.9 world"
word = re.findall(r"\d+\.\d", output)

Output

['2.6', '3.9']
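
Note that the patterns above fix the number of components and match only a single final digit, so "1.2.10" would come back as "1.2.1". A minimal sketch of one pattern that handles any depth, assuming the numbers are always digit groups separated by dots (the name `DOTTED_NUMBER` is only for illustration):

```python
import re

# Assumption: a "number" is digits followed by one or more ".digits" groups.
DOTTED_NUMBER = re.compile(r"\d+(?:\.\d+)+")

samples = [
    "2.1.3 Hello world",
    "2.45.6 Hello 22.3.9 world",
    "2.6 Hello 3.9 world",
]

# One list of matches per sample string.
results = [DOTTED_NUMBER.findall(s) for s in samples]

for sample, found in zip(samples, results):
    print(sample, "->", found)
```

Because the last group is `\d+` rather than `\d`, multi-digit final components are kept whole.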

Answer 2

You can use pd.Series.str.split with expand=True:

df[['Text', 'TextCleaning']] = df['Text'].str.split(r'(?![öäüßÖÄÜa-zA-Z0-9])\s+(?=[äöüßÖÄa-zA-Z])', expand=True)
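
If the text after the number contains more than one word, a split on every whitespace-before-letter boundary yields more than two pieces. Here is a rough sketch of the same idea on plain strings, limited to a single split. The helper `split_number_and_text` and the constant names are hypothetical, and the letter class is an assumption based on the question's regex:

```python
import re

# Assumption: "text" starts at the first German/ASCII letter after whitespace.
LETTERS = "äöüßÄÖÜa-zA-Z"
BOUNDARY = re.compile(rf"\s+(?=[{LETTERS}])")

def split_number_and_text(text):
    """Split once at the first whitespace run that is followed by a letter."""
    parts = BOUNDARY.split(text.strip(), maxsplit=1)
    if len(parts) == 1:
        # No boundary found: the whole string ends up in the first slot.
        return parts[0], ""
    return parts[0], parts[1]

print(split_number_and_text(" 1.2.1 Hello"))
print(split_number_and_text("2.1.3 Hello world"))
```

With a DataFrame like the one in the question, such a helper could be applied row by row via `df['Text'].apply(split_number_and_text)`, which returns one tuple per row.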

Answer 3

Try changing the regular expression:

import re
out = "1.2.1 Hello "
new = " ".join(re.findall("[0-9.]+", out))

Output

'1.2.1'
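
One caveat: `[0-9.]+` matches any run of digits or dots anywhere in the string, so a later number or a sentence-ending period would be picked up too. A possible variant, assuming the number always sits at the start of the string, anchors the match there and captures both parts at once (`extract_prefix` and the group names are made up for this example):

```python
import re

# Assumption: the number, if present, is a leading run of dot-separated digits.
PREFIX = re.compile(r"\s*(?P<number>\d+(?:\.\d+)*)\s*(?P<text>.*)")

def extract_prefix(s):
    """Return (leading number, remaining text); number is '' if none is found."""
    m = PREFIX.match(s)  # match() only tries at the start of the string
    if m is None:
        return "", s.strip()
    return m.group("number"), m.group("text").strip()

print(extract_prefix("1.2.1 Hello "))
print(extract_prefix("Hello 2 you."))
```

Digits that appear later in the text stay in the text part instead of leaking into the number.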