Question

I have a data frame like this:

Num           Text 
1        15 March 2020 - There was...
2        15 March 2020 - There has been...
3        24 April 2018 - Nothing has ...
4        07 November 2014 - The Kooks....
...

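I want to remove the first 4 words from each row of the text (i.e. 15 March 2020 - , 15 March 2020 -, ...). I tried

df['Text']=df['Text'].str.replace(' ', ), but I don't know what I should put inside the parentheses to replace these values with a blank (or with nothing).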

Answer 1

You can use str.split:

Consider your df to be:

In [1193]: df = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

In [1194]: df
Out[1194]: 
   Num                            Text
0    1       15 March 2020 - There was
1    2  15 March 2020 - There has been
2    3     24 April 2018 - Nothing has
3    4    07 November 2014 - The Kooks

In [1207]: df['Text'].str.split().str[4:].apply(' '.join)                                                                                                                                                
Out[1207]: 
0         There was
1    There has been
2       Nothing has
3         The Kooks
Name: Text, dtype: object

Answer 2

One approach that may work is to split the text into words with split and then use [4:] to take everything after the fourth word.
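A minimal sketch of that idea, assuming the column is named Text as in the question and that every row begins with exactly four whitespace-separated tokens (day, month, year, dash):

```python
import pandas as pd

# Sample data in the shape given in the question.
df = pd.DataFrame({
    "Num": [1, 2],
    "Text": ["15 March 2020 - There was", "24 April 2018 - Nothing has"],
})

# Split each string on whitespace, drop the first four tokens,
# then join the remaining tokens back into a single string.
df["Text"] = df["Text"].apply(lambda s: " ".join(s.split()[4:]))
```

Note that joining with a single space collapses any runs of whitespace in the remaining text.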

Answer 3

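Python supports regular expressions, so one option is a pattern that matches the four leading tokens, for example str.replace(r'^\d+ \w+ \d+ - ', '', regex=True). The Python regular expression documentation covers how to detect other patterns in strings.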
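A minimal sketch of the regex approach, assuming every row starts with a date written as day, month name, four-digit year, followed by " - ". The pattern below is an illustrative choice, not one taken from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["15 March 2020 - There was", "07 November 2014 - The Kooks"],
})

# ^\d{1,2}  one or two digits for the day, anchored at the start
# \w+       the month name
# \d{4}     the four-digit year
# " - "     the literal separator before the real text
pattern = r"^\d{1,2} \w+ \d{4} - "

# regex=True is passed explicitly so the pattern is not treated as a literal string.
df["Text"] = df["Text"].str.replace(pattern, "", regex=True)
```

Rows that do not start with such a date are left unchanged, because the pattern simply fails to match.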

Answer 4

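You can combine Series.str.split with .str indexing. Splitting with n=4 performs at most four splits, so the last element is everything after the fourth word.

df['Text'].str.split(n=4).str[-1]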

Answer 5

Even though it is less elegant, I prefer combining .find() with .apply(). Whatever comes before it, the first "- " that .find() locates is used as the delimiter.

This:

t = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

t["text2"] = t.apply(lambda x: x['Text'][str(x['Text']).find("- ")+2:], axis=1)

produces a text2 column with the leading date removed:

Num      text2
1        There was
2        There has been
3        Nothing has
4        The Kooks
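One caveat with the .find() approach: str.find returns -1 when the delimiter is missing, so the slice would start at index 1 and silently drop the first character. A hedged sketch that guards against this, using a hypothetical helper named strip_prefix (not part of the original answer):

```python
import pandas as pd

t = pd.DataFrame({"Text": ["15 March 2020 - There was", "no date here"]})

def strip_prefix(s, sep="- "):
    # Hypothetical helper: return the text after the first occurrence of sep,
    # or the string unchanged when sep does not occur at all.
    i = s.find(sep)
    return s if i == -1 else s[i + len(sep):]

t["text2"] = t["Text"].apply(strip_prefix)
```

Applying the function to the column directly also avoids the row-wise axis=1 pass over the whole frame.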

Remove words from strings in a dataframe column

5 answers: