I have a dataframe like this:
Num Text
1 15 March 2020 - There was...
2 15 March 2020 - There has been...
3 24 April 2018 - Nothing has ...
4 07 November 2014 - The Kooks....
...
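I want to remove the first 4 words from each row of the Text column (i.e. 15 March 2020 -, 15 March 2020 -, ...).
I tried
df['Text']=df['Text'].str.replace(' ', )
but I don't know what to put in the parentheses to replace these values with an empty string (that is, with nothing).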
Answer 0 (score: 0)
You can use str.split:
Consider your df to be:
In [1193]: df = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})
In [1194]: df
Out[1194]:
Num Text
0 1 15 March 2020 - There was
1 2 15 March 2020 - There has been
2 3 24 April 2018 - Nothing has
3 4 07 November 2014 - The Kooks
In [1207]: df['Text'].str.split().str[4:].apply(' '.join)
Out[1207]:
0 There was
1 There has been
2 Nothing has
3 The Kooks
Name: Text, dtype: object
Answer 1 (score: 0)
One approach that may work is to split the text into words with split and then use [4:] to take everything after the fourth word.
Answer 2 (score: 0)
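Python supports regular expressions, which can also do this. Note that a pattern such as "\d* \d* \d* \d*" matches only digits, so it would not remove "March" or "-". A pattern for four whitespace-separated words fits the data shown, for example str.replace(r"^(?:\S+\s+){4}", '', regex=True). See the Python regular expression documentation to learn more about detecting different patterns in strings.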
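A minimal sketch of the regex approach, assuming the sample data from the question and pandas' `Series.str.replace` called with `regex=True`. The pattern `^(?:\S+\s+){4}` is one possible choice for "four whitespace-separated words at the start", not the only one:

```python
import pandas as pd

df = pd.DataFrame({
    'Num': [1, 2, 3, 4],
    'Text': ['15 March 2020 - There was',
             '15 March 2020 - There has been',
             '24 April 2018 - Nothing has',
             '07 November 2014 - The Kooks'],
})

# Anchored at the start of the string: four runs of non-whitespace,
# each followed by at least one whitespace character.
pattern = r'^(?:\S+\s+){4}'

# regex=True is passed explicitly, since the default has changed
# between pandas versions.
stripped = df['Text'].str.replace(pattern, '', regex=True)
print(stripped.tolist())
```

Rows with fewer than four leading words do not match the pattern and are left unchanged.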
Answer 3 (score: 0)
You can combine Series.str.split with indexing through the .str accessor: split at most 4 times, then take the last piece.
df['Text'].str.split(n=4).str[-1]
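To show what the `n=4` argument does, here is a small sketch on two of the sample strings plus one hypothetical short row (`'15 March 2020'`, not in the original data) that has fewer than five words:

```python
import pandas as pd

texts = pd.Series(['15 March 2020 - There was',
                   '07 November 2014 - The Kooks',
                   '15 March 2020'])  # hypothetical short row

# At most 4 splits on whitespace, so each list has at most 5 pieces;
# the fifth piece keeps the rest of the text intact.
parts = texts.str.split(n=4)
print(parts[0])

# .str[-1] takes the last piece, whatever its position.
last_piece = parts.str[-1]

# .str.get(4) takes the fifth piece and yields NaN when it is missing.
fifth_piece = parts.str.get(4)
print(last_piece.tolist())
print(fifth_piece.tolist())
```

On short rows `.str[-1]` returns the last word of the prefix instead of an empty result, so `.str.get(4)` may be the safer choice if such rows can occur.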
Answer 4 (score: 0)
Even though it is less elegant, I prefer combining ".find()" with ".apply()". Whatever follows it, the first "- " that ".find" locates is used as the delimiter.
This code:
t = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})
t["text2"] = t.apply(lambda x: x['Text'][str(x['Text']).find("- ")+2:], axis=1)
takes this input:
Num Text
1 15 March 2020 - There was...
2 15 March 2020 - There has been...
3 24 April 2018 - Nothing has ...
4 07 November 2014 - The Kooks....
and stores everything after the first "- " in a new column, text2.
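The same "cut at the first delimiter" idea can also be sketched without `.apply()`, using `Series.str.split` with `n=1`. This is an alternative formulation rather than the answer's own code, and it assumes the delimiter is always the three characters `' - '`:

```python
import pandas as pd

t = pd.DataFrame({
    'Num': [1, 2, 3, 4],
    'Text': ['15 March 2020 - There was',
             '15 March 2020 - There has been',
             '24 April 2018 - Nothing has',
             '07 November 2014 - The Kooks'],
})

# Split once at the first ' - ' and keep the part after it.
t['text2'] = t['Text'].str.split(' - ', n=1).str[-1]
print(t['text2'].tolist())
```

The two versions differ when a row has no delimiter: `str.find` returns -1, so the slice in the `.apply()` version starts at index 1 and drops the first character, whereas the split version returns the row unchanged.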