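As shown below, the strings have a lot of extra whitespace at the start, at the end, and in the middle of each line. I am trying to remove these extra spaces. This is what I have tried, but I keep getting errors: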
import pandas as pd
testdata = [{'col1': ' Sea Ice Prediction Network . '},
{'col1': ' Movies, Ratings, .... etc.'},
{'col1': 'Iceland, Greenland, Mountains '},
{'col1': ' My test file'}]
df = pd.DataFrame(testdata)
' '.join(testdata['col1'].split()) #Error: list indices must be integers or slices, not str
df['col1'].str.lstrip() #list indices must be integers or slices, not str
df['col1'].str.rstrip() #list indices must be integers or slices, not str
#removes start and end, but not ideal to remove one line at a time.
' Sea Ice Prediction Network . '.lstrip()
' Sea Ice Prediction Network . '.rstrip()
How can I remove this whitespace? Thanks!
Clean Output:
'Sea Ice Prediction Network .'
'Movies, Ratings, .... etc.'
'Iceland, Greenland, Mountains'
'My test file'
Answer 0 (score: 2)
Use replace with a regular expression to collapse each run of spaces into a single space:
df.replace({' +':' '},regex=True)
Out[348]:
col1
0 Sea Ice Prediction Network .
1 Movies, Ratings, .... etc.
2 Iceland, Greenland, Mountains
3 My test file
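Note that replacing ' +' with ' ' only collapses runs of spaces; a single leading or trailing space can still remain. A minimal sketch of one way to also trim the ends, assuming the column holds only strings (the extra spaces in the sample data are added here for illustration and are not taken from the question):

```python
import pandas as pd

# Sample data with exaggerated whitespace (illustrative assumption).
testdata = [{'col1': '  Sea Ice   Prediction Network .  '},
            {'col1': ' Movies,   Ratings, .... etc.'},
            {'col1': 'Iceland,   Greenland, Mountains  '},
            {'col1': '   My test    file'}]
df = pd.DataFrame(testdata)

# Collapse every run of whitespace to one space, then trim both ends.
df['col1'] = df['col1'].str.replace(r'\s+', ' ', regex=True).str.strip()

print(df['col1'].tolist())
```

Chaining the two .str calls handles the whole column at once, so no row-by-row loop is needed.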
Answer 1 (score: 0)
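You can use the re module to replace every run of whitespace in the string with a single space, and then strip whatever is left at the start and end: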
import re
# Raw string so that \s is passed to the regex engine unchanged.
re.sub(r'\s+', ' ', ' Sea Ice Prediction Network . ').strip()
'Sea Ice Prediction Network .'
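The same idea can be applied to the original list of dicts without pandas. The error in the question ("list indices must be integers or slices, not str") comes from indexing the list testdata with a string key, so each row has to be visited in turn. A rough sketch, where normalize_spaces is a hypothetical helper name and the extra spaces in the sample data are added for illustration:

```python
import re

def normalize_spaces(text):
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()

# Sample data with exaggerated whitespace (illustrative assumption).
testdata = [{'col1': '  Sea Ice   Prediction Network .  '},
            {'col1': ' Movies,   Ratings, .... etc.'},
            {'col1': 'Iceland,   Greenland, Mountains  '},
            {'col1': '   My test    file'}]

# testdata is a list, so clean each dict (row) individually.
cleaned = [{key: normalize_spaces(value) for key, value in row.items()}
           for row in testdata]

for row in cleaned:
    print(repr(row['col1']))
```

For plain strings, ' '.join(text.split()) gives the same result as the regex version, since split() with no arguments already splits on any run of whitespace and drops empty pieces at the ends.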
Is there supposed to be a space before the "."?