How to tidy this dataset using Python 2.7

Date: 2018-02-08 02:30:02

Tags: python

I have the dataset below and I want to tidy it up.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content :
Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content :
Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it
  Sorry I didn't like this phone

I want to use Python to reshape this data into the following format.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content : Hank all time this device ... fews day speakar sound not clear output

Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content : Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it Sorry I didn't like this phone

I want to move the text up so that it follows the colon, but I don't know how.

1 answer:

Answer 0 (score: 1)

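import re

text = '''your_text_here'''

# Collapse the whitespace (including the newline) after the label into one space.
text = re.sub(r"Review Content :\s+", "Review Content : ", text)
# Insert two newlines before each review title to separate the reviews.
text = re.sub(r"Review Title : ", "\n\nReview Title : ", text)
# Remove the leading newlines that were added before the first title.
text = text.strip()

print(text)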

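The re library makes this kind of string manipulation easier:

  • The first sub replaces the run of whitespace characters after "Review Content :" with a single space, so the content ends up on the same line as the "Review Content" label.
  • The second sub adds two newlines before each "Review Title" label.
  • strip() removes whitespace from the start and end of the string, which removes the two newlines that the previous step added before the first "Review Title".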