Question

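I'm trying to format a PDF menu programmatically, and everything was going well until I noticed that some misplaced line breaks break the pattern. Here is part of my raw text: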

LATIN
Saturday & Sunday: 
Build Your Own Breakfast Burrito, Scrambled Eggs, Cheesy Eggs, Latin Tofu
Scramble, Latin Roasted Vegetables
DESSERT
Daily: 
Assorted Pastries

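I noticed that some items (such as Latin Tofu Scramble) have a line break in the middle of them. Given that the menu items vary and there may be extra line breaks elsewhere, is there a way to remove the line breaks that occur between commas (since all items are comma-separated)?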

Edit: ideally, the final result would look like this:

LATIN
Saturday & Sunday: 
Build Your Own Breakfast Burrito, Scrambled Eggs, Cheesy Eggs, Latin Tofu Scramble, Latin Roasted Vegetables
DESSERT
Daily: 
Assorted Pastries

Answer 1

In Python, you can use line.strip('\n') and line.strip('\t') to remove newlines and tabs from the beginning and end of a string.

>>> line="Welcomes\n"
>>> line.strip("\n")
'Welcomes'

Alternatively, you can use replace('\n', '') to remove every newline from the string, including those in the middle.

>>> line="Welcomes\n"
>>> line.replace('\n','')
'Welcomes'

Or you can use the rstrip() method to remove all trailing whitespace, including newlines, from the string:

>>> line.rstrip()
'Welcomes'
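
The three calls above behave differently once the newline sits in the middle of the string, which is the situation in the question. A minimal sketch, using a made-up sample string `wrapped` that is not part of the original post:

```python
# Hypothetical sample: one menu item wrapped across two lines.
wrapped = "Latin Tofu\nScramble\n"

# strip() and rstrip() only touch the ends of the string.
stripped = wrapped.strip("\n")
right_stripped = wrapped.rstrip()

# replace() touches every occurrence, including the interior newline.
replaced = wrapped.replace("\n", " ")

print(repr(stripped))        # 'Latin Tofu\nScramble'
print(repr(right_stripped))  # 'Latin Tofu\nScramble'
print(repr(replaced))        # 'Latin Tofu Scramble '
```

So strip() and rstrip() alone leave the wrapped item split, while replace() applied to the whole menu would also merge the header and label lines.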

Answer 2

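Try re.sub with the regular expression below and the MULTILINE flag. It replaces a line break only when a comma precedes it on the same line and the next line also contains a comma.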

However, it does not work if the line break falls inside the last item, e.g. Latin Roasted Vegetables.

txt = '''
LATIN
Saturday & Sunday: 
Build Your Own Breakfast Burrito, Scrambled Eggs, Cheesy Eggs, Latin Tofu
Scramble, Latin Roasted Vegetables
DESSERT
Daily: 
Assorted Pastries
'''

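import re
# Pass the flag by keyword: the fourth positional argument of re.sub is count, not flags.
# The pattern has no ^ or $, so MULTILINE does not change the result here.
newtxt = re.sub(r'(,[^\r\n]*?)[\r\n](?=[^\r\n]+?,)', r'\1 ', txt, flags=re.MULTILINE)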
# LATIN
# Saturday & Sunday:
# Build Your Own Breakfast Burrito, Scrambled Eggs, Cheesy Eggs, Latin Tofu Scramble, Latin Roasted Vegetables
# DESSERT
# Daily:
# Assorted Pastries
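
For the case the regex misses (a break inside the last item), one possible alternative is to work line by line instead. The sketch below is not from the original answer and rests on two assumptions about the menu layout: section headers are all uppercase (like LATIN) and day labels end with a colon (like Daily:). Every other line is treated as part of the item list and joined to the previous item line. The function name `join_wrapped_items` and the `sample` text are hypothetical.

```python
def join_wrapped_items(text):
    """Join wrapped item lines, keeping header and label lines separate."""
    result = []
    in_items = False  # True while the last kept line is an item line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        is_header = line.isupper()     # assumption: headers look like "LATIN"
        is_label = line.endswith(":")  # assumption: labels look like "Daily:"
        if is_header or is_label:
            result.append(line)
            in_items = False
        elif in_items:
            result[-1] += " " + line   # continuation of the current item list
        else:
            result.append(line)
            in_items = True
    return "\n".join(result)


# Hypothetical input where the last item is also wrapped.
sample = """LATIN
Saturday & Sunday: 
Build Your Own Breakfast Burrito, Scrambled Eggs, Cheesy Eggs, Latin Tofu
Scramble, Latin Roasted
Vegetables
DESSERT
Daily: 
Assorted Pastries"""

print(join_wrapped_items(sample))
```

Note that this also strips trailing whitespace and drops blank lines, and an item written entirely in capitals would be mistaken for a header, so the two checks would need adjusting for a real menu.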

Remove specific line breaks in Python
