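So I have the following code, which places a ~||~ delimiter after every semicolon or after every 500 characters. It works, but it removes the semicolon whenever it finds one. I have already found an answer here, but I can't get it to work in my code.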
chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len] for idx in range(0, length, chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
I found this solution here, but I can't find a way to incorporate it into my code. Sorry for the duplicate question.
d = ">"
for line in all_lines:
    s = [e+d for e in line.split(d) if e != ""]
Answer 0 (score: 0)
Change
lines = text.split(';')
to
import re
lines = list(filter(None, re.split('([^;]+;)', text)))
This should keep the semicolons... or just add them back afterwards, as in the other answer.
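To see why the capturing group keeps the semicolons, here is a minimal sketch. The helper name split_keep_semicolons and the sample string are made up for illustration and are not part of the original code.

```python
import re

def split_keep_semicolons(text):
    # Hypothetical helper: because the pattern is wrapped in a capturing
    # group, re.split returns the matched "piece;" strings as well.
    # Dropping the empty strings leaves each piece with its trailing ';'.
    return [piece for piece in re.split(r'([^;]+;)', text) if piece]

pieces = split_keep_semicolons("a;bb;ccc")
print(pieces)  # ['a;', 'bb;', 'ccc']
```

Note that the pattern needs at least one non-semicolon character before each semicolon, so a run such as ";;" is returned as one piece rather than being split.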
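Answer 1 (score: 0)
If I understand your question correctly, what you really want to do is insert your own delimiter after every semicolon and after every 500 characters. Try doing it in two steps: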
with open(filename, "r") as fi:  # read in file using "with" statement
    text = fi.read()
block_size = 500  # sets how many characters separate new_delim
old_delim = ";"  # character we are adding the new delimiter to
new_delim = "~||~"  # this will be inserted every block_size characters
del_length = len(new_delim)  # store length to prevent repeated calculations
for i in range(len(text) // block_size):
    # calculate next index where the new delimiter should be inserted
    index = i*block_size + i*del_length + block_size
    # construct new string with new delimiter at the given index
    text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])
replacement_delim = old_delim + new_delim  # old_delim will be replaced with this
with open(outputfile, 'w') as fo:
    # write out new string with new delimiter appended to each semicolon
    fo.write(text.replace(old_delim, replacement_delim))
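If a semicolon happens to fall on a multiple of 500 characters, you may end up with two of the special delimiters next to each other. Also, if the length of your string is an exact multiple of block_size, you will end up with a delimiter at the end of the string.
Also, if the file you are reading is very long, this may not be the best approach. The for loop creates an entirely new string every time it inserts a delimiter.
This approach makes what the split method does with the delimiter a moot point.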
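One way to avoid both the doubled delimiter and the repeated string rebuilding is a single pass that collects pieces in a list and joins them once. This is only a sketch under a slightly different assumption than the code above: the 500-character count restarts after every delimiter, which is closer to what the question's chunking does. The function name insert_delims is hypothetical.

```python
def insert_delims(text, block_size=500, old_delim=";", new_delim="~||~"):
    # Hypothetical helper, not the answer's original method.
    parts = []
    run = 0  # characters emitted since the last new_delim
    for ch in text:
        parts.append(ch)
        run += 1
        # Insert one delimiter after a semicolon or after a full block,
        # never both, so two delimiters cannot end up side by side.
        if ch == old_delim or run == block_size:
            parts.append(new_delim)
            run = 0
    return "".join(parts)  # build the result string once

print(insert_delims("ab;cdefg;h", block_size=3))  # ab;~||~cde~||~fg;~||~h
```

A trailing delimiter still appears when the text ends with a semicolon or with a full block. If that is unwanted, it could be stripped from the result before writing.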
Answer 2 (score: -2)
split() splits the string and removes the delimiter, so just add it back. I did the following in the loop:
line = line + d
chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    line = line + d  # NEW LINE ADDED HERE
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len] for idx in range(0, length, chunk_len)]
        line = delim.join(chunks)
    lines[lines_idx] = line  # store the line so short lines keep their semicolon too
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()