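Python split removes the delimiter?

Time: 2014-07-24 16:21:58

Tags: python split

I have the following code, which places a ~||~ delimiter after every semicolon or after every 500 characters. It works, but it removes the semicolon whenever it finds one. I have already found an answer here, but I could not get it to work in my code.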

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len]for idx in range(0,length,chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()

I found this solution here, but I cannot find a way to incorporate it into my code. Sorry for the duplicate question.

d = ">"
for line in all_lines:
    s =  [e+d for e in line.split(d) if e != ""]

3 Answers:

Answer 0: (score: 0)

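Change

lines = text.split(';')

to

lines = list(filter(None, re.split('([^;]+;)', text)))  # needs "import re" at the top of the script

This should keep the semicolons, because the capturing group makes re.split return the matched pieces as well. The list() call keeps lines indexable on Python 3, where filter returns an iterator. Alternatively, just add the semicolons back afterwards, as the other answers do.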
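To see why the capturing group matters, here is a minimal sketch on a made-up sample string (the sample text is an assumption, not data from the question). It shows the raw result of re.split and the list after the empty strings are filtered out.

```python
import re

text = "a;bb;ccc"  # hypothetical sample input

# The capturing group makes re.split return the matched "chunk;" pieces too.
# The empty strings are the (empty) gaps between consecutive matches.
pieces = re.split('([^;]+;)', text)
# pieces == ['', 'a;', '', 'bb;', 'ccc']

# filter(None, ...) drops the empty strings; list() makes the result indexable.
lines = list(filter(None, pieces))
# lines == ['a;', 'bb;', 'ccc']
```

Each piece except the last still ends with its semicolon, so the rest of the question's loop can stay as it is.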

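Answer 1: (score: 0)

If I understand your question correctly, what you really want to do is insert your own delimiter after every semicolon and after every 500 characters. Try doing it in two steps: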

with open(filename, "r") as fi: # read in file using "with" statement
    text = fi.read()

block_size = 500            # sets how many characters separate new_delim
old_delim = ";"             # character we are adding the new delimiter to
new_delim = "~||~"          # this will be inserted every block_size characters
del_length = len(new_delim) # store length to prevent repeated calculations

for i in range(len(text) // block_size):  # one pass per full block in the original text
    # calculate next index where the new delimiter should be inserted
    index = i*block_size + i*del_length + block_size

    # construct new string with new delimiter at the given index
    text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])

replacement_delim = old_delim + new_delim # old_delim will be replaced with this

with open(outputfile, 'w') as fo:
    # write out new string with new delimiter appended to each semicolon
    fo.write(text.replace(old_delim, replacement_delim))

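If a semicolon happens to fall on a multiple of 500 characters, you may end up with two of the special delimiters next to each other. Also, if the length of your string is an exact multiple of block_size, you will end up with a delimiter at the very end of the string.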
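Both edge cases are easy to reproduce with a tiny block size. The sketch below wraps the same two-step logic in a hypothetical helper, insert_delims (the function name and the sample strings are assumptions made for illustration), and uses a block size of 3 instead of 500.

```python
def insert_delims(text, block_size, old_delim=";", new_delim="~||~"):
    """Insert new_delim every block_size characters, then after each old_delim."""
    del_length = len(new_delim)
    for i in range(len(text) // block_size):
        # Account for the delimiters already inserted by earlier iterations.
        index = i * block_size + i * del_length + block_size
        text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])
    return text.replace(old_delim, old_delim + new_delim)

# The semicolon is the 3rd character, so both steps insert at the same spot.
doubled = insert_delims("ab;cd", 3)
# doubled == 'ab;~||~~||~cd'

# The length (6) is an exact multiple of 3, so a delimiter lands at the end.
trailing = insert_delims("abcdef", 3)
# trailing == 'abc~||~def~||~'
```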

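Also, this may not be the best approach if the file you are reading is very long. The for loop creates an entirely new string every time it inserts the delimiter.

This approach makes the split method's handling of the delimiter a moot point.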
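If the repeated string rebuilding becomes a problem, one possible alternative is to slice the text into blocks once and join them. This is only a sketch: insert_delims_joined is a hypothetical name, and unlike the loop above it does not append a delimiter at the end when the length is an exact multiple of block_size.

```python
def insert_delims_joined(text, block_size=500, old_delim=";", new_delim="~||~"):
    """Join fixed-size blocks with new_delim, then append it to each old_delim."""
    # One slice per block, so the block step builds the result in a single join.
    blocks = [text[i:i + block_size] for i in range(0, len(text), block_size)]
    joined = new_delim.join(blocks)
    return joined.replace(old_delim, old_delim + new_delim)

no_trailing = insert_delims_joined("abcdef", block_size=3)
# no_trailing == 'abc~||~def'

still_doubled = insert_delims_joined("ab;cd", block_size=3)
# still_doubled == 'ab;~||~~||~cd'
```

The doubled delimiter still appears when a semicolon ends a block, so that case would need separate handling if it matters for the output.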

Answer 2: (score: -2)

split() splits the string and removes the delimiter, so just add it back. I did the following inside the loop: line = line + d

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    line = line + d  #NEW LINE ADDED HERE
    lines[lines_idx] = line  # store it back so lines shorter than chunk_len keep the semicolon too
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len]for idx in range(0,length,chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
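Note that appending d to every piece also adds a semicolon to whatever follows the last semicolon in the file. A minimal sketch of the same idea that avoids this is shown below. The function name add_delims is hypothetical, and skipping the empty piece after a trailing semicolon is a choice made here, not something stated in the thread.

```python
def add_delims(text, chunk_len=100, split_char=";", delim="~||~"):
    """Put delim after each split_char and every chunk_len characters."""
    pieces = text.split(split_char)
    last = len(pieces) - 1
    out = []
    for idx, piece in enumerate(pieces):
        if idx < last:
            # Every piece except the last was followed by split_char in the input.
            piece += split_char
        elif piece == "":
            # The text ended with split_char, so nothing is left over.
            continue
        chunks = [piece[i:i + chunk_len] for i in range(0, len(piece), chunk_len)]
        out.append(delim.join(chunks))
    return delim.join(out)

result = add_delims("ab;cdefg;h", chunk_len=3)
# result == 'ab;~||~cde~||~fg;~||~h'
```

Reading the input file and writing the output file would work as in the code above, with new_text = add_delims(text).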