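Split a column-based text file into smaller files based on value differences in Python

Time: 2017-07-25 15:00:09

Tags: python list file text split

I am trying to split a text file with 3 columns into many smaller text files, based on jumps in the values of the first column. Here is a small sample of the file to be split: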

2457062.30520078 1.00579146 1
2457062.30588184 1.00607543 1
2457062.30656300 1.00605515 1
2457062.71112193 1.00288150 1
2457062.71180299 1.00322454 1
2457062.71248415 1.00430136 1

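Between lines 3 and 4 there is a larger-than-usual jump. That is the point where the data should be split into separate text files: one containing the first three lines and one containing the last three. A jump is always a change of more than 0.1 in the first column. The goal is for every such jump to act as a split point, as in this example. Any insight is appreciated, thanks.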

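3 Answers:

Answer 0: (score: 1)

I would loop over the main file and keep writing lines for as long as the condition holds. That fits the definition of a while loop exactly. The main complication is that you need two files open at the same time (the main file and the one you are currently writing to), but that is not a problem in Python.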

MAINTEXT = "big_file.txt"
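# {:03d} zero-pads the integer counter (000, 001, ...); the 'g' format
#  would switch to exponent notation (1e+01) from the tenth file on.
SFILE_TEMPL = 'small_file_{:03d}.txt'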
# Delimiter is a space in the example you gave, but 
#  might be tab (\t) or comma or anything.
DELIMITER = ' ' 

LIM = .1

# i will count how many files we have created.
i = 0

# Open the main file
with open(MAINTEXT) as mainfile:
    # Read the first line and set up some things
    line = mainfile.readline()
    # Note that we want the first element ([0]) before
    #  the delimiter (.split(DELIMITER)) of the row (line)
    #  as a number (float)
    v_cur = float(line.split(DELIMITER)[0])
    v_prev = v_cur

    # This will stop the loop once we reach end of file (EOF)
    #  as readline() will then return an empty string.
    while line:
        # Open the second file for writing (mode='w').
        with open(SFILE_TEMPL.format(i), mode='w') as subfile:
            # As long as your values are in the limit, keep 
            #  writing lines to the current file.
            while line and abs(v_prev - v_cur)<LIM:
                subfile.write(line)
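                line = mainfile.readline()
                # At EOF readline() returns '', which float() cannot
                #  parse, so only update the values for a non-blank line.
                if line.strip():
                    v_prev = v_cur
                    v_cur = float(line.split(DELIMITER)[0])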
        # Increment the file counter
        i += 1
        # Make sure we don't get stuck after one file
        #  (If we don't replace v_prev here, the while loop
        #  will never execute after the first time.)
        v_prev = v_cur
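The splitting rule itself can be checked without touching the file system. The following is a minimal sketch, not part of the original answer: the helper name `split_on_jumps` and its `limit` parameter are assumptions introduced for illustration, and the sample rows are the ones from the question.

```python
def split_on_jumps(lines, limit=0.1):
    """Group lines, starting a new group whenever the first column
    changes by at least `limit` compared with the previous line."""
    groups = []
    prev = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        value = float(line.split()[0])
        if prev is None or abs(value - prev) >= limit:
            groups.append([])  # first line, or a jump: open a new group
        groups[-1].append(line)
        prev = value
    return groups


sample = [
    "2457062.30520078 1.00579146 1",
    "2457062.30588184 1.00607543 1",
    "2457062.30656300 1.00605515 1",
    "2457062.71112193 1.00288150 1",
    "2457062.71180299 1.00322454 1",
    "2457062.71248415 1.00430136 1",
]

groups = split_on_jumps(sample)
print([len(g) for g in groups])  # [3, 3]
```

Keeping the grouping separate from the file handling makes it easy to try a different threshold before writing anything to disk.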

Answer 1: (score: 0)

Assuming your file is test.txt, then

with open('test.txt') as f:
    for line in f:
        if line.strip():  # skip blank lines
            # split() with no argument splits on any whitespace
            first_column, second_column, third_column = line.split()

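That reads the file line by line, but what exactly do you want to do with it?

Answer 2: (score: 0)

You can detect the jumps while reading the file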

def reader(infile):
    number = float('-infinity')
    for line in infile:
        prev, number = number, float(line.split(' ', 1)[0])
        jump = number - prev >= 0.1
        yield jump, line

for jump, line in reader(infile):
    # jump is True if one must open a new output file
    ...
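One possible way to fill in that loop is sketched below. It repeats the `reader` generator (with blank-line skipping added) so the block is self-contained, and writes each run of lines to its own numbered file. The function name `split_file`, the `out_template` parameter, and the file names in the usage comment are assumptions for illustration, not part of the original answer.

```python
def reader(infile):
    # Yield (jump, line); jump is True when the first column grew
    # by at least 0.1 since the previous line (always True for line 1).
    number = float('-infinity')
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        prev, number = number, float(line.split(None, 1)[0])
        yield number - prev >= 0.1, line


def split_file(path, out_template):
    """Write each run of lines to its own file; return the paths created."""
    created = []
    out = None
    try:
        with open(path) as infile:
            for jump, line in reader(infile):
                if jump:
                    # Close the previous output file and start a new one.
                    if out is not None:
                        out.close()
                    name = out_template.format(len(created))
                    out = open(name, 'w')
                    created.append(name)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return created


# Hypothetical usage:
# split_file('big_file.txt', 'small_file_{:03d}.txt')
```

Because the first line always compares against negative infinity, it is reported as a jump, so an output file is guaranteed to be open before the first write.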