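I am trying to split a text file with 3 columns into many smaller individual text files, based on jumps in the values of the first column. Here is an example of a small part of the file to be split: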
2457062.30520078 1.00579146 1
2457062.30588184 1.00607543 1
2457062.30656300 1.00605515 1
2457062.71112193 1.00288150 1
2457062.71180299 1.00322454 1
2457062.71248415 1.00430136 1
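Between lines 3 and 4 there is a larger jump than usual. That is the point where the data should be split into separate text files: one containing the first three lines and one containing the last three. A jump is always a change of more than 0.1 in the first column. The goal is for every such jump to act as a split point, as in this example. Any insight is appreciated, thanks.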
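Answer 0: (score: 1)
I would walk through the main file and keep writing lines for as long as the condition holds. That is exactly what a while loop is for. The main complication is that you need two files open at the same time (the main file and the one you are currently writing to), but that is not a problem in Python.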
MAINTEXT = "big_file.txt"
# {:03d} zero-pads the counter: small_file_000.txt, small_file_001.txt, ...
SFILE_TEMPL = 'small_file_{:03d}.txt'
# Delimiter is a space in the example you gave, but
# might be tab (\t) or comma or anything.
DELIMITER = ' '
LIM = .1

# i will count how many files we have created.
i = 0

# Open the main file
with open(MAINTEXT) as mainfile:
    # Read the first line and set up some things
    # (this assumes the file is not empty).
    line = mainfile.readline()
    # Note that we want the first element ([0]) before
    # the delimiter (.split(DELIMITER)) of the row (line)
    # as a number (float)
    v_cur = float(line.split(DELIMITER)[0])
    v_prev = v_cur

    # This will stop the loop once we reach end of file (EOF)
    # as readline() will then return an empty string.
    while line:
        # Open the second file for writing (mode='w').
        with open(SFILE_TEMPL.format(i), mode='w') as subfile:
            # As long as your values are in the limit, keep
            # writing lines to the current file.
            while line and abs(v_prev - v_cur) < LIM:
                subfile.write(line)
                line = mainfile.readline()
                v_prev = v_cur
                # At EOF readline() returns '' (and a trailing blank
                # line has no number), which float() cannot parse,
                # so only convert lines that contain data.
                if line.strip():
                    v_cur = float(line.split(DELIMITER)[0])

        # Increment the file counter
        i += 1

        # Make sure we don't get stuck after one file
        # (If we don't replace v_prev here, the while loop
        # will never execute after the first time.)
        v_prev = v_cur
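Answer 1: (score: 0)
Assuming your file is test.txt, then
with open('test.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        # split() with no argument splits on any run of whitespace
        first_column, second_column, third_column = line.split()
reads the file line by line, but what exactly do you want to do with it?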
Answer 2: (score: 0)
You can detect the jumps while reading the file:
def reader(infile):
    number = float('-infinity')
    for line in infile:
        prev, number = number, float(line.split(' ', 1)[0])
        jump = number - prev >= 0.1
        yield jump, line

for jump, line in reader(infile):
    # jump is True if one must open a new output file
    ...
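To make the idea concrete, here is a minimal sketch of how the generator above could drive the actual file writing. The helper name `split_on_jumps`, the output name template and the `limit` parameter are my own assumptions, not part of the answer. It also assumes the first column only ever increases, since the test is `number - prev >= limit` without `abs()`.

```python
def reader(infile, limit=0.1):
    """Yield (jump, line) pairs; jump is True when a new output file should start."""
    number = float('-inf')  # makes the very first data line count as a jump
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        prev, number = number, float(line.split(None, 1)[0])
        yield number - prev >= limit, line


def split_on_jumps(path, template='small_file_{:03d}.txt', limit=0.1):
    """Hypothetical helper: split `path` into numbered files, return their names."""
    written = []
    subfile = None
    try:
        with open(path) as infile:
            for jump, line in reader(infile, limit):
                if jump:
                    # Close the previous piece (if any) and start the next one.
                    if subfile is not None:
                        subfile.close()
                    name = template.format(len(written))
                    subfile = open(name, 'w')
                    written.append(name)
                subfile.write(line)
    finally:
        if subfile is not None:
            subfile.close()
    return written
```

Calling `split_on_jumps('big_file.txt')` on the six sample rows from the question should produce two files of three rows each. If the first column can also decrease, compare `abs(number - prev)` against the limit instead, and treat the first line separately.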