Question

I am trying to read the first 4 digits of each line in a .dat file and store them in a loop. The .dat file looks like this:

0004 | IP
0006 | IP
0008 | IP

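I want to create a loop that reads the first four digits of each line and stores them on each iteration until the whole file has been read, and then writes them to an output file.

I wrote this, but all it basically does is convert the .dat to csv: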

import csv

with open('stores.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)


with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)

Answer 1

Since you know you want the first 4 characters of every line, you can just take a slice:

import csv

# you can open multiple file handles at the same time;
# newline='' lets the csv module control line endings itself
with open('stores.dat', 'r') as input_file, \
     open('file.csv', 'w', newline='') as output_file:
    file_writer = csv.writer(output_file)
    # iterate over the file handle directly to get the lines
    for line in input_file:
        row = line[:4] # slice the first 4 chars
        # make sure this is wrapped as a list otherwise
        # you'll get unsightly commas in your rows
        file_writer.writerow([row])

Which outputs:

$ cat file.csv
0004
0006
0008
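
The comment about wrapping the value in a list can be checked with a small in-memory sketch. It uses io.StringIO in place of a real file, purely to keep the example self-contained, and render_row is a hypothetical helper name that is not part of the answer above.

```python
import csv
import io


def render_row(row):
    """Write a single row with csv.writer and return the text it produced."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


# A bare string is iterated character by character, so each digit becomes a field.
print(repr(render_row("0004")))    # '0,0,0,4\r\n'
# A one-element list is written as a single field.
print(repr(render_row(["0004"])))  # '0004\r\n'
```

The trailing \r\n is the csv module's default line terminator.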

Answer 2

If every line always starts with four digits, it is simple:

with open('stores.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line[:4]  # first four characters of the line
        newLines.append(newLine)

Otherwise, you can use a regular expression to do the job, for example:

import re

with open('stores.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        # \d{4} matches a run of exactly four digits
        newLine = re.findall(r'\d{4}', line)[0]
        newLines.append(newLine)

Note that re.findall() returns a list containing all matches in the line, so the trailing [0] returns only the first match, that is, the first element of that list.
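
Because re.findall() returns an empty list when a line contains no match, indexing it with [0] raises IndexError on such lines (a blank line, for example). Below is a minimal sketch of one way to guard against that, assuming the four digits always sit at the very start of the line. first_four_digits is a hypothetical helper name, and the sample lines are made up for illustration.

```python
import re

# Matches exactly four digits; used with match(), so only at the start of the line.
LEADING_DIGITS = re.compile(r"\d{4}")


def first_four_digits(line):
    """Return the four leading digits of line, or None if they are absent."""
    match = LEADING_DIGITS.match(line)  # match() anchors at the start of the string
    return match.group(0) if match else None


sample = ["0004 | IP\n", "0006 | IP\n", "\n", "0008 | IP\n"]
# Lines without four leading digits yield None and are skipped.
codes = [code for code in map(first_four_digits, sample) if code is not None]
print(codes)  # ['0004', '0006', '0008']
```

Using re.match() instead of re.findall() also avoids building a list of every match when only the first one is needed.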

Read the first 4 digits of a line in a file and store them
