Extract data from text files into an output file

Date: 2016-04-06 10:11:12

Tags: python file text extract

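I have a lot of files whose names are just numbers (starting from 1 up to the highest number). All of these files look alike in their "tags" (ObjectID =, X =, Y =, etc.), but the values after those tags are all different.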

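Instead of manually copying and pasting data from one file to another, I wanted to make my job easier by writing a small Python script (since I have a little experience with it).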

Here is the full script:

import os

# Raw string, so the backslashes in the Windows path are not read as escape sequences
BASE_DIRECTORY = r'C:\Users\Tom\Desktop\TheServer\scriptfiles\Objects'
output = {}
file_list = []

# Collect the path of every .txt file under BASE_DIRECTORY
for (dirpath, dirnames, filenames) in os.walk(BASE_DIRECTORY):
    for f in filenames:
        if f.endswith('.txt'):
            file_list.append(os.path.join(dirpath, f))

# Keep only the ObjectID, X and Y lines of each file
for f in file_list:
    print(f)
    output[f] = []
    with open(f, 'r') as txtfile:
        for line in txtfile:
            if 'ObjectID =' in line or 'X =' in line or 'Y =' in line:
                output[f].append(line)

# Write the kept lines (they still end in "\n"), files in sorted path order
with open('output.txt', 'w') as output_file:
    for tab in sorted(output):
        for row in output[tab]:
            output_file.write(row)
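Since the file names are numbers, sorting the paths as plain strings, as the script above does, puts 10.txt before 2.txt. If the output should follow the file numbers, a numeric sort key is one option. A minimal sketch, assuming every base name is a number followed by an extension such as .txt (numeric_key is a hypothetical helper):

```python
import os


def numeric_key(path):
    """Sort key for paths like '.../12.txt': the file's number as an int.

    Assumes the base name without its extension is purely digits.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem)


paths = ['Objects/10.txt', 'Objects/2.txt', 'Objects/1.txt']
print(sorted(paths))                   # string order: 1, 10, 2
print(sorted(paths, key=numeric_key))  # numeric order: 1, 2, 10
```

In the script above this would mean iterating over sorted(output, key=numeric_key) instead of sorted(output).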

Now everything works, and the output file looks like this:

ObjectID = 1216
X = -1480.500610
Y = 2610.885742
ObjectID = 970
X = -1517.210693
Y = 2522.842285
ObjectID = 3802
X = -1512.156616
Y = 2521.116210
etc.

But I don't want it like that (every value on its own line). I need the script to do this for each file:

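  1. Read the file.
  2. Remove the tag in front of each value.
  3. Format a single line containing those values for the output file. (Say I want it to look like this: "(1216,-1480.500610,2610.885742)")
  4. Write that line to the output file.
  5. Repeat for every file.

Any help?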
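A minimal sketch of steps 1 to 5, assuming each file contains one ObjectID, one X and one Y line in the "tag = value" form shown above (other "tag = value" lines are read but ignored, and a missing tag raises KeyError). format_file and write_all are hypothetical helper names, not part of the question's script:

```python
def format_file(text):
    """Steps 1-3 for one file's contents: drop the tags, build one line."""
    values = {}
    for line in text.splitlines():
        # "ObjectID = 1216" -> tag "ObjectID ", separator "=", value " 1216"
        tag, sep, value = line.partition('=')
        if sep:
            values[tag.strip()] = value.strip()
    return '({},{},{})'.format(values['ObjectID'], values['X'], values['Y'])


def write_all(paths, out_path):
    """Steps 4-5: write one formatted line per input file, in the given order."""
    with open(out_path, 'w') as out:
        for path in paths:
            with open(path) as f:
                out.write(format_file(f.read()) + '\n')


sample = "ObjectID = 1216\nX = -1480.500610\nY = 2610.885742\n"
print(format_file(sample))  # (1216,-1480.500610,2610.885742)
```

With the question's script, write_all(file_list, 'output.txt') would take the place of the final writing loop; the lines come out in the order of file_list.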

4 Answers:

Answer 0 (score: 1)

In your loop, keep track of whether you are "in" a record:

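records = []
in_record = False
obj_id, x, y = '', '', ''  # obj_id instead of id, to avoid shadowing the built-in
for line in txtfile:  # txtfile: the open file from the question's loop
    if not in_record:
        if 'ObjectID =' in line:
            in_record = True
            # Everything after "=", without the surrounding spaces and newline
            obj_id = line.split('=', 1)[1].strip()
    elif 'X =' in line:
        x = line.split('=', 1)[1].strip()
    elif 'Y =' in line:
        y = line.split('=', 1)[1].strip()
        records.append((obj_id, x, y))  # Y is the last field of a record
        in_record = False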

Then you will have a list of tuples that you can easily write out with the csv module.
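A minimal sketch of that last step, assuming records holds (id, x, y) string tuples like the ones the loop above builds; the sample values come from the question's output and output.csv is an arbitrary file name:

```python
import csv

# Example records, as the loop above would produce them (values from the question)
records = [('1216', '-1480.500610', '2610.885742'),
           ('970', '-1517.210693', '2522.842285')]

# newline='' is what the csv documentation recommends for files given to csv.writer
with open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerows(records)  # one comma-separated line per tuple
```

This writes lines such as 1216,-1480.500610,2610.885742 without parentheses. If the exact "(1216,...)" format from the question is needed, writing '({},{},{})\n'.format(*record) for each record with a plain file write is simpler than csv.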

Answer 1 (score: 1)

Hope this helps.

>>> data = open('sam.txt', 'r').read()
>>> print(data)
ObjectID = 1216
X = -1480.500610
Y = 2610.885742
ObjectID = 970
X = -1517.210693
Y = 2522.842285
ObjectID = 3802
X = -1512.156616
Y = 2521.116210
>>>

Now let's do some string replacements :)

>>> data = data.replace('ObjectID = ', '').replace('\nX = ', ',').replace('\nY = ', ',')
>>> print(data)
1216,-1480.500610,2610.885742
970,-1517.210693,2522.842285
3802,-1512.156616,2521.116210
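The chain of replace calls only works if every file uses exactly this order, spacing and line ending. As an alternative technique (a regular expression, not what this answer uses), the three values can be captured directly; this sketch still assumes the tags appear in the order ObjectID, X, Y, and RECORD is a hypothetical name:

```python
import re

# One match per object; \s* tolerates extra spaces or blank lines between fields
RECORD = re.compile(r'ObjectID\s*=\s*(\S+)\s*X\s*=\s*(\S+)\s*Y\s*=\s*(\S+)')

data = """ObjectID = 1216
X = -1480.500610
Y = 2610.885742
ObjectID = 970
X = -1517.210693
Y = 2522.842285
"""

# findall returns one (id, x, y) tuple per match, because the pattern has 3 groups
for obj_id, x, y in RECORD.findall(data):
    print('({},{},{})'.format(obj_id, x, y))
```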

Answer 2 (score: 0)

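This is what you need. I didn't have time to write the code that appends the results to a new file; instead it just prints them, but you get the idea.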

import os

path = "path"  # folder containing the numbered files

#getting the number of files in your folder
num_files = len([f for f in os.listdir(path)
                 if os.path.isfile(os.path.join(path, f))])

#function that returns your desired output for a given file
def file_head_ext(file_path, file_num):
    # Assumes the files have no extension ("1", "2", ...) and that the first
    # three lines are "ObjectID = ...", "X = ..." and "Y = ..."
    with open(os.path.join(file_path, file_num)) as myfile:
        head = [next(myfile).split("=") for _ in range(3)]
    formatted_head = [elm[1].strip() for elm in head]
    return ",".join(formatted_head)


# Files are numbered 1..num_files, so the range has to include num_files
for filnum in range(1, num_files + 1):
    print(file_head_ext(path, str(filnum)))
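The answer stops at printing. A minimal sketch of the missing step, writing one line per file to an output file instead; write_lines and the default name output.txt are hypothetical, and the per-file function is passed in as head_ext so the sketch does not depend on the exact code above:

```python
def write_lines(path, num_files, head_ext, out_path='output.txt'):
    """Write head_ext(path, "1") ... head_ext(path, str(num_files)) to out_path,
    one result per line, in file-number order."""
    with open(out_path, 'w') as out:
        for filnum in range(1, num_files + 1):
            out.write(head_ext(path, str(filnum)) + '\n')
```

For example: write_lines(path, num_files, file_head_ext).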

Answer 3 (score: 0)

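Below is a version of the loop in which you build the content. I rewrote it so that the ObjectID, X and Y values end up on the same line.

It looks like this is what you want to do: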

def value_after(line, tag):
    """Return the first token after tag in line, or '' if there is none."""
    pos = line.rfind(tag) + len(tag)
    rest = line[pos:]
    # split() with no argument splits on any whitespace and drops the newline;
    # use another delimiter here (e.g. ",") if your values are separated that way
    numbers = rest.split()
    return numbers[0] if numbers else ''

for f in file_list:
    print(f)
    output[f] = []
    myline = []  # values of the record being built: ObjectID, X, Y
    with open(f, 'r') as txtfile:
        for line in txtfile:
            if 'ObjectID =' in line:
                myline = [value_after(line, 'ObjectID =')]
            elif 'X =' in line:
                myline.append(value_after(line, 'X ='))
            elif 'Y =' in line:
                myline.append(value_after(line, 'Y ='))
                # Y is the last field, so the record is complete: one line per object
                output[f].append(','.join(myline) + '\n')

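Note that you need to know which character (the delimiter in the code) separates the name you are looking for, such as "ObjectID =", from the actual value you want to grab from the line.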
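For example, with = as the delimiter (as in the files shown in the question), str.partition splits a line at the first delimiter into the part before it, the delimiter itself, and the rest. A small sketch; tag_and_value is a hypothetical helper:

```python
def tag_and_value(line, delimiter='='):
    """Split 'ObjectID = 1216' into ('ObjectID', '1216') at the first delimiter."""
    tag, found, value = line.partition(delimiter)
    if not found:
        return None  # the line does not contain the delimiter
    return tag.strip(), value.strip()


print(tag_and_value('ObjectID = 1216\n'))   # ('ObjectID', '1216')
print(tag_and_value('X = -1480.500610\n'))  # ('X', '-1480.500610')
```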