Question

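I need to create a Python program that reads multiple .txt files in a set directory, looks for specific headings in the text files, and stores the data found under those headings in an .xlsx document.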

Example of a .txt file:

person:         Vyacheslav Danik
address:        Ukraine, Kharkov
phone:          +380675746805
address:        Ukraine, Kharkiv
address:        Pavlova st., 319

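I need 5 headings in the Excel spreadsheet: number, organization, role, name, and address. For each file scanned, the Python program should put the information under these headings in the spreadsheet.

Any help would be appreciated, as I'm struggling with this. Thanks.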

Answer 1

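I'm still a beginner myself, but this looks easy enough. Treat it as a starting point for you to build on and customize. I chose to handle only one column (person), and I'm fairly sure this example contains what you need to do the rest. You have to install the 2 Python libraries required for accessing spreadsheets by running the next two commands (assuming you are using some type of Linux, since you didn't provide enough information):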

pip install xlrd

pip install xlutils

Here is the example; the comments roughly explain what each line does.

#!/usr/bin/env python

''' Required to install these libraries to access spreadsheets
pip install xlrd
pip install xlutils
'''

import os
import re

from xlutils.copy import copy    
from xlrd import open_workbook

# Open the existing spreadsheet read-only (the file must already exist)
book_ro = open_workbook("spreadsheet.xls")

# creates a writeable copy
book = copy(book_ro)

# Select first sheet
sheet1 = book.get_sheet(0)

# Create list to hold people, otherwise we have to figure out the next empty row in spreadsheet
peopleList = []

# Get list of files in current folder and filter only the txt files
for root, dirs, docs in os.walk('.', topdown=True):

    for filename in docs:
            if filename.lower().endswith(".txt"):
                filepath = os.path.join(root, filename)

                # Open file read only
                TxtFile = open(filepath,"r")

                # Read all the lines at once in variable
                lines = TxtFile.readlines()

                # Finished reading, close file
                TxtFile.close()

                # Convert file to one big string so it can be searched with re.findall
                # (readlines() keeps each line's newline, so join without a separator)
                lines = ''.join(lines)

                # Find all occurrences of "person:" and capture the rest of the line
                people = re.findall(r'person:[ \t]*(.*)', lines)

                # Strip surrounding whitespace from each name
                people = [x.strip() for x in people]

                # If file has more than 1 person, add each one individually
                for person in people:
                    peopleList.append(person)

row = 0
column = 0

# Remove duplicates and sort (sorted(set(...))), then step through the list and write to the spreadsheet
for person in sorted(set(peopleList)):
    sheet1.write(row, column, person)
    row += 1

# This will overwrite the original spreadsheet if one existed
book.save("spreadsheet.xls")
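
To go beyond the single "person" column, one possible approach is to collect every "key: value" line of a file into a dictionary and then build one spreadsheet row per file. The following is only a minimal sketch: the function names parse_fields and to_row are hypothetical, and the headers list uses the keys visible in the sample file (person, phone, address), since the question does not show which keys hold the number, organization, and role.

```python
import re
from collections import defaultdict

# Matches lines such as "person:         Vyacheslav Danik"
FIELD_RE = re.compile(r'^([\w-]+):[ \t]*(.*)$')

def parse_fields(text):
    """Collect 'key: value' lines; values of repeated keys are kept in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            key, value = match.groups()
            fields[key].append(value.strip())
    return dict(fields)

def to_row(fields, headers):
    """Build one row: repeated values are joined with '; ', missing keys stay blank."""
    return ['; '.join(fields.get(header, [])) for header in headers]

# Sample text copied from the question
sample = """person:         Vyacheslav Danik
address:        Ukraine, Kharkov
phone:          +380675746805
address:        Ukraine, Kharkiv
address:        Pavlova st., 319
"""

# Assumed headers: replace with the keys your files actually contain
headers = ['person', 'phone', 'address']
row = to_row(parse_fields(sample), headers)
print(row)
# ['Vyacheslav Danik', '+380675746805', 'Ukraine, Kharkov; Ukraine, Kharkiv; Pavlova st., 319']
```

Each value in such a row could then be written with sheet1.write(row, column, value), the same call used in the loop above, with one column index per header. Keep in mind that the libraries used here work with the older .xls format, so producing a real .xlsx file would need a different library.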
