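I need to create a Python program that reads multiple .txt files from a set directory, looks for specific headings in each text file, and stores the data found under those headings in an .xlsx document.
Example of a .txt file: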
person: Vyacheslav Danik
address: Ukraine, Kharkov
phone: +380675746805
address: Ukraine, Kharkiv
address: Pavlova st., 319
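I need 5 headers in the Excel spreadsheet: number, organization, role, name and address. For every file it scans, the Python program should put the information under those headers in the spreadsheet.
Any help is appreciated, as I am struggling with this. Thanks.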
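Answer 0: (score: 0)
I'm still a beginner myself, but this looked easy enough. Treat it as a starting point for you to build on and customize. I only chose to do one column (person), and I'm pretty sure this example contains what you need to do the rest. You have to install the 2 Python libraries required to access spreadsheets by running the next two commands (assuming you are on some flavor of Linux, since you didn't give enough information):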
pip install xlrd
pip install xlutils
Here is the example; the comments roughly explain what each line does.
#!/usr/bin/env python
'''Required to install these libraries to access spreadsheets:
pip install xlrd
pip install xlutils
'''
import os
import re

from xlrd import open_workbook
from xlutils.copy import copy

# Open the existing spreadsheet read-only (the file must already exist).
# Note: xlutils writes the legacy .xls format, not .xlsx.
book_ro = open_workbook("spreadsheet.xls")
# Create a writable copy
book = copy(book_ro)
# Select the first sheet
sheet1 = book.get_sheet(0)

# Collect people in a list first; otherwise we would have to find the next empty row in the spreadsheet
peopleList = []

# Walk the current folder and its subfolders, keeping only the .txt files
for root, dirs, docs in os.walk('.', topdown=True):
    for filename in docs:
        if filename.lower().endswith(".txt"):
            filepath = os.path.join(root, filename)
            # Open the file read-only and read it into one string so it can be searched with re.findall
            with open(filepath, "r") as TxtFile:
                text = TxtFile.read()
            # Find all occurrences of "person:" and capture the rest of the line
            people = re.findall(r'person: (.*)', text)
            # Strip surrounding whitespace from each name
            people = [person.strip() for person in people]
            # If the file has more than one person, add each one individually
            for person in people:
                peopleList.append(person)

row = 0
column = 0
# Remove duplicates, sort, then step through the list and write one name per row
for person in sorted(set(peopleList)):
    sheet1.write(row, column, person)
    row += 1

# This overwrites the original spreadsheet
book.save("spreadsheet.xls")
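
To go from the single person column to the five headers in the question, one option is to parse every "label: value" line of a file into one spreadsheet row. The following is only a rough sketch using the standard library. The names HEADERS, LABEL_TO_HEADER, LINE_PATTERN and parse_record are made up for illustration, and the mapping from text labels to headers is an assumption: the sample file only shows person, address and phone, so the organization and role labels are guesses. Joining repeated labels with "; " is also just one possible choice.

```python
import re
from collections import defaultdict

# Spreadsheet headers requested in the question, in column order.
HEADERS = ["number", "organization", "role", "name", "address"]

# ASSUMPTION: mapping from text-file labels to spreadsheet headers.
# Only "person", "address" and "phone" appear in the sample file;
# the "organization" and "role" labels are guesses.
LABEL_TO_HEADER = {
    "phone": "number",
    "organization": "organization",
    "role": "role",
    "person": "name",
    "address": "address",
}

# A label made of letters or hyphens, a colon, then the value.
LINE_PATTERN = re.compile(r"^\s*([A-Za-z-]+):\s*(.*?)\s*$")


def parse_record(text):
    """Return one spreadsheet row (a list of strings) for one file's text."""
    values = defaultdict(list)
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue
        label, value = match.group(1).lower(), match.group(2)
        header = LABEL_TO_HEADER.get(label)
        # Skip unknown labels, empty values and exact repeats.
        if header and value and value not in values[header]:
            values[header].append(value)
    # Repeated labels (e.g. several "address:" lines) share one cell.
    return ["; ".join(values[h]) for h in HEADERS]


sample = """person: Vyacheslav Danik
address: Ukraine, Kharkov
phone: +380675746805
address: Ukraine, Kharkiv
address: Pavlova st., 319
"""

row = parse_record(sample)
print(row)
```

Each value in the returned list could then be written with sheet1.write(row_index, column_index, value), the same way the person names are written in the example above, using one spreadsheet row per scanned file.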