Question

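The .txt file contains 68 lines. Line 68 holds 5 values that I need to extract, but I don't know how. I have about 20 .txt files, and line 68 needs to be read from each of them. I then need to put all of the extracted data into a single Excel file.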

Here is line 68:

Final graph has 1496 nodes and n50 of 53706, max 306216, total 5252643, using 384548/389191 reads

I basically need all of those numbers.

Answer 1

Open the text file with the following:

with open('filepath.txt', 'r') as f:
    for line in f:
        # do operations for each line in the text file
        pass

Repeat for each text file you want to read.

There are Python libraries for reading from and writing to Excel files. It sounds like you want to use xlwt.
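As a rough sketch of the "open each file and pick out one line" step described above: the helper name `read_line` is made up for illustration, and it assumes plain text files that can be read with the default encoding.

```python
from itertools import islice


def read_line(path, line_number):
    """Return line `line_number` (1-based) of the file at `path`.

    Returns None if the file has fewer lines than that.
    """
    with open(path, "r") as f:
        # islice skips the first line_number - 1 lines without
        # loading the whole file into memory
        return next(islice(f, line_number - 1, line_number), None)
```

Calling `read_line(path, 68)` for each of the 20 paths would give the line of interest per file, still with its trailing newline.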

Answer 2

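I like to use openpyxl for this kind of task. Below is an example for a single file. You should be able to extend it to multiple files. You didn't say exactly how you want the data laid out in the spreadsheet, so I just created one row of headers followed by one row of data for the file (5 fields). This could be improved if I had more information about your project.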

from openpyxl import Workbook
import re

wb = Workbook()
ws = wb.active  # wb.active replaces the older get_active_sheet()

# write column headers (openpyxl rows and columns are 1-based)
ws.cell(row=1, column=1).value = 'nodes'
ws.cell(row=1, column=2).value = 'n50'
ws.cell(row=1, column=3).value = 'max'
ws.cell(row=1, column=4).value = 'total'
ws.cell(row=1, column=5).value = 'reads'

# open file and extract lines into list
with open("somedata.txt", "r") as f:
    lines = f.readlines()

# compile regex using named groups and apply regex to line 68
# (raw string so that \s and \d reach the regex engine unchanged)
p = re.compile(r"^Final\sgraph\shas\s(?P<nodes>\d+)\snodes\sand\sn50\sof\s(?P<n50>\d+),\smax\s(?P<max>\d+),\stotal\s(?P<total>\d+),\susing\s(?P<reads>\d+/\d+)\sreads$")
m = p.match(lines[67])

# if we have a match, then write the data to the spreadsheet
if m:
    ws.cell(row=2, column=1).value = m.group('nodes')
    ws.cell(row=2, column=2).value = m.group('n50')
    ws.cell(row=2, column=3).value = m.group('max')
    ws.cell(row=2, column=4).value = m.group('total')
    ws.cell(row=2, column=5).value = m.group('reads')

wb.save('mydata.xlsx')
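One possible way to extend this to several files is sketched below. The function names `collect_rows` and `write_workbook`, and the extra "file" column, are assumptions made for illustration and are not part of the original answer. Files that are too short or whose line 68 does not match are simply skipped.

```python
import re

# Same pattern as in the single-file example above
LINE_RE = re.compile(
    r"^Final\sgraph\shas\s(?P<nodes>\d+)\snodes\sand\sn50\sof\s(?P<n50>\d+),"
    r"\smax\s(?P<max>\d+),\stotal\s(?P<total>\d+),"
    r"\susing\s(?P<reads>\d+/\d+)\sreads$"
)
FIELDS = ["nodes", "n50", "max", "total", "reads"]


def collect_rows(paths, line_number=68):
    """Return one row [path, nodes, n50, max, total, reads] per matching file."""
    rows = []
    for path in paths:
        with open(path, "r") as f:
            lines = f.readlines()
        if len(lines) < line_number:
            continue  # file too short; skip it
        m = LINE_RE.match(lines[line_number - 1])
        if m:
            rows.append([path] + [m.group(name) for name in FIELDS])
    return rows


def write_workbook(rows, out_path):
    """Write a header row followed by one spreadsheet row per file."""
    # imported here so that collect_rows can be used without openpyxl installed
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.append(["file"] + FIELDS)  # append() adds a row below the last used one
    for row in rows:
        ws.append(row)
    wb.save(out_path)
```

With a list of paths in hand, `write_workbook(collect_rows(paths), 'mydata.xlsx')` would produce one header row and one data row per file. The values are written as strings, as in the single-file example.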

Answer 3

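The following is less elegant than David's answer, which relies on a regular expression, but it is more transparent. It depends heavily on the specific format you described. Also, it seems to me that there are actually 6 (not 5) variables you care about, unless you can convert the ratio in reads into a decimal fraction.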
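If a single decimal fraction is acceptable, the conversion could look something like this. The function name `reads_fraction` is made up for illustration, and the sketch assumes Python 3, where `/` is true division.

```python
def reads_fraction(reads):
    """Convert a 'used/total' string such as '384548/389191' to a float."""
    used, total = reads.split("/")
    return int(used) / int(total)
```

That would bring the count back down to 5 columns, at the cost of losing the two raw read counts.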

You need to supply the correct list of file names in nameList (by hand, if they are not named in a convenient way).
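If the files do happen to be laid out conveniently, the list could be built with glob instead of by hand. The sketch below assumes a hypothetical layout in which each run has its own subdirectory containing a file called Log.txt; the function name `build_name_list` is also made up.

```python
import glob
import os


def build_name_list(root, filename="Log.txt"):
    """Return sorted paths of every `filename` one directory level below `root`."""
    pattern = os.path.join(root, "*", filename)
    return sorted(glob.glob(pattern))
```

The result can be assigned directly to nameList.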

Also, I output to a csv file rather than an Excel file. Of course, opening a csv file in Excel is trivial, and you can then save it as xls.

Edit in response to a comment (05/19/13): including full paths is simple.

import csv
import string

# Make list of all 20 files like so:
nameList = ['/full/path/to/Log.txt', '/different/path/to/Log.txt', '/yet/another/path/to/Log.txt']

lineNum = 68

myCols = ['nodes','n50','max','total','reads1','reads2']
myData = []

for name in nameList:
    # split line lineNum into list of strings
    with open(name, "r") as fi:
        strings = fi.readlines()[lineNum-1].split()

    # remove punctuation appropriately
    # (str.strip works in Python 3, where string.maketrans no longer exists)
    nodes = int(strings[3])
    n50 = int(strings[8].strip(string.punctuation))
    myMax = int(strings[10].strip(string.punctuation))
    total = int(strings[12].strip(string.punctuation))
    reads1 = int(strings[14].split('/')[0])
    reads2 = int(strings[14].split('/')[1])

    myData.append([nodes, n50, myMax, total, reads1, reads2])

# Write the data out to a new csv file
# (newline="" is what the Python 3 csv module expects for files it writes)
fileOut = "out.csv"
with open(fileOut, "w", newline="") as csvFileOut:
    myWriter = csv.writer(csvFileOut)
    myWriter.writerow(myCols)
    for line in myData:
        myWriter.writerow(line)

I need to extract data from multiple .txt files and move them into an Excel file using Python
