Python: multiple searches in the same text file

Asked: 2017-09-06 05:47:24

Tags: python python-3.x text-files

I have a huge text file with data like this:

Name : ABC  
Bank : Bank1    
Account-No : 01234567    
Amount: 123456    
Spouse : CDF    
Name : ABD    
Bank : Bank1    
Account-No : 01234568    
Amount: 12345    
Spouse : BDF    
Name : ABE    
Bank : Bank2    
Account-No : 01234569    
Amount: 12344    
Spouse : CDG    
.
.
.
.
.

I need to grab the Account-No and Amount and write them to a new file:

Account-No: 01234567
Amount    : 123456
Account-No: 01234568
Amount    : 12345
Account-No: 01234569
Amount    : 12344
.
.
.

I tried searching the text file via mmap to get the position of Account-No, but I could not get to the next Account-No from there.

import mmap
fname = input("Enter the file name")
f1 = open(fname)

s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
if s.find(b'Account-No') != -1:
    r = s.find(b'Account-No') # position of the first 'Account-No' only
f1.close()

At 'r' I have the position of the first Account-No, but I am unable to search onward from (r + 1) for the next Account-No.

I could put it in a loop, but I could not get the exact mmap syntax to work.
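
Conceptually, I am after a loop along these lines (a sketch of my intent; mmap.find accepts an optional start offset, which is the part I could not get right):

import mmap

fname = input("Enter the file name")
with open(fname, 'rb') as f1:
    s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    pos = s.find(b'Account-No')
    while pos != -1:
        # ... process the record that starts at pos ...
        pos = s.find(b'Account-No', pos + 1) # resume just past the last hit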

Can anyone help me with this, via mmap or any other method?

4 Answers:

Answer 0 (score: 1):

Using pandas, we can do the following:
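
A minimal sketch of one way this could look, assuming the 'Field : value' layout from the question (file names are placeholders):

import pandas as pd

# parse each "Field : value" line into two columns;
# dtype=str preserves the leading zeros of the account numbers
df = pd.read_csv('input.txt', sep=r'\s*:\s*', engine='python',
                 header=None, names=['field', 'value'], dtype=str)

# keep only the rows for the fields of interest
wanted = df[df['field'].isin(['Account-No', 'Amount'])]

# write them back out in "Field: value" form
with open('output.txt', 'w') as out:
    for _, row in wanted.iterrows():
        out.write('{}: {}\n'.format(row['field'], row['value']))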

Answer 1 (score: 0):

A solution for huge files:

Below is a working example that you can easily customize by adding or removing field names in the 'required_fields' list. This solution lets you process huge files, because the whole file is never read into memory at once.

import tempfile

# reproduce your input file
# for the purpose of having a
# working example
input_filename = None
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_orig:
    input_filename = f_orig.name
    f_orig.write("""Name : ABC
Bank : Bank1
Account-No : 01234567
Amount: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amount: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amount: 12344
Spouse : CDG""")
    # start looking from the beginning of the file again
    f_orig.seek(0)

    # list the fields you want to keep
    required_fields = [
        'Account-No',
        'Amount',
    ]

    # filter and write, line by line
    result_filename = None
    with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_result:
        result_filename = f_result.name
        # process one line at a time (memory efficient)
        while True:
            line = f_orig.readline()
            # check if we have reached the end of the file
            if not line:
                break
            for field_name in required_fields:
                # write fields of interest to new file
                if field_name in line:
                    f_result.write(line)
                    f_result.write('\n') # just for formatting

    # show result
    with open(result_filename, 'r') as f:
        print(f.read())

The result is:

Account-No : 01234567

Amount: 123456

Account-No : 01234568

Amount: 12345

Account-No : 01234569

Amount: 12344

Answer 2 (score: 0):

Code:

listOfAllAccountsAndAmounts = [] # list to save all the accounts and amounts
searchTexts = ['Account-No','Amount'] # the fields you want to search for

with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines() # read all the lines
    # save all the indexes of those that have any of the words from the searchTexts list in them
    indexOfAccounts = [ i for i, line in enumerate(allLines) if any( x in line for x in searchTexts) ] 

    for index in indexOfAccounts:
        listOfAllAccountsAndAmounts.append(allLines[index][:-1].split(': '))

print(listOfAllAccountsAndAmounts)

Output:

[['Account-No ', '01234567'], ['Amount', '123456'], ['Account-No ', '01234568'], ['Amount', '12345'], ['Account-No ', '01234569'], ['Amount', '12344']]

If you don't want to split, and instead want to save the lines as they are:

listOfAllAccountsAndAmounts.append(allLines[index])

Output:

['Account-No : 01234567\n', 'Amount: 123456\n', 'Account-No : 01234568\n', 'Amount: 12345\n', 'Account-No : 01234569\n', 'Amount: 12344\n']

I have written to a list in case you want to process the information. You can also write the strings directly to a new file, without even using a list, as @Arda showed; see the sketch below.
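
For instance, that direct write could look like this (reusing the variables from the code above; the output file name 'b.txt' is a placeholder):

with open('b.txt', 'w') as outFile:
    for index in indexOfAccounts:
        outFile.write(allLines[index]) # each line already ends with '\n'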

Answer 3 (score: -1):

You could read the whole text file into a list, then iterate over the list, search each string for "Account-No" and "Amount", and write the matching lines to another file.
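
A minimal sketch of that idea (file names are placeholders):

searchTexts = ['Account-No', 'Amount'] # strings to look for

# read the whole text file into a list of lines
with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines()

# write every line that contains one of the search strings to another file
with open('b.txt', 'w') as outFile:
    for line in allLines:
        if any(text in line for text in searchTexts):
            outFile.write(line)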