Python: multiple searches in the same text file

Asked: 2017-09-06 05:47:24

Tags: python python-3.x text-files

I have a huge text file with data like this:

Name : ABC  
Bank : Bank1    
Account-No : 01234567    
Amount: 123456    
Spouse : CDF    
Name : ABD    
Bank : Bank1    
Account-No : 01234568    
Amount: 12345    
Spouse : BDF    
Name : ABE    
Bank : Bank2    
Account-No : 01234569    
Amount: 12344    
Spouse : CDG    
.
.
.
.
.

I need to grab the Account-No and Amount and write them to a new file:

Account-No: 01234567
Amount    : 123456
Account-No: 01234568
Amount    : 12345
Account-No: 01234569
Amount    : 12344
.
.
.

I tried searching the text file via mmap to get the position of Account-No, but I could not get to the next Account-No from there.

import mmap
fname = input("Enter the file name")
f1 = open(fname)

s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
if s.find(b'Account-No') != -1:
    r = s.find(b'Account-No') # position of the first 'Account-No' only
f1.close()

At 'r' I have the position of the first Account-No, but I am unable to search onward from (r + 1) for the next Account-No.

I could put it in a loop, but I could not get the exact mmap syntax to work.
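
Conceptually, I am after a loop along these lines (a sketch of my intent; mmap.find accepts an optional start offset, which is the part I could not get right):

import mmap

fname = input("Enter the file name")
with open(fname, 'rb') as f1:
    s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    pos = s.find(b'Account-No')
    while pos != -1:
        # ... process the record that starts at pos ...
        pos = s.find(b'Account-No', pos + 1) # resume just past the last hit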

Can anyone help me with this, via mmap or any other method?

4 Answers:

Answer 0 (score: 1):

Using pandas, we can do the following:
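
A minimal sketch of one way this could look, assuming the 'Field : value' layout from the question (file names are placeholders):

import pandas as pd

# parse each "Field : value" line into two columns;
# dtype=str preserves the leading zeros of the account numbers
df = pd.read_csv('input.txt', sep=r'\s*:\s*', engine='python',
                 header=None, names=['field', 'value'], dtype=str)

# keep only the rows for the fields of interest
wanted = df[df['field'].isin(['Account-No', 'Amount'])]

# write them back out in "Field: value" form
with open('output.txt', 'w') as out:
    for _, row in wanted.iterrows():
        out.write('{}: {}\n'.format(row['field'], row['value']))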

Answer 1 (score: 0):

A solution for huge files:

Below is a working example that you can easily customize by adding or removing field names in the 'required_fields' list. This solution lets you process huge files, because the whole file is never read into memory at once.

import tempfile

# reproduce your input file
# for the purpose of having a
# working example
input_filename = None
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_orig:
    input_filename = f_orig.name
    f_orig.write("""Name : ABC
Bank : Bank1
Account-No : 01234567
Amount: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amount: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amount: 12344
Spouse : CDG""")
    # start looking from the beginning of the file again
    f_orig.seek(0)

    # list the fields you want to keep
    required_fields = [
        'Account-No',
        'Amount',
    ]

    # filter and write, line by line
    result_filename = None
    with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_result:
        result_filename = f_result.name
        # process one line at a time (memory efficient)
        while True:
            line = f_orig.readline()
            # check if we have reached the end of the file
            if not line:
                break
            for field_name in required_fields:
                # write fields of interest to new file
                if field_name in line:
                    f_result.write(line)
                    f_result.write('\n') # just for formatting

    # show result
    with open(result_filename, 'r') as f:
        print(f.read())

The result is:

Account-No : 01234567

Amount: 123456

Account-No : 01234568

Amount: 12345

Account-No : 01234569

Amount: 12344

Answer 2 (score: 0):

Code:

listOfAllAccountsAndAmounts = [] # list to save all the accounts and amounts
searchTexts = ['Account-No','Amount'] # the fields you want to search for

with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines() # read all the lines
    # save all the indexes of those that have any of the words from the searchTexts list in them
    indexOfAccounts = [ i for i, line in enumerate(allLines) if any( x in line for x in searchTexts) ] 

    for index in indexOfAccounts:
        listOfAllAccountsAndAmounts.append(allLines[index][:-1].split(': '))

print(listOfAllAccountsAndAmounts)

Output:

[['Account-No ', '01234567'], ['Amount', '123456'], ['Account-No ', '01234568'], ['Amount', '12345'], ['Account-No ', '01234569'], ['Amount', '12344']]

If you don't want to split, and instead want to save the lines as they are:

listOfAllAccountsAndAmounts.append(allLines[index])

Output:

['Account-No : 01234567\n', 'Amount: 123456\n', 'Account-No : 01234568\n', 'Amount: 12345\n', 'Account-No : 01234569\n', 'Amount: 12344\n']

I have written to a list in case you want to process the information. You can also write the strings directly to a new file, without even using a list, as @Arda showed; see the sketch below.
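
For instance, that direct write could look like this (reusing the variables from the code above; the output file name 'b.txt' is a placeholder):

with open('b.txt', 'w') as outFile:
    for index in indexOfAccounts:
        outFile.write(allLines[index]) # each line already ends with '\n'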

Answer 3 (score: -1):

You could read the whole text file into a list, then iterate over the list, search each string for "Account-No" and "Amount", and write the matching lines to another file.
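
A minimal sketch of that idea (file names are placeholders):

searchTexts = ['Account-No', 'Amount'] # strings to look for

# read the whole text file into a list of lines
with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines()

# write every line that contains one of the search strings to another file
with open('b.txt', 'w') as outFile:
    for line in allLines:
        if any(text in line for text in searchTexts):
            outFile.write(line)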