I have a huge text file with data like the following:
Name : ABC
Bank : Bank1
Account-No : 01234567
Amount: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amount: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amount: 12344
Spouse : CDG
.
.
.
.
.
I need to extract Account-No and Amount, then write them to a new file:
Account-No: 01234567
Amount : 123456
Account-No: 01234568
Amount : 12345
Account-No: 01234569
Amount : 12344
.
.
.
I tried using mmap to search the text file for the position of Account-No, but I cannot get to the next Account-No this way.
import mmap
fname = input("Enter the file name")
f1 = open(fname)
s = mmap.mmap(f1.fileno(),0,access=mmap.ACCESS_READ)
if s.find(b'Account-No') != -1:
    r = s.find(b'Account-No')
f1.close()
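In r I have the position of the first Account-No, but I cannot search from (r + 1) onward to find the next Account-No.
I could put it in a loop, but I cannot get the exact mmap syntax to work.
Can anyone help me with this, via mmap or any other method?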
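The loop described above can be sketched with the optional start argument of mmap.find, which resumes the search from a given offset. This is only a minimal sketch: the helper name find_lines, the sample data, and the assumption that the file decodes as UTF-8 are illustrative and not part of the original post.

```python
import mmap
import os
import tempfile


def find_lines(path, needle):
    """Return the text from each occurrence of needle to the end of its line."""
    lines = []
    # Length 0 maps the whole file; mmap raises ValueError for an empty file.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as s:
        pos = s.find(needle)
        while pos != -1:
            s.seek(pos)  # jump to the start of the match
            # readline() reads up to and including the next newline
            lines.append(s.readline().rstrip(b"\r\n").decode("utf-8"))
            # resume the search from the position just after that line
            pos = s.find(needle, s.tell())
    return lines


# Small stand-in for the real file (hypothetical sample data).
sample = (
    b"Name : ABC\nBank : Bank1\nAccount-No : 01234567\nAmount: 123456\nSpouse : CDF\n"
    b"Name : ABD\nBank : Bank1\nAccount-No : 01234568\nAmount: 12345\nSpouse : BDF\n"
)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

account_lines = find_lines(sample_path, b"Account-No")
amount_lines = find_lines(sample_path, b"Amount")
os.remove(sample_path)
print(account_lines)
print(amount_lines)
```

Passing s.tell() as the start offset is what keeps the search from returning the same match again; the needle must be bytes because the mapping is binary.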
Answer 0 (score: 1)
This can be done with pandas.
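Answer 1 (score: 0)
Here is a working example that you can easily customize by adding or removing field names in the "required_fields" list. This solution lets you process very large files, because the whole file is never read into memory at once.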
import tempfile
# Reproduce your input file for the purpose of having a working example.
# Note: this sample spells the field "Amout"; use "Amount" for the file in the question.
# mode='w+' opens the temporary file in text mode, so str can be written in Python 3.
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_orig:
    input_filename = f_orig.name
    f_orig.write("""Name : ABC
Bank : Bank1
Account-No : 01234567
Amout: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amout: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amout: 12344
Spouse : CDG""")
    # start looking from the beginning of the file again
    f_orig.seek(0)
    # list the fields you want to keep
    required_fields = [
        'Account-No',
        'Amout',
    ]
    # filter and write, line by line (f_orig must still be open here)
    with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_result:
        result_filename = f_result.name
        # process one line at a time (memory efficient)
        while True:
            line = f_orig.readline()
            # check if we have reached the end of the file
            if not line:
                break
            for field_name in required_fields:
                # write fields of interest to new file
                if field_name in line:
                    # readline() keeps the trailing newline, so no extra '\n' is needed
                    f_result.write(line)
# show result
with open(result_filename, 'r') as f:
    print(f.read())
The result is:
Account-No : 01234567
Amout: 123456
Account-No : 01234568
Amout: 12345
Account-No : 01234569
Amout: 12344
Answer 2 (score: 0)
Code:
listOfAllAccountsAndAmounts = []  # list to collect the matching entries
# field names to search for (the data used here spells the second one 'Amout'; use 'Amount' for the file in the question)
searchTexts = ['Account-No', 'Amout']
with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines()  # read all the lines
# save the indexes of all lines that contain any of the words in searchTexts
indexOfAccounts = [i for i, line in enumerate(allLines) if any(x in line for x in searchTexts)]
for index in indexOfAccounts:
    # drop the trailing newline, then split into [field name, value]
    listOfAllAccountsAndAmounts.append(allLines[index].rstrip('\n').split(': '))
print(listOfAllAccountsAndAmounts)
Output:
[['Account-No ', '01234567'], ['Amout', '123456'], ['Account-No ', '01234568'], ['Amout', '12345'], ['Account-No ', '01234569'], ['Amout', '12344']]
If you do not want to split the lines and would rather save them as they are:
listOfAllAccountsAndAmounts.append(allLines[index])
Output:
['Account-No : 01234567\n', 'Amout: 123456\n', 'Account-No : 01234568\n', 'Amout: 12345\n', 'Account-No : 01234569\n', 'Amout: 12344\n']
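I wrote the results to a list in case you want to process the information further. You can also write the strings directly to a new file without using a list at all, as @Arda showed.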
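Writing the matching lines straight to the output file, without building a list first, might look roughly like this. It is a sketch under assumptions: the function name copy_matching_lines and the file names are hypothetical, and it assumes each field name appears at the start of its line, as in the sample data.

```python
import os
import tempfile


def copy_matching_lines(src_path, dst_path, fields):
    """Stream src_path line by line and copy lines that start with any field name."""
    prefixes = tuple(fields)  # str.startswith accepts a tuple of prefixes
    written = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # the file object yields one line at a time
            if line.startswith(prefixes):
                dst.write(line)  # the line already ends with its newline
                written += 1
    return written


# Hypothetical sample input, written to a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "accounts.txt")
dst_path = os.path.join(workdir, "filtered.txt")
with open(src_path, "w") as f:
    f.write("Name : ABC\nBank : Bank1\nAccount-No : 01234567\nAmount: 123456\nSpouse : CDF\n")

count = copy_matching_lines(src_path, dst_path, ["Account-No", "Amount"])
with open(dst_path) as f:
    filtered_text = f.read()
print(filtered_text, end="")
```

Matching on the start of the line rather than anywhere in it avoids picking up a record whose name or bank happens to contain one of the field names.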
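Answer 3 (score: -1)
You can read the whole text file into a list, then iterate over the list, search each line for the strings "Account-No" and "Amount", and write the matching lines to another file.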
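A minimal sketch of that idea follows. The function name filter_file and the file names are made up for illustration. Because readlines() loads every line into memory, this suits files that fit comfortably in RAM; for a very large file, iterating over the file object line by line would avoid that cost.

```python
import os
import tempfile


def filter_file(src_path, dst_path, keywords):
    """Read all lines into a list, keep those containing a keyword, write them out."""
    with open(src_path, "r") as src:
        lines = src.readlines()  # the whole file as a list of lines
    kept = [line for line in lines if any(k in line for k in keywords)]
    with open(dst_path, "w") as dst:
        dst.writelines(kept)  # each kept line still carries its newline
    return kept


# Hypothetical sample input, written to a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w") as f:
    f.write("Name : ABC\nBank : Bank1\nAccount-No : 01234567\nAmount: 123456\nSpouse : CDF\n")

kept = filter_file(in_path, out_path, ["Account-No", "Amount"])
with open(out_path) as f:
    out_text = f.read()
print(out_text, end="")
```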