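I have a text file with many lines in it, and the word "@TestRun" appears in it many times. Taking one "@TestRun" as the start point and the next "@TestRun" as the end point, the lines between the two form one section; the text can have 3-4 such sections. My question is how to extract the lines of each section and find the duplicate lines within each section.
My text file looks like this: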
@TestRun
And user validate message on screen "Switch to paperless"
And user click on "Manage accounts" label
And user click link with label "View all online services"
And user waits for 10 seconds
Then page is successfully launched
And user click link with label "Go paperless for complete convenience"
Then page is successfully launched
And user validate message on screen "#EmailAddress"
And user clicks on the button "Confirm"
Then page is successfully launched
And user validate message on screen "#MessageValidate"
Then page is successfully launched
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And user validate "Switch to paperless" button is disabled
And user validate message on screen "Online only"
When user click on "Log out" label
Then page is successfully launched
@TestRun
And user click on link "Mobile site"
And user set text "#Surname" on textbox name "surname"
Then page is successfully launched
And user click on link "#Account"
Then page is successfully launched
And user verify message on screen "#Account"
And user verify message on screen "Manage statements"
And user verify message on screen "Step 1 of 3"
Then page is successfully launched
And user verify message on screen "Current format type"
And user verify message on screen "Online"
When user selects the radio button "Paper"
@TestRun
Then user wait for page load
And user click on button "Continue to Online Banking"
Then user wait for page load
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And page is successfully launched
And user waits for 10 seconds
@TestRun
Then page is successfully launched
And user waits for 10 seconds
And user click checkbox "Telephone"
And user click checkbox "Post"
And user clicks on the button "Save"
Then page is successfully launched
I tried the following code, but it does not work:
with open('CustPref.txt') as input_data:
    for line in input_data:
        if line.strip() == '@TestRun ':
            break
    for line in input_data:
        if line.strip() == '@TestRun ':
            break
        print line
I get output, but it is completely wrong: I get only one line, which is not what I expect. How can I fix this?
Answer 0 (score: 0)
You have 2 problems to solve:
Splitting:
First option
Parse the file line by line:
parts = []        # lines collected since the most recent @TestRun
chunks = []       # all chunks of lines between 2 @TestRun's
startNow = False  # wait till first @TestRun before keeping anything
for line in Text():  # see definition for Text() below - it mimics your open('...')
    if line.strip() == '@TestRun':
        startNow = True
        if len(parts) > 0:  # found a @TestRun; if parts contains lines, append to chunks
            chunks.append(parts)
            parts = []
    elif startNow:  # first @TestRun already seen, so append line to parts
        parts.append(line)
if parts:  # keep the lines after the last @TestRun too; drop this if unwanted
    chunks.append(parts)
print(chunks)  # done -> list of lists of lines, one list per section
Second option
Do not split the text into lines first; read it in as one complete text and split it with list comprehensions:
biggerChunks = [x.strip() for x in TextTT().split("@TestRun") ]
chunkified = [x.splitlines() for x in biggerChunks if len(x.strip()) > 0 ]
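This first splits on @TestRun to get a list of large text blocks, then splits each block into lines. The result is roughly the same: [[all lines between 2 @TestRun's]]
Removing duplicates (while preserving order)
This has been answered here: how-do-you-remove-duplicates-from-a-list-in-whilst-preserving-order - it is an SO link, so there is no need to regurgitate it here :)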
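For readers who want the idea in one place, here is a minimal sketch of order-preserving de-duplication applied to each chunk. It assumes Python 3.7+ (where a dict keeps insertion order); `dedupe_chunks` and the sample data are hypothetical names for illustration and are not taken from the linked answer.

```python
def dedupe_chunks(chunks):
    """Remove repeated lines inside each chunk, keeping first occurrences in order."""
    # dict.fromkeys keeps one key per distinct line, in insertion order
    return [list(dict.fromkeys(chunk)) for chunk in chunks]


# hypothetical sample: two chunks as produced by either splitting option
sample_chunks = [
    ["Then page is successfully launched",
     "And user waits for 10 seconds",
     "Then page is successfully launched"],
    ["Then user wait for page load",
     "Then user wait for page load"],
]
deduped = dedupe_chunks(sample_chunks)
print(deduped)
```

Duplicates are only removed within a chunk, so the same line may still appear in two different chunks.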
Helpers: Text() is a replacement for opening the file, TextTT() is the whole text block:
def Text():  # instead of file open, returns list of lines
    return TextTT().splitlines()
def TextTT():  # unsplit text
    return '''
@TestRun
And user validate message on screen "Switch to paperless"
And user click on "Manage accounts" label
And user click link with label "View all online services"
And user waits for 10 seconds
Then page is successfully launched
And user click link with label "Go paperless for complete convenience"
Then page is successfully launched
And user validate message on screen "#EmailAddress"
And user clicks on the button "Confirm"
Then page is successfully launched
And user validate message on screen "#MessageValidate"
Then page is successfully launched
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And user validate "Switch to paperless" button is disabled
And user validate message on screen "Online only"
When user click on "Log out" label
Then page is successfully launched
@TestRun
And user click on link "Mobile site"
And user set text "#Surname" on textbox name "surname"
Then page is successfully launched
And user click on link "#Account"
Then page is successfully launched
And user verify message on screen "#Account"
And user verify message on screen "Manage statements"
And user verify message on screen "Step 1 of 3"
Then page is successfully launched
And user verify message on screen "Current format type"
And user verify message on screen "Online"
When user selects the radio button "Paper"
@TestRun
Then user wait for page load
And user click on button "Continue to Online Banking"
Then user wait for page load
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And page is successfully launched
And user waits for 10 seconds
@TestRun
Then page is successfully launched
And user waits for 10 seconds
And user click checkbox "Telephone"
And user click checkbox "Post"
And user clicks on the button "Save"
Then page is successfully launched
'''
See the comments for explanations. You can use e.g. itertools.chain if you need to recombine the inner lists.
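As a small sketch of that recombination step: `itertools.chain.from_iterable` flattens a list of lists into one sequence of lines. The `chunks` value below is a hypothetical, shortened stand-in for the result of the splitting code.

```python
from itertools import chain

# hypothetical, shortened result of the splitting step
chunks = [
    ['And user click on link "Mobile site"', "Then page is successfully launched"],
    ["Then user wait for page load"],
]
# chain.from_iterable walks the inner lists one after another
all_lines = list(chain.from_iterable(chunks))
print(all_lines)
```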
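Answer 1 (score: 0)
Using more_itertools, a third-party library, we can split the text before each desired target.
Update: we can use itertools.dropwhile to drop the lines before the first target.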
import itertools as it
import more_itertools as mit
with open("CustPref.txt", "r") as f:
    lines = f.readlines()
pred = lambda x: x.startswith("@TestRun") # trailing-space protection
inv_pred = lambda x: not pred(x)
lines = it.dropwhile(inv_pred, lines) # optional
chunks = list(mit.split_before(lines, pred))
print(chunks)
Output (abbreviated)
[['@TestRun\n',
' And user validate message on screen "Switch to paperless" \n',
...],
['@TestRun \n',
' And user click on link "Mobile site" \n',
...],
['@TestRun\n',
'Then user wait for page load\n',
...],
...]
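If installing a third-party package is not an option, the same splitting can be sketched with the standard library only. `split_before_marker` below is a hypothetical stand-in written for this illustration, not the library's implementation, and the sample lines are made up.

```python
from itertools import dropwhile


def split_before_marker(lines, pred):
    """Yield lists of lines, starting a new list at each line where pred is true."""
    current = []
    for line in lines:
        if pred(line) and current:
            # a new marker closes the list collected so far
            yield current
            current = []
        current.append(line)
    if current:
        yield current


def is_marker(line):
    # startswith tolerates trailing spaces after the marker
    return line.startswith("@TestRun")


sample = ["preamble\n", "@TestRun\n", "step a\n", "@TestRun \n", "step b\n", "step c\n"]
# drop everything before the first marker, then split before each marker
trimmed = dropwhile(lambda line: not is_marker(line), sample)
chunks = list(split_before_marker(trimmed, is_marker))
print(chunks)
```

As in the output above, each chunk keeps its "@TestRun" line as the first element.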
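Answer 2 (score: 0)
A simple approach is to remember the lines you have already seen. You could collect them in a list, but using a dictionary or a set is more efficient.
Read one line at a time. If this line (is not a new TestRun header and) has been seen before, do not print it. If it is a TestRun header, forget what you have seen. Print everything that gets this far in the loop. Start over with the next line.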
with open('CustPref.txt') as input_data:
    seen = set()
    for line in input_data:
        # trim trailing newline
        line = line.rstrip('\n')
        if line == '@TestRun ':  # really sure about the trailing space?
            seen = set()  # who am I? what day is it?
        elif line in seen:
            # skip the rest of the for loop and start over
            continue
        else:
            seen.add(line)
        print(line)
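Programmatically, it makes sense to check "is it @TestRun, otherwise has it been seen already, otherwise add it to seen" in this order, so that you do not have to check whether it is @TestRun twice. In the exposition above I wanted to keep a more natural order to make it simpler.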
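The question also asks to find the repeated lines rather than only suppress them. A minimal sketch of that variant follows, counting lines per section with `collections.Counter`. The function name `duplicates_per_section`, its `marker` parameter, and the sample lines are assumptions made for this illustration; lines are stripped before comparison, so a trailing space after the marker does not matter.

```python
from collections import Counter


def duplicates_per_section(lines, marker="@TestRun"):
    """Return one list per section with the lines that occur more than once in it."""
    sections = []
    current = None  # stays None until the first marker is seen
    for raw in lines:
        line = raw.strip()
        if line == marker:
            if current is not None:
                sections.append(current)
            current = Counter()  # forget what was seen in the previous section
        elif current is not None and line:
            current[line] += 1
    if current is not None:
        sections.append(current)
    return [[text for text, n in counts.items() if n > 1] for counts in sections]


# hypothetical, shortened input in the same shape as the file
sample = [
    "@TestRun",
    "Then page is successfully launched",
    "And user waits for 10 seconds",
    "Then page is successfully launched",
    "@TestRun ",
    'And user click checkbox "Post"',
]
result = duplicates_per_section(sample)
print(result)
```

With a real file, the open file object can be passed directly as `lines`, since it iterates line by line.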