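I receive a CSV of error messages from failed regression tests and import it into a pandas DataFrame. I want to pull certain substrings out of those messages, in particular the ones that name exceptions.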
I populate the DataFrame with the contents of the .csv as follows:
import pandas as pd
df = pd.read_csv('ErrorMessage3.csv', header=None, sep=',',
                 names=['ErrorMessage'])
I have the following regex and a corresponding test string (the first entry in the error-message column of the DataFrame), and the regex returns exactly what I want:
import re
teststring = ("Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp "
              "Date Record from Epay Account {DBServer;UserName;Password='', "
              "DatabaseName='',Year Offset='-10'}> ---> "
              "System.Data.SqlTypes.SqlNullValueException: Data is Null. This method or "
              "property cannotbecalled "
              "on Null values. ---> System.Data.SqlTypes.SqlNullValueException2: Data is Null.")
re.findall(r"---> ([^:]+): ", teststring)
This results in the following output:
['System.Data.SqlTypes.SqlNullValueException',
 'System.Data.SqlTypes.SqlNullValueException2']
But I want to be able to add these matches as an "Exceptions" column in the DataFrame, and I thought applying the same regex to the ErrorMessage column would do that.
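However, when I run it, the "Exceptions" column is added but every row is NaN. I verified that ErrorMessage has object dtype, and I used an online regex tester to confirm that at least a subset of my ErrorMessage entries do contain exceptions that match my regex. I have read several other Stack Overflow questions that look similar, without much luck.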
Why does applying the regex to the DataFrame produce NaN, while applying it to a single string returns what I want?
Answer 0 (score: 1)
import pandas as pd
# Line breaks are placed so that every exception name is followed by ": " on the
# same line, which the regex below requires.
teststring1 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException1: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException2: Data is Null.
---> System.Data.SqlTypes.SqlNullValueException21: ---> System.Data.SqlTypes.SqlNullValueException22: ---> System.Data.SqlTypes.SqlNullValueException23: ---> System.Data.SqlTypes.SqlNullValueException24: """
teststring2 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException3: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException4: Data is Null."""
teststring3 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException5: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException6: Data is Null."""
teststring4 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException7: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException8: Data is Null."""
teststring5 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException9: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException10: Data is Null."""
teststring6 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account
{DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>
---> System.Data.SqlTypes.SqlNullValueException11: Data is Null. This method or property cannotbecalled on Null values.
---> System.Data.SqlTypes.SqlNullValueException12: Data is Null."""
values = [[teststring1], [teststring2], [teststring3], [teststring4], [teststring5], [teststring6]]
header = ['ErrorMessage']
df = pd.DataFrame(values, columns=header)
# One row per match, indexed by (original row label, match number)
exceptions = df['ErrorMessage'].str.extractall(r"---> ([^:]+): ")
print(exceptions)
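extractall returns a new DataFrame with a MultiIndex: the first level matches the original DataFrame's index, and the second level ("match") numbers the matches within each row. Because of that extra level, the original and new DataFrames are not index-compatible, so the result cannot be assigned directly as a column of the original. Printing exceptions gives: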
                                                    0
  match
0 0       System.Data.SqlTypes.SqlNullValueException1
  1       System.Data.SqlTypes.SqlNullValueException2
  2      System.Data.SqlTypes.SqlNullValueException21
  3      System.Data.SqlTypes.SqlNullValueException22
  4      System.Data.SqlTypes.SqlNullValueException23
  5      System.Data.SqlTypes.SqlNullValueException24
1 0       System.Data.SqlTypes.SqlNullValueException3
  1       System.Data.SqlTypes.SqlNullValueException4
2 0       System.Data.SqlTypes.SqlNullValueException5
  1       System.Data.SqlTypes.SqlNullValueException6
3 0       System.Data.SqlTypes.SqlNullValueException7
  1       System.Data.SqlTypes.SqlNullValueException8
4 0       System.Data.SqlTypes.SqlNullValueException9
  1      System.Data.SqlTypes.SqlNullValueException10
5 0      System.Data.SqlTypes.SqlNullValueException11
  1      System.Data.SqlTypes.SqlNullValueException12
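A minimal sketch of the index mismatch, assuming a small made-up DataFrame in place of the CSV. The first index level of the extractall result holds the original row labels, and rows without any match simply do not appear in it. For comparison it also uses Series.str.findall, a different accessor method from the one in this answer, which returns one list of matches per original row and keeps the original index, so its result can be assigned directly as a column.

```python
import pandas as pd

# Made-up sample rows standing in for the CSV contents; the last row has no match.
df = pd.DataFrame({"ErrorMessage": [
    "Step 13 ---> System.Data.SqlTypes.SqlNullValueException: Data is Null. "
    "---> System.Data.SqlTypes.SqlNullValueException2: Data is Null.",
    "Step 14 ---> System.Data.SqlTypes.SqlNullValueException: Data is Null.",
    "Step 15 passed",
]})
pattern = r"---> ([^:]+): "

# extractall: one row per match, indexed by (original row label, match number).
matches = df["ErrorMessage"].str.extractall(pattern)
print(list(matches.index.names))                   # [None, 'match']
print(matches.index.get_level_values(0).tolist())  # [0, 0, 1]; row 2 is absent

# findall: one list of matches per original row, on the same index as df.
df["Exceptions"] = df["ErrorMessage"].str.findall(pattern)
print(df["Exceptions"].tolist())
```

Because findall produces lists, a row with no match gets an empty list rather than NaN.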
Answer 1 (score: 0)
As @Trenton_M pointed out, extractall returns a new MultiIndex DataFrame, so one solution is to use groupby and then join all the matched strings for each row.
Here is a simple demo:
import pandas as pd
# Two identical rows; the "1"/"2" prefixes make the two matches easy to tell apart
df = pd.DataFrame([""""Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp
Date Record from Epay Account {DBServer;UserName;Password='',
DatabaseName='',Year Offset='-10'}> ---> 1System.Data.SqlTypes.SqlNullValueException: Data is Null. This method or
property cannotbecalled
on Null values. ---> 2System.Data.SqlTypes.SqlNullValueException2: Data is Null."""] * 2, columns=['ErrorMessage'])
# One row per match, indexed by (original row label, match number)
mulIndexDataFrame = df['ErrorMessage'].str.extractall(r"---> ([^:]+): ")
# Group the matches by index level 0 (the original row label) and join them with commas;
# the result is indexed like df, so it aligns when assigned as a new column
df['test'] = mulIndexDataFrame.groupby(mulIndexDataFrame.index.get_level_values(0))[0].apply(lambda x: ','.join(x))
print(df)
Output:
ErrorMessage \
0 "Step 13 - Iteration 1 Failed: Action: <Update...
1 "Step 13 - Iteration 1 Failed: Action: <Update...
test
0 1System.Data.SqlTypes.SqlNullValueException,2S...
1 1System.Data.SqlTypes.SqlNullValueException,2S...
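The grouping step above can also be written as groupby(level=0). As a comparison on made-up sample data, the sketch below builds the same comma-separated column two ways: extractall plus groupby, and Series.str.findall followed by Series.str.join (a swapped-in alternative, not part of this answer). The two differ only for a row with no match: the groupby result has no entry for it, so that row becomes NaN after assignment, while the findall version gives an empty string.

```python
import pandas as pd

# Made-up sample rows; the "1"/"2" prefixes mirror the demo above, and the last row has no match.
df = pd.DataFrame({"ErrorMessage": [
    "Step 13 ---> 1System.Data.SqlTypes.SqlNullValueException: Data is Null. "
    "---> 2System.Data.SqlTypes.SqlNullValueException2: Data is Null.",
    "Step 15 passed",
]})
pattern = r"---> ([^:]+): "

# Variant 1: extractall, then group on index level 0 (the original row label) and join.
matches = df["ErrorMessage"].str.extractall(pattern)
df["via_groupby"] = matches.groupby(level=0)[0].agg(",".join)

# Variant 2: findall keeps the original index; join each row's list of matches.
df["via_findall"] = df["ErrorMessage"].str.findall(pattern).str.join(",")

print(df[["via_groupby", "via_findall"]])
```

Which variant fits depends on whether a missing match should show up as NaN or as an empty string.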