Regular expression matching with multiple conditions

Time: 2019-12-06 18:04:59

Tags: python regex

I need to extract the alphabetic strings from the following rows_string variable:

'Equity & 1,638 & \\$3,227,305 & \\$2,649,208 & \\$3,270,402 & \\$3,114,298 & \\$3,173,369 & \\$2,978,769 & \\$3,016,161 & \\$2,807,840\\\\\nFixed Income & 420 & \\$765,856 & \\$661,395 & \\$824,603 & \\$792,579 & \\$794,224 & \\$783,793 & \\$719,307 & \\$630,298\\\\\nCommodities & 119 & \\$72,911 & \\$66,302 & \\$81,649 & \\$81,633 & \\$79,296 & \\$76,450 & \\$64,136 & \\$63,667\\\\\nAsset Allocation & 63 & \\$10,190 & \\$9,275 & \\$10,684 & \\$10,089 & \\$10,371 & \\$9,829 & \\$9,619 & \\$8,880\\\\\nAlternatives & 55 & \\$5,601 & \\$6,023 & \\$6,715 & \\$6,279 & \\$6,365 & \\$6,645 & \\$6,757 & \\$6,243\\\\\nCurrency & 34 & \\$311 & \\$2,014 & \\$1,665 & \\$1,743 & \\$1,683 & \\$1,666 & \\$1,722 & \\$2,058\\\\\nTOTALS & 2,329 & \\$4,082,173 & \\$3,394,217 & \\$4,195,718 & \\$4,006,620 & \\$4,065,308 & \\$3,857,151 & \\$3,817,700 & \\$3,518,986\\\\'

For example, I need the following list:

[Equity, Fixed Income, Commodities, Asset Allocation, Alternatives, Currency, Total]

I tried:

re.findall(r'\\\\\n(\w+.*?) &', rows_string)

This works, but it omits "Equity", and it also gives me an empty list for this string:

'Starting Portfolio & sell & 21.39\\% & -0.91\\% & 1.52\\% & 9.29\\% & 9.72\\% & 14.89\\% & 38.21\\% & 55.4\\% &  & 90.86\\%\\\\'

So for the second string I need ['Starting Portfolio', 'sell']. What I want is to capture the first item after \\\\\n in the string and the first item before '&'. Thanks.

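5 answers:

Answer 0: (score: 1)

You are just missing a \. You are not searching for the letters \n but for a newline character, so just add an additional \ at the beginning of your regex. You are also missing the first entry, because you stated that the word has to begin with \\\\\n. To get the first one as well, you could use ^(\w+.*?)|[\\\\\n](\w+.*?) & for example.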
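A small check of the second point, using a shortened, made-up sample in the same format as rows_string: the pattern from the question only matches labels that follow a row break, so the first label is skipped, and accepting the start of the string as an alternative picks it up. The non-capturing group is an assumption of this sketch (it is not the pattern given above), chosen so that re.findall returns a flat list instead of tuples.

```python
import re

# Shortened sample in the same format as rows_string (values are made up).
sample = 'Equity & 1,638 & \\$3,227\\\\\nFixed Income & 420 & \\$765\\\\\nTOTALS & 2,329 & \\$4,082\\\\'

# Pattern from the question: needs two backslashes and a newline before the label.
original = re.findall(r'\\\\\n(\w+.*?) &', sample)

# Variant: also accept the start of the string in place of the row break.
with_start = re.findall(r'(?:^|\\\\\n)(\w+.*?) &', sample)

print(original)    # ['Fixed Income', 'TOTALS']
print(with_start)  # ['Equity', 'Fixed Income', 'TOTALS']
```

This variant still returns only the first label of each row, so 'sell' in the second string would not be found by it.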

Answer 1: (score: 1)

I see no reason to focus on the escaped newline.
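One way to read this suggestion, sketched here as an assumption rather than as the answer's own code: ignore the row breaks entirely and capture any run of letters and spaces that is directly followed by an ampersand. The name label_pattern and the shortened sample strings are hypothetical.

```python
import re

# Hypothetical pattern: a run of letters/spaces that starts at a word boundary
# and ends right before an ampersand (optional whitespace in between).
label_pattern = re.compile(r'\b([A-Za-z][A-Za-z ]*?)\s*&')

# Shortened samples in the same format as the question (values are made up).
s1 = 'Equity & 1,638 & \\$3,227\\\\\nFixed Income & 420 & \\$765\\\\'
s2 = 'Starting Portfolio & sell & 21.39\\% & -0.91\\% &  & 90.86\\%\\\\'

print(label_pattern.findall(s1))  # ['Equity', 'Fixed Income']
print(label_pattern.findall(s2))  # ['Starting Portfolio', 'sell']
```

Because the numeric cells start with a digit, a backslash, or a minus sign, they never satisfy the leading [A-Za-z] and are skipped without any reference to the newline.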

Answer 2: (score: 1)

Try this pattern with re.finditer():

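import re

# Matches runs of letters and whitespace; whitespace-only matches are dropped by the filter below.
pattern = r"(((?!\\\\\n)([a-zA-Z\s]+))|([a-zA-Z\s]{2,}\s?(?!\&)))"
output_list = [i.group().strip() for i in re.finditer(pattern, rows_string) if i.group().strip()]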

Input:

s1 = 'Equity & 1,638 & \\$3,227,305 & \\$2,649,208 & \\$3,270,402 & \\$3,114,298 & \\$3,173,369 & \\$2,978,769 & \\$3,016,161 & \\$2,807,840\\\\\nFixed Income & 420 & \\$765,856 & \\$661,395 & \\$824,603 & \\$792,579 & \\$794,224 & \\$783,793 & \\$719,307 & \\$630,298\\\\\nCommodities & 119 & \\$72,911 & \\$66,302 & \\$81,649 & \\$81,633 & \\$79,296 & \\$76,450 & \\$64,136 & \\$63,667\\\\\nAsset Allocation & 63 & \\$10,190 & \\$9,275 & \\$10,684 & \\$10,089 & \\$10,371 & \\$9,829 & \\$9,619 & \\$8,880\\\\\nAlternatives & 55 & \\$5,601 & \\$6,023 & \\$6,715 & \\$6,279 & \\$6,365 & \\$6,645 & \\$6,757 & \\$6,243\\\\\nCurrency & 34 & \\$311 & \\$2,014 & \\$1,665 & \\$1,743 & \\$1,683 & \\$1,666 & \\$1,722 & \\$2,058\\\\\nTOTALS & 2,329 & \\$4,082,173 & \\$3,394,217 & \\$4,195,718 & \\$4,006,620 & \\$4,065,308 & \\$3,857,151 & \\$3,817,700 & \\$3,518,986\\\\'
s2 = 'Starting Portfolio & sell & 21.39\\% & -0.91\\% & 1.52\\% & 9.29\\% & 9.72\\% & 14.89\\% & 38.21\\% & 55.4\\% &  & 90.86\\%\\\\'

Output:

['Equity', 'Fixed Income', 'Commodities', 'Asset Allocation', 'Alternatives', 'Currency', 'TOTALS']
['Starting Portfolio', 'sell']

Answer 3: (score: 1)

To get the values, you can use an alternation that matches either the words at the start of the string or the words before  &

(?:^[A-Za-z]+(?: [A-Za-z]+)*|[A-Za-z]+(?: [A-Za-z]+)*(?= &))
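  • (?: Non-capturing group
    • ^ Start of the line
    • [A-Za-z]+(?: [A-Za-z]+)* Match one or more words made up only of the characters A-Za-z, separated by single spaces
    • | Or
    • [A-Za-z]+(?: [A-Za-z]+)*(?= &) Match the same kind of words, asserting that  & follows
  • ) Close the group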


For example:

import re

pattern = r'(?:^[A-Za-z]+(?: [A-Za-z]+)*|[A-Za-z]+(?: [A-Za-z]+)*(?= &))'
rows_string = 'Equity & 1,638 & \\$3,227,305 & \\$2,649,208 & \\$3,270,402 & \\$3,114,298 & \\$3,173,369 & \\$2,978,769 & \\$3,016,161 & \\$2,807,840\\\\\nFixed Income & 420 & \\$765,856 & \\$661,395 & \\$824,603 & \\$792,579 & \\$794,224 & \\$783,793 & \\$719,307 & \\$630,298\\\\\nCommodities & 119 & \\$72,911 & \\$66,302 & \\$81,649 & \\$81,633 & \\$79,296 & \\$76,450 & \\$64,136 & \\$63,667\\\\\nAsset Allocation & 63 & \\$10,190 & \\$9,275 & \\$10,684 & \\$10,089 & \\$10,371 & \\$9,829 & \\$9,619 & \\$8,880\\\\\nAlternatives & 55 & \\$5,601 & \\$6,023 & \\$6,715 & \\$6,279 & \\$6,365 & \\$6,645 & \\$6,757 & \\$6,243\\\\\nCurrency & 34 & \\$311 & \\$2,014 & \\$1,665 & \\$1,743 & \\$1,683 & \\$1,666 & \\$1,722 & \\$2,058\\\\\nTOTALS & 2,329 & \\$4,082,173 & \\$3,394,217 & \\$4,195,718 & \\$4,006,620 & \\$4,065,308 & \\$3,857,151 & \\$3,817,700 & \\$3,518,986\\\\'
print(re.findall(pattern, rows_string, re.M))


rows_string2 = 'Starting Portfolio & sell & 21.39\\% & -0.91\\% & 1.52\\% & 9.29\\% & 9.72\\% & 14.89\\% & 38.21\\% & 55.4\\% &  & 90.86\\%\\\\'
print(re.findall(pattern, rows_string2, re.M))

Output:

['Equity', 'Fixed Income', 'Commodities', 'Asset Allocation', 'Alternatives', 'Currency', 'TOTALS']
['Starting Portfolio', 'sell']

If every match should be followed by  &, the pattern can be simplified to:

[A-Za-z]+(?: [A-Za-z]+)*(?= &)
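A quick check of the simplified pattern, using shortened, made-up rows in the same format as the question. Since the pattern no longer contains ^, the re.M flag is not needed; the variable names here are illustrative only.

```python
import re

simplified = r'[A-Za-z]+(?: [A-Za-z]+)*(?= &)'

# Shortened samples in the same format as the question (values are made up).
rows = 'Asset Allocation & 63 & \\$10,190\\\\\nCurrency & 34 & \\$311\\\\'
rows2 = 'Starting Portfolio & sell & 21.39\\% & 55.4\\% &  & 90.86\\%\\\\'

print(re.findall(simplified, rows))   # ['Asset Allocation', 'Currency']
print(re.findall(simplified, rows2))  # ['Starting Portfolio', 'sell']
```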


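Answer 4: (score: 0)

Assuming the target strings (the financial keywords) come after a newline (or at the start of the string) and before an &, you could do this: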

>>> re.findall(r'(?:\n|^)([A-Za-z ]+)\s&', s)
['Equity', 'Fixed Income', 'Commodities', 'Asset Allocation', 'Alternatives', 'Currency', 'TOTALS']

This takes a few shortcuts, but depending on whether you have more complex strings such as "P&E", "Misc. Expenses", etc., the above may be sufficient.
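For labels like those, a character class of letters and spaces is no longer enough. A minimal sketch of one alternative, assuming that cells are separated by " & ", that rows end with two backslashes, and that a label is any cell containing a letter but no digit. The function name extract_labels and the sample rows are hypothetical.

```python
import re

def extract_labels(rows_string):
    """Return the non-numeric cells of a LaTeX-style table body."""
    labels = []
    # Rows end with two backslashes, optionally followed by a newline.
    for row in re.split(r'\\\\\n?', rows_string):
        for cell in row.split(' & '):
            cell = cell.strip()
            # Assumption: a label has at least one letter and no digits.
            if re.search(r'[A-Za-z]', cell) and not re.search(r'\d', cell):
                labels.append(cell)
    return labels

# Made-up rows containing the harder labels mentioned above.
sample = 'P&E & 12 & \\$3,400\\\\\nMisc. Expenses & 7 & \\$1,000\\\\'
sample2 = 'Starting Portfolio & sell & 21.39\\% &  & 90.86\\%\\\\'

print(extract_labels(sample))   # ['P&E', 'Misc. Expenses']
print(extract_labels(sample2))  # ['Starting Portfolio', 'sell']
```

Splitting on " & " with the surrounding spaces is what keeps "P&E" in one piece; a label that itself contains a digit would be dropped under this assumption.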