I need a regular expression that matches any digit, followed by a string made of digits, spaces, dots and commas, followed by "Kč" or " Eur".
The problem is that my regex sometimes does not find all such strings.
For example, for some inputs it returns nothing instead of the expected match.
Do you know what is wrong with the regex?
Answer 0 (score: 5)
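Your input string contains a letter made of two code points: the base letter c followed by a combining diacritic (the caron, U+030C). Your regex contains the precomposed letter, which is the single Unicode code point \u010D.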
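The difference between the two spellings is easy to miss, so here is a small illustration using Python's standard unicodedata module. It is not part of the original answer. The two strings look the same on screen but compare unequal, and NFC/NFD normalization converts between them.

```python
import unicodedata

precomposed = "\u010D"   # "č" as one code point (LATIN SMALL LETTER C WITH CARON)
decomposed = "c\u030C"   # base "c" followed by COMBINING CARON

# Both render as "č", but they are different strings of different lengths
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))
print([unicodedata.name(ch) for ch in decomposed])

# NFC composes the pair into one code point; NFD splits it back apart
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```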
You can use
(\d(?:[., \d]*\d)?)\s*(K(?:c\u030C|\u010D)|Eur)
or
(\d[., \d]*)\s*(K(?:č|č)|Eur)
See the regex demo (second regex demo) and the Python demo.
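The snippet below sketches what the first pattern does. It is not the linked Python demo. It runs the pattern against hypothetical sample strings that spell "č" both ways, plus a euro amount. Because the \u escapes sit inside a raw string, Python's re module resolves them itself, which avoids any confusion over which form of "č" is typed in the source file.

```python
import re

# First pattern from the answer: the (?:c\u030C|\u010D) alternation accepts
# both the decomposed and the precomposed spelling of "č".
pattern = re.compile(r"(\d(?:[., \d]*\d)?)\s*(K(?:c\u030C|\u010D)|Eur)")

# Hypothetical sample inputs
decomposed_text = "Letenky od 12 932 Kc\u030C"
precomposed_text = "Letenky od 12 932 K\u010D"
euro_text = "Cena 1.250,50 Eur"

for text in (decomposed_text, precomposed_text, euro_text):
    # Each result is a list of (amount, currency) tuples
    print(pattern.findall(text))
```

The `(?:[., \d]*\d)?` part makes the amount end on a digit. The trailing space before the currency therefore goes to `\s*` and is not captured in the amount group.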
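Pattern details:
\d - a digit
(?:[., \d]*\d)? - an optional occurrence of:
  [., \d]* - zero or more digits, spaces, . or ,
  \d - a digit
\s* - zero or more whitespace characters
(?:K(?:c\u030C|\u010D)|Eur) - K followed by either c\u030C or \u010D, or Eur
When you define the currency regex, use CZK = ['Czk','K(?:č|č)'] or CZK = ['Czk', r'K(?:c\u030C|\u010D)'].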
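The question does not show how the CZK list is used. The sketch below makes an assumption: the spellings are joined into one alternation after the amount. The EUR list and the sample sentence are hypothetical and only there for illustration.

```python
import re

# Currency spellings as suggested in the answer; the raw string leaves
# \u030C and \u010D as regex escapes for the re module to resolve.
CZK = ['Czk', r'K(?:c\u030C|\u010D)']
EUR = ['Eur']  # hypothetical, for illustration only

# Assumption: all spellings are joined into a single alternation group
currency = "|".join(CZK + EUR)
price_re = re.compile(r"(\d(?:[., \d]*\d)?)\s*(" + currency + ")", re.IGNORECASE)

sample = "Letenky od 12 932 Kc\u030C, nebo 480 Eur"  # hypothetical input
print(price_re.findall(sample))
```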
Answer 1 (score: 3)
As Wiktor Stribiżew said, the Kč in your regex is not the same character sequence as the Kč in the text. You can normalize both with the unicodedata module:
>>> import re
>>> re.findall("""((\d[., \d]+)(Kč|Eur))""", "Letenky od 12 932 Kč", flags=re.IGNORECASE)
[]
>>> import unicodedata
>>> re.findall(unicodedata.normalize("NFD", """((\d[., \d]+)(Kč|Eur))"""), unicodedata.normalize("NFD", "Letenky od 12 932 Kč"), flags=re.IGNORECASE)
[('12 932 Kč', '12 932 ', 'Kč')]
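A variation on the same idea is to normalize only the input text to NFC and write the precomposed letter into the pattern as the escape \u010D. This way the pattern is never normalized. The function name find_prices is hypothetical, and NFC replaces the NFD used above. Either form works as long as the pattern and the text end up in the same one.

```python
import re
import unicodedata

# The pattern from the question, with "č" written as the precomposed escape \u010D
PRICE_RE = re.compile(r"((\d[., \d]+)(K\u010D|Eur))", re.IGNORECASE)

def find_prices(text):
    """Hypothetical helper: NFC-normalize the text so that "c" + U+030C
    becomes U+010D, then return the same tuples as re.findall."""
    return PRICE_RE.findall(unicodedata.normalize("NFC", text))

print(find_prices("Letenky od 12 932 Kc\u030C"))  # decomposed input
print(find_prices("Letenky od 12 932 K\u010D"))   # precomposed input
```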