Hi everyone, I'm trying to use a Scala regex to parse some data from http://kdd.ics.uci.edu/databases/20newsgroups/20_newsgroups.tar.gz
Here is the text I'm trying to process:
val inputData = """xref: cantaloupe.srv.cs.cmu.edu alt.atheism:51121 soc.motss:139944 rec.scouting:5318
newsgroups: alt.atheism,soc.motss,rec.scouting
path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!wupost!uunet!newsgate.watson.ibm.com!yktnews.watson.ibm.com!watson!watson.ibm.com!strom
from: strom@watson.ibm.com (rob strom)
subject: re: [soc.motss, et al.] "princeton axes matching funds for boy scouts"
sender: @watson.ibm.com
message-id: <1993apr05.180116.43346@watson.ibm.com>
date: mon, 05 apr 93 18:01:16 gmt
distribution: usa
references: <c47efs.3q47@austin.ibm.com> <1993mar22.033150.17345@cbnewsl.cb.att.com> <n4hy.93apr5120934@harder.ccr-p.ida.org>
organization: ibm research
lines: 15
in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes:
|> [1] however, i hate economic terrorism and political correctness
|> worse than i hate this policy.
|> [2] a more effective approach is to stop donating
|> to any organizating that directly or indirectly supports gay rights issues
|> until they end the boycott on funding of scouts.
can somebody reconcile the apparent contradiction between [1] and [2]?
--
rob strom, strom@watson.ibm.com, (914) 784-7641
ibm research, 30 saw mill river road, p.o. box 704, yorktown heights, ny 10598"""
This is the output I need:
in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes:
|> [1] however, i hate economic terrorism and political correctness
|> worse than i hate this policy.
|> [2] a more effective approach is to stop donating
|> to any organizating that directly or indirectly supports gay rights issues
|> until they end the boycott on funding of scouts.
can somebody reconcile the apparent contradiction between [1] and [2]?
This is what I tried:
val docParser = """([\\s\\S]+\\lines: \\d*)([\\s\\S]*\\n\\n)([\\s\\S]*)""".r
val docParser(metadata, content, footer) = inputText
But I get the following error:
scala.MatchError: [Ljava.lang.String;@62f8fff1 (of class [Ljava.lang.String;)
Any ideas? :)
Answer 0: (score: 0)
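I have never programmed in Scala before, but from what I saw at http://www.tutorialspoint.com/scala/scala_regular_expressions.htm it looks like you have to escape things like the digit class twice. So \d would become \\d in Scala, and so on.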
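One caveat on the escaping rule: the doubling applies to ordinary double-quoted literals. In a triple-quoted string backslashes are taken literally, so `\\s` there asks the regex engine for a literal backslash followed by `s`. Separately, the `[Ljava.lang.String;` in the error message suggests the value being matched was an array of strings rather than a single `String`. A minimal sketch under those assumptions follows; the `PostSplitter` object, the `splitPost` helper, and the shortened `sample` are hypothetical, and it assumes the header ends at the `lines:` line and the signature starts at a line containing only `--`.

```scala
object PostSplitter {
  // (?s) lets '.' match newlines. Inside a triple-quoted string each
  // backslash is written once, so \d, \s and \n reach the regex engine as-is.
  val docParser = """(?s)(.*?\nlines: \d+)\n(.*?)\n--\s*\n(.*)""".r

  // Returns (metadata, content, footer), or None when the text does not
  // have the assumed shape, instead of throwing scala.MatchError.
  def splitPost(text: String): Option[(String, String, String)] =
    text match {
      case docParser(metadata, content, footer) => Some((metadata, content, footer))
      case _                                    => None
    }

  // Hypothetical shortened post, not the full message from the question.
  val sample: String =
    "from: strom@watson.ibm.com (rob strom)\nlines: 15\n" +
      "in article <x> writes:\n|> [1] quoted text\n" +
      "can somebody reconcile [1] and [2]?\n" +
      "--\nrob strom, strom@watson.ibm.com"

  def main(args: Array[String]): Unit =
    splitPost(sample).foreach { case (_, content, _) => println(content) }
}
```

If the posts are held in an `Array[String]`, something like `posts.flatMap(p => PostSplitter.splitPost(p))` would apply the pattern to each element in turn (here `posts` is an assumed name). Posts without a `--` signature line would come back as `None` under this pattern, so the regex may need loosening for the full dataset.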