Can someone tell me what is wrong with my code? Below is my Spark code in Scala:
import java.text.SimpleDateFormat
import org.apache.spark.sql.SparkSession
import scala.xml.XML
object TopTenTags09 {
  def main(args:Array[String]){
    val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
    val format2 = new SimpleDateFormat("yyyy-MM")
    val spark = SparkSession.builder().appName("Number of posts which are questions and contains specified words").master("local").getOrCreate()
    val data = spark.read.textFile("/home/harsh/Hunny/HadoopPractice/Spark/DF/StackOverFlow/Posts.xml").rdd
    val result = data.filter{line=>{line.trim().startsWith("<row")}}
      .filter{line=>{line.contains("PostTypeId=\"1\"")}}
      .map { line=>{
        val xml = XML.loadString(line)
        if(xml.attribute("Tags").mkString.toLowerCase().contains("hadoop") ||
           xml.attribute("Tags").mkString.toLowerCase().contains("spark")){
          (Integer.parseInt(xml.attribute("Score").toString()),Integer.parseInt(xml.attribute("Score").toString()))
        }
      }}/*.filter(line=>line._1>2)
      .sortByKey(false)*/
    result.foreach(println) //throwing error while printing
    spark.stop
  }
}
Below is the error I get at runtime:
java.lang.NumberFormatException: For input string: "Some(12)"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
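I am new to Spark, and this error is driving me crazy: the message mentions "Some", but there is no "Some" anywhere in my code or data. Can anyone help me?
Sample data: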
<row Id="5" PostTypeId="1" CreationDate="2014-05-13T23:58:30.457" Score="7" ViewCount="286" Body="<p>I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?</p>

<p>For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.</p>

<p>Obviously, randomly generating code would be impractical, so how could I do this?</p>
" OwnerUserId="5" LastActivityDate="2014-05-14T00:36:31.077" Title="How can I do simple machine learning without hard-coding behavior?" Tags="<machine-learning>" AnswerCount="1" CommentCount="1" FavoriteCount="1" ClosedDate="2014-05-14T14:40:25.950" />
<row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457" Score="2" ViewCount="266" Body="<p>As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.</p>
" OwnerUserId="36" LastEditorUserId="97" LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237" Title="What open-source books (or other materials) provide a relatively thorough overview of data science?" Tags="<education><open-source>" AnswerCount="3" CommentCount="4" FavoriteCount="1" ClosedDate="2014-05-14T08:40:54.950" />
<row Id="9" PostTypeId="2" ParentId="5" CreationDate="2014-05-14T00:36:31.077" Score="4" Body="<p>Not sure if this fits the scope of this SE, but here's a stab at an answer anyway.</p>

<p>With all AI approaches you have to decide what it is you're modelling and what kind of uncertainty there is. Once you pick a framework that allows modelling of your situation, you then see which elements are "fixed" and which are flexible. For example, the model may allow you to define your own network structure (or even learn it) with certain constraints. You have to decide whether this flexibility is sufficient for your purposes. Then within a particular network structure, you can learn parameters given a specific training dataset.</p>

<p>You rarely hard-code behavior in AI/ML solutions. It's all about modelling the underlying situation and accommodating different situations by tweaking elements of the model.</p>

<p>In your example, perhaps you might have the robot learn how to detect obstacles (by analyzing elements in the environment), or you might have it keep track of where the obstacles were and which way they were moving.</p>
" OwnerUserId="51" LastActivityDate="2014-05-14T00:36:31.077" CommentCount="0" />
<row Id="10" PostTypeId="2" ParentId="7" CreationDate="2014-05-14T00:53:43.273" Score="9" Body="<p>One book that's freely available is "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman (published by Springer): <a href="http://statweb.stanford.edu/~tibs/ElemStatLearn/">see Tibshirani's website</a>.</p>

<p>Another fantastic source, although it isn't a book, is Andrew Ng's Machine Learning course on Coursera. This has a much more applied-focus than the above book, and Prof. Ng does a great job of explaining the thinking behind several different machine learning algorithms/situations.</p>
" OwnerUserId="22" LastActivityDate="2014-05-14T00:53:43.273" CommentCount="1" />
<row Id="14" PostTypeId="1" CreationDate="2014-05-14T01:25:59.677" Score="14" ViewCount="686" Body="<p>I am sure data science as will be discussed in this forum has several synonyms or at least related fields where large data is analyzed.</p>

<p>My particular question is in regards to Data Mining. I took a graduate class in Data Mining a few years back. What are the differences between Data Science and Data Mining and in particular what more would I need to look at to become proficient in Data Mining?</p>
" OwnerUserId="66" LastEditorUserId="322" LastEditDate="2014-06-17T16:17:20.473" LastActivityDate="2014-06-20T17:36:05.023" Title="Is Data Science the Same as Data Mining?" Tags="<data-mining><definitions>" AnswerCount="4" CommentCount="1" FavoriteCount="2" />
Answer 0 (score: 2)
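I think
Integer.parseInt(xml.attribute("Score").toString())
throws the above exception because xml is of type Elem, and calling its attribute method returns an Option[Seq[Node]], not just a single string containing the number. Calling toString on that Option yields the text "Some(12)", which Integer.parseInt cannot parse.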
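To make the source of the "Some(12)" text concrete, here is a minimal sketch that needs no XML library. It assumes a plain Option[String] as a stand-in for what xml.attribute("Score") returns; the object and member names are hypothetical and only serve the illustration.

```scala
import scala.util.Try

object OptionToStringDemo {
  // Assumption: a plain Option[String] stands in for the
  // Option[Seq[Node]] returned by xml.attribute("Score").
  val score: Option[String] = Some("12")

  // toString on an Option includes the wrapper name: "Some(12)".
  def rendered: String = score.toString

  // Parsing the wrapped text fails with NumberFormatException,
  // which Try captures as a Failure.
  def naiveParse: Try[Int] = Try(Integer.parseInt(score.toString))

  // Converting the value inside the Option works as intended.
  def unwrappedParse: Option[Int] = score.map(_.toInt)

  def main(args: Array[String]): Unit = {
    println(rendered)             // Some(12)
    println(naiveParse.isFailure) // true
    println(unwrappedParse)       // Some(12), now an Option[Int]
  }
}
```

So the "Some" in the error message comes from the Option wrapper, not from the data file.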
You probably want to replace both occurrences of that expression with
Integer.parseInt(xml.attribute("Score").get.toString())
In addition, you can replace the cumbersome Integer.parseInt with
xml.attribute("Score").get.toString.toInt
Demonstration in isolation:
scala> val e = XML.loadString("""<foo Score="42" Bar="58"/>""")
e: scala.xml.Elem = <foo Bar="58" Score="42"/>
scala> e.attribute("Score").get.toString.toInt
res4: Int = 42
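One caveat: .get throws a NoSuchElementException when a row has no Score attribute. Also, the map in the question has an if without an else, so rows that do not match the tag test become Unit values instead of being dropped. Below is a hedged sketch of one way to handle both, assuming scala-xml is on the classpath as in the question. The scoreOf helper and the sample lines are hypothetical, and plain Scala collections are used here instead of an RDD to keep the snippet small.

```scala
import scala.util.Try
import scala.xml.XML

object SafeScoreParsing {
  // Hypothetical helper: returns the Score of a row as an Int, or None
  // when the line is not valid XML, has no Score, or Score is not a number.
  def scoreOf(line: String): Option[Int] =
    for {
      elem  <- Try(XML.loadString(line)).toOption
      nodes <- elem.attribute("Score")
      score <- Try(nodes.map(_.text).mkString.trim.toInt).toOption
    } yield score

  def main(args: Array[String]): Unit = {
    // Made-up rows, not taken from Posts.xml.
    val lines = Seq(
      """<row Id="1" Score="7" />""",
      """<row Id="2" />""",
      """<row Id="3" Score="n/a" />"""
    )
    // flatMap keeps only the rows for which a score could be parsed.
    println(lines.flatMap(line => scoreOf(line)))
  }
}
```

The same idea should carry over to the RDD in the question by using flatMap in place of map, so that non-matching rows produce None and disappear from the result.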