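How to filter nested lists and maps in Scala

Time: 2018-09-10 12:55:00

Tags: json scala dictionary filter

I am trying to read a JSON file in order to compute some metrics in Scala. I managed to read the file and do some top-level filtering, but I am having trouble understanding how to filter nested lists and maps.

Here is the sample code (the real JSON is much longer):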

  val rawData = """[
  {
    "technology": "C",
    "users": [
    {
      "rating": 5,
      "completed": false,
      "user": {
        "id": 11111,
        "paid": true
      }
    },
    {
      "rating": 4,
      "completed": false,
      "user": {
        "id": 22222,
        "paid": false
      }
    }
    ],
    "title": "CS50"
  },
  {
    "technology": "C++",
    "users": [
    {
      "rating": 3,
      "completed": true,
      "user": {
        "id": 33333,
        "paid": false
      }
    },
    {
      "rating": 5,
      "completed": true,
      "user": {
        "id": 44444,
        "paid": false
      }
    }
    ],
    "title": "Introduction to C++"
  },
  {
    "technology": "Haskell",
    "users": [
    {
      "rating": 5,
      "completed": false,
      "user": {
        "id": 55555,
        "paid": false
      }
    },
    {
      "rating": null,
      "completed": true,
      "user": {
        "id": 66666,
        "paid": false
      }
    }
    ],
    "title": "Course on Haskell"
  }
  ]"""

  val data = rawData.toString.split("\n").toSeq.map(_.trim).filter(_ != "").mkString("")

I managed to get a list containing the 3 titles:

import scala.util.parsing.json._
val parsedData = JSON.parseFull(data)
val listTitles = parsedData.get.asInstanceOf[List[Map[String, Any]]].map( { case e: Map[String, Any] => e("title").toString }  )

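Here are my three questions:

  1. Is this a good way to get the list of the three titles?
  2. How do I get a list containing the number of paying users for each of the 3 titles?
  3. How do I get a list containing the number of users who completed the course for each of the 3 titles?

Thanks in advance for your help.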

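3 answers:

Answer 0: (score: 1)

As the other answer suggests, you should use the play-json library. It is powerful and offers many features, including object mapping, parsing, and error handling.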

  import play.api.libs.json._

  // `id` is a JSON number in the sample data, so it is read as an Int
  case class User(id: Int, paid: Boolean)
  object User {
    implicit val format: OFormat[User] = Json.format[User]
  }

  // `rating` can be null in the sample data, so it is modelled as an Option
  case class UserCourseStat(rating: Option[Int], completed: Boolean, user: User)
  object UserCourseStat {
    implicit val format: OFormat[UserCourseStat] = Json.format[UserCourseStat]
  }

  case class Data(technology: String, title: String, users: List[UserCourseStat])
  object Data {
    implicit val format: OFormat[Data] = Json.format[Data]
  }

  val jsString = """[{"technology":"C","users":[{"rating":5,"completed":false,"user":{"id":11111,"paid":true}},{"rating":4,"completed":false,"user":{"id":22222,"paid":false}}],"title":"CS50"},{"technology":"C++","users":[{"rating":3,"completed":true,"user":{"id":33333,"paid":false}},{"rating":5,"completed":true,"user":{"id":44444,"paid":false}}],"title":"Introduction to C++"},{"technology":"Haskell","users":[{"rating":5,"completed":false,"user":{"id":55555,"paid":false}},{"rating":null,"completed":true,"user":{"id":66666,"paid":false}}],"title":"Course on Haskell"}]"""

  val rowData: JsValue = Json.parse(jsString)

  rowData.validate[List[Data]] match {
    case JsSuccess(dataList, _) =>
      val chosenTitles = List("Course on Haskell", "Introduction to C++", "CS50")

      // map of each chosen title to the set of its users
      val chosenTitleToUsersMap = chosenTitles.map { title =>
        title -> dataList.filter(_.title == title)
          .flatMap(_.users.map(_.user))
          .toSet
      }.toMap
      // map of each chosen title to the set of its paid users
      val chosenTitleToPaidUsersMap = chosenTitleToUsersMap.map { case (title, users) =>
        title -> users.filter(_.paid)
      }

      // users who appear under every one of the chosen titles
      // (this checks membership only, not the `completed` flag)
      val allUsers = dataList.flatMap(_.users.map(_.user)).toSet

      val usersWhoCompletedAllChosenTitles = allUsers.filter { user =>
        chosenTitles.forall { title =>
          chosenTitleToUsersMap.get(title).exists(_.contains(user))
        }
      }

    case JsError(errors) =>
      // handle the error case
      ???
  }

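Regarding your 3 questions:

  1. Is this a good way to get the list of the three titles?

I see 2 unsafe operations there: asInstanceOf and e("title"). The latter is unsafe because it does not use the Map's .get(key) method, so it throws an exception if the key is not found.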
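To illustrate the point about the two unsafe operations, here is a minimal sketch that extracts the titles with pattern matching instead of asInstanceOf and e("title"). The object name SafeTitles, the helper titles, and the hand-built parsed value (a stand-in for the Option[Any] that JSON.parseFull returns) are assumptions made for this example, not part of the original code.

```scala
object SafeTitles {
  // Hand-built stand-in for the Option[Any] returned by JSON.parseFull (assumption)
  val parsed: Option[Any] = Some(List(
    Map("technology" -> "C", "title" -> "CS50"),
    Map("technology" -> "C++", "title" -> "Introduction to C++"),
    Map("technology" -> "Haskell"), // no "title" key: skipped, no exception
    "not a map"                     // unexpected shape: skipped as well
  ))

  // Hypothetical helper: collects every string-valued "title" without casting
  def titles(parsed: Option[Any]): List[String] = parsed match {
    case Some(items: List[_]) =>
      items.collect { case course: Map[_, _] =>
        // Option[String]: None when the key is missing or the value is not a String
        course.collectFirst { case (k, t: String) if k == "title" => t }
      }.flatten
    case _ => Nil // parse failure or a top-level value that is not a list
  }
}
```

Entries that are missing the key or have an unexpected shape are simply dropped here. Whether that is acceptable or should be reported as an error depends on the data.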

  
      
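  2. How do I get a list containing the number of paying users for each of the 3 titles?

This is computed in the value named "chosenTitleToPaidUsersMap" above; call .size on each set to get the count.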
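If only the counts are needed, they can also be computed directly from the decoded list. The sketch below is self-contained: it builds the sample data by hand instead of parsing it, and its case classes mirror the ones in this answer with id as Int and rating as Option[Int] to match the sample JSON. The names PaidCounts and paidPerTitle are made up for this example.

```scala
object PaidCounts {
  // Mirrors the answer's model; Int id and Option[Int] rating are assumptions of this sketch
  final case class User(id: Int, paid: Boolean)
  final case class UserCourseStat(rating: Option[Int], completed: Boolean, user: User)
  final case class Data(technology: String, title: String, users: List[UserCourseStat])

  // The sample data from the question, built by hand
  val dataList: List[Data] = List(
    Data("C", "CS50", List(
      UserCourseStat(Some(5), false, User(11111, true)),
      UserCourseStat(Some(4), false, User(22222, false)))),
    Data("C++", "Introduction to C++", List(
      UserCourseStat(Some(3), true, User(33333, false)),
      UserCourseStat(Some(5), true, User(44444, false)))),
    Data("Haskell", "Course on Haskell", List(
      UserCourseStat(Some(5), false, User(55555, false)),
      UserCourseStat(None, true, User(66666, false))))
  )

  // One (title, count) pair per course, in the original order
  val paidPerTitle: List[(String, Int)] =
    dataList.map(d => d.title -> d.users.count(_.user.paid))
}
```

For the sample data this gives 1 for "CS50" and 0 for the other two titles.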

  
      
  3. How do I get a list containing the number of users who completed the course for each of the 3 titles?

This is computed in the value named "usersWhoCompletedAllChosenTitles" above.
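If the question is read as "how many users have completed == true under each title", a per-title count may be closer to what is wanted. This is a hedged sketch under that reading, again with hand-built sample data and case classes that mirror this answer's model (Int id and Option[Int] rating are assumptions). CompletedCounts, completedPerTitle, and completedUserIds are illustrative names.

```scala
object CompletedCounts {
  // Mirrors the answer's model; Int id and Option[Int] rating are assumptions of this sketch
  final case class User(id: Int, paid: Boolean)
  final case class UserCourseStat(rating: Option[Int], completed: Boolean, user: User)
  final case class Data(technology: String, title: String, users: List[UserCourseStat])

  // The sample data from the question, built by hand
  val dataList: List[Data] = List(
    Data("C", "CS50", List(
      UserCourseStat(Some(5), false, User(11111, true)),
      UserCourseStat(Some(4), false, User(22222, false)))),
    Data("C++", "Introduction to C++", List(
      UserCourseStat(Some(3), true, User(33333, false)),
      UserCourseStat(Some(5), true, User(44444, false)))),
    Data("Haskell", "Course on Haskell", List(
      UserCourseStat(Some(5), false, User(55555, false)),
      UserCourseStat(None, true, User(66666, false))))
  )

  // Number of users with completed == true, keyed by course title
  val completedPerTitle: Map[String, Int] =
    dataList.map(d => d.title -> d.users.count(_.completed)).toMap

  // IDs of the users who completed a given course, when the users matter more than the count
  def completedUserIds(title: String): List[Int] =
    dataList.filter(_.title == title).flatMap(_.users).filter(_.completed).map(_.user.id)
}
```

For the sample data the counts are 0 for "CS50", 2 for "Introduction to C++", and 1 for "Course on Haskell".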

Answer 1: (score: 0)

You can use the play-json library to parse the JSON and retrieve the fields you need. For example:

import play.api.libs.json.Json

val rawData1 = Json.parse("""[{"technology":"C","users":[{"rating":5,"completed":false,"user":{"id":11111,"paid":true}},{"rating":4,"completed":false,"user":{"id":22222,"paid":false}}],"title":"CS50"},{"technology":"C++","users":[{"rating":3,"completed":true,"user":{"id":33333,"paid":false}},{"rating":5,"completed":true,"user":{"id":44444,"paid":false}}],"title":"Introduction to C++"},{"technology":"Haskell","users":[{"rating":5,"completed":false,"user":{"id":55555,"paid":false}},{"rating":null,"completed":true,"user":{"id":66666,"paid":false}}],"title":"Course on Haskell"}]""")

val resultedList = (rawData1 \\ "title").toList.map(_.as[String])

Answer 2: (score: 0)

I suggest using the json4s library. It lets you extract the data into case classes:

import org.json4s.jackson.JsonMethods.parseOpt
import org.json4s.DefaultFormats
implicit val formats = DefaultFormats

case class Tech(technology: String, users: Seq[TechUser], title: String)
case class TechUser(rating: Option[Int], completed: Boolean, user: UserInfo)
case class UserInfo(id: Int, paid: Boolean)

val rawData = """..."""
val Some(json) = parseOpt(rawData)
val Some(data) = json.extractOpt[List[Tech]]

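Once this is done, data is a regular Scala data structure that you can manipulate as needed. For example, to find the title of the first course that has a user whose ID is divisible by 5, you could do this: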

data.find(_.users.exists(_.user.id % 5 == 0)).map(_.title)
// Result: Some("Course on Haskell")

The answers to your three questions are one-liners much like this one, but I will leave them to you as an exercise.