Boxplot only for values that fulfil a condition in R

Time: 2016-10-27 20:39:02

Tags: r boxplot

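I have this dataset. I want to make side-by-side boxplots only for the movies whose index appears exactly 67 times in the "movie" column. The following code tells me which indices appear 67 times in the "movie" column: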

names(which(table(votes$movie) == 67))

But how can I make side-by-side boxplots of the "rating" values for these indices? And how can I add the mean to each boxplot as a single point?

I tried:

boxplot(votes$rating[which(table(votes$movie) == 67)])

But this is obviously wrong, because it shows only one boxplot.
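A minimal sketch of why the attempt yields a single box, using a made-up data frame (`toy` and the count threshold of 2 are assumptions for illustration, not part of the original data). `which()` applied to the table returns positions within the table, not rows of the data frame, so using them to index the ratings picks unrelated values, and `boxplot()` on a single vector draws one box.

```r
# Hypothetical toy data: movie 10 appears 3 times, movie 20 twice, movie 30 once
toy <- data.frame(movie  = c(10L, 10L, 10L, 20L, 20L, 30L),
                  rating = c(5L, 4L, 3L, 2L, 1L, 5L))

counts <- table(toy$movie)
pos <- which(counts == 2)    # position inside the table, named by movie id
print(pos)

# Used as an index into the ratings, position 2 picks the 2nd rating,
# which belongs to movie 10, not to movie 20
wrong <- toy$rating[pos]

# Matching on the movie id itself selects the intended rows
wanted <- names(pos)
right <- toy$rating[toy$movie %in% wanted]
print(right)
```

Selecting rows by movie id, rather than by table position, is what the plotting call needs in order to see every rating of the chosen movies.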

MRE:

# set.seed(1)
# votes2 <- votes[sample(1:nrow(votes), 100, TRUE), ]

votes2 <- 
  structure(list(user = c(869L, 620L, 42L, 341L, 930L, 267L, 708L,934L, 148L, 385L, 251L, 181L, 313L, 437L, 747L, 260L, 109L, 201L,229L, 366L, 921L, 829L, 934L, 868L, 321L, 226L, 527L, 726L, 26L,457L, 117L, 325L, 327L, 60L, 804L, 158L, 593L, 200L, 880L, 482L,868L, 339L, 328L, 347L, 100L, 896L, 846L, 676L, 357L, 496L, 541L,807L, 257L, 924L, 894L, 478L, 601L, 13L, 311L, 230L, 435L, 654L,742L, 180L, 887L, 201L, 147L, 326L, 749L, 465L, 727L, 200L, 216L,267L, 345L, 445L, 268L, 26L, 366L, 82L, 763L, 436L, 324L, 707L,802L, 280L, 682L, 343L, 826L, 325L, 508L, 618L, 405L, 655L, 645L,378L, 296L, 438L, 450L, 151L), movie = c(181L, 240L, 410L, 948L,143L, 926L, 1054L, 502L, 474L, 47L, 147L, 125L, 527L, 249L, 659L,319L, 576L, 1426L, 245L, 672L, 1028L, 151L, 492L, 90L, 182L,250L, 7L, 248L, 841L, 222L, 307L, 434L, 318L, 132L, 746L, 510L,692L, 79L, 585L, 269L, 739L, 485L, 679L, 386L, 347L, 686L, 12L,303L, 597L, 532L, 304L, 820L, 285L, 173L, 52L, 71L, 208L, 333L,504L, 266L, 961L, 195L, 294L, 216L, 491L, 179L, 304L, 655L, 62L,855L, 222L, 756L, 226L, 217L, 303L, 902L, 825L, 255L, 671L, 1128L,283L, 568L, 259L, 212L, 646L, 144L, 566L, 88L, 174L, 99L, 172L,44L, 482L, 863L, 674L, 696L, 292L, 269L, 722L, 443L), rating = c(3L,5L, 3L, 3L, 2L, 2L, 3L, 4L, 5L, 4L, 3L, 3L, 4L, 5L, 4L, 2L, 3L,2L, 3L, 5L, 4L, 4L, 4L, 3L, 3L, 4L, 5L, 2L, 2L, 5L, 5L, 5L, 5L,4L, 4L, 3L, 3L, 5L, 1L, 4L, 2L, 5L, 2L, 1L, 4L, 3L, 5L, 4L, 4L,5L, 4L, 3L, 5L, 5L, 4L, 3L, 4L, 3L, 4L, 4L, 1L, 4L, 3L, 5L, 2L,5L, 5L, 5L, 3L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 5L, 1L, 4L,5L, 5L, 4L, 4L, 2L, 3L, 4L, 5L, 5L, 5L, 4L, 3L, 3L, 3L, 3L, 5L,4L, 5L, 5L),
timestamp = structure(c(884490825, 889987954, 881110483,890758169, 879535462, 878970785, 877326158, 891194539, 877019882,879441982, 886272319, 878962816, 891013525, 880142027, 888639175,890618198, 880580663, 884114015, 891632385, 888858078, 879380142,891990672, 891192087, 877109874, 879439679, 883890491, 879456162,889832422, 891380200, 882392853, 880124339, 891478376, 887820828,883325944, 879444890, 880134296, 886193724, 884128499, 880175050,887643096, 877111542, 891032413, 885049460, 881654846, 891375212,887159146, 883947777, 892685403, 878952080, 876072633, 883864207,892532068, 882049950, 885458060, 882404507, 889388790, 876350017,881514810, 884364873, 880484286, 884133635, 887864350, 881005590,877128388, 881379566, 884114471, 885593942, 879875432, 878849052,883531444, 883709350, 876042493, 880244803, 878973760, 884900448,891200870, 875742893, 891377609, 888857990, 884714361, 878915600,887769416, 880575107, 886286792, 875986155, 891700514, 888519260,876405130, 885690481, 891479244, 883767157, 891308791, 885544739,887473995, 892054402, 880045044, 884196057, 879867960, 882471524,879524947), class = c("POSIXct","POSIXt"), tzone = "")),
.Names = c("user","movie", "rating", "timestamp"), row.names = c(26551L, 37213L,57286L, 90821L, 20169L, 89839L, 94468L, 66080L, 62912L, 6179L,20598L, 17656L, 68703L, 38411L, 76985L, 49770L, 71762L, 99191L,38004L, 77745L, 93471L, 21215L, 65168L, 12556L, 26723L, 38612L,1340L, 38239L, 86970L, 34035L, 48209L, 59957L, 49355L, 18622L,82738L, 66847L, 79424L, 10795L, 72372L, 41128L, 82095L, 64707L,78294L, 55304L, 52972L, 78936L, 2334L, 47724L, 73232L, 69274L,47762L, 86121L, 43810L, 24480L, 7068L, 9947L, 31628L, 51864L,66201L, 40684L, 91288L, 29361L, 45907L, 33240L, 65088L, 25802L,47855L, 76632L, 8425L, 87533L, 33908L, 83945L, 34669L, 33378L,47636L, 89220L, 86434L, 38999L, 77733L, 96062L, 43466L, 71252L,40000L, 32536L, 75709L, 20270L, 71113L, 12170L, 24549L, 14331L,23963L, 5894L, 64229L, 87627L, 77892L, 79731L, 45528L, 41009L,
81088L, 60494L), class = "data.frame")

names(which(table(votes2$movie) == 2))
# [1] "222" "269" "303" "304"

boxplot(votes2$rating[which(table(votes2$movie) == 2)])


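1 answer:

Answer 0 (score: 1)

Perhaps, as I understand it, the request is for side-by-side boxplots of rating for the movies whose number of votes is exactly 67.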


I have switched the order of the terms in the formula from my first guess, and testing should be more successful with your example:

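# Movies that appear exactly 67 times; which() keeps only the TRUE entries
keep <- names(which(table(votes$movie) == 67))
# rating ~ movie draws one box of ratings per selected movie
boxplot(rating ~ movie, data = votes, subset = movie %in% keep)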
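As a quick check of the filter-then-plot idea, here is a minimal sketch on a made-up data frame (`toy` and the threshold of 3 are assumptions chosen for illustration; the real data would use `votes` and 67). It relies on the `subset` argument of the formula interface to `boxplot()`, and inspects the returned summary to confirm which groups were drawn.

```r
# Hypothetical toy data: movies 10 and 20 have 3 ratings each, movie 30 has 1
toy <- data.frame(movie  = c(10L, 10L, 10L, 20L, 20L, 20L, 30L),
                  rating = c(5L, 4L, 3L, 2L, 1L, 3L, 5L))

# Ids of the movies that occur exactly 3 times
keep <- names(which(table(toy$movie) == 3))

# One box per kept movie; boxplot() invisibly returns its summary statistics
bp <- boxplot(rating ~ movie, data = toy, subset = movie %in% keep,
              xlab = "movie", ylab = "rating")

print(bp$names)  # group labels, one per box
print(bp$n)      # number of ratings behind each box
```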


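You should search R-help and SO for how to plot points or text for category means on a boxplot. I am pretty sure this has been asked before. If the search is unsuccessful, report the search terms you used.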
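For the second part of the question, one common base-graphics approach (a sketch under assumptions, not necessarily what the answerer had in mind) is to compute per-group means with `tapply()` and overlay them with `points()`. Base R draws the boxes at x positions 1, 2, ..., so the means are placed at those positions. The data frame `toy2` and the threshold of 3 are made up for illustration.

```r
# Hypothetical toy data: movies 10 and 20 have 3 ratings each, movie 30 has 1
toy2 <- data.frame(movie  = c(10L, 10L, 10L, 20L, 20L, 20L, 30L),
                   rating = c(5L, 4L, 3L, 2L, 1L, 3L, 5L))

# Keep only the rows of movies that occur exactly 3 times
keep2 <- names(which(table(toy2$movie) == 3))
sel <- toy2[toy2$movie %in% keep2, ]

# Side-by-side boxes, one per remaining movie
boxplot(rating ~ movie, data = sel, xlab = "movie", ylab = "rating")

# Mean rating per movie, in the same sorted order as the boxes
means <- tapply(sel$rating, sel$movie, mean)

# Overlay each mean as a single filled point on its box
points(seq_along(means), as.numeric(means), pch = 19, col = "red")
```

Subsetting the data frame first, instead of passing `subset`, keeps the plotted groups and the `tapply()` groups in step, since both are computed from the same `sel`.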