Sorting and finding the top 5 values of a vector in R

Asked: 2014-05-02 05:39:50

Tags: r

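I have an existing .R file that simply reads a CSV file and plots the data. I tried adding a few lines to grab and print the top 5 rows by frequency, but I'm getting strange results.

Here is the code: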

    require (stringr)

    generate_names <-function ( gender, name) {

      genderfn<-paste(gender,"_names.csv",sep="",collapse=NULL)
      fn <- paste("../datasets/Ontario_names/", genderfn,sep="",collapse = NULL)

      A <- read.csv(fn, skip=1, header=TRUE)
      print(dim(A))
      # Recode the Frequency measurement to be certain it is an integer
      A$Frequency <- as.integer(A$Frequency)

      #pdf(paste(name, ".pdf",sep="", collapse =NULL))

      #generate a logical vector of matching names
      g <- stringr::str_trim(A$Name)==toupper(name)

      #use the logical vector to create a smaller data frame
      name.df <- A[g,]

      #my little addition
      ordered <- name.df[order(A$Frequency, decreasing = F),]
      top5 <- head( ordered, 50)
      print(top5)


      #plot the distribution of name registrations over years
      plot(name.df$Year,name.df$Frequency,
           type="p",
           main=paste(toupper(name)," in Ontario"), 
           xlab="Birth Year", ylab = "Number",
           xlim=c(min(name.df$Year),max(name.df$Year)),
           ylim=c(0,max(name.df$Frequency)) )
      #grid()
      #dev.off()
    }

    # Replace the gender and names and try some different names
    generate_names("male","grant")
    generate_names("female","mary")

The output is a bit odd. These are snippets from the two function calls:

    > generate_names("male","grant")
    [1] 66351     3
          Year  Name Frequency
    26720 1917 GRANT        25
    26729 1926 GRANT        36
    26733 1930 GRANT        36
    26734 1931 GRANT        33
    26735 1932 GRANT        36
    26737 1934 GRANT        47
    26738 1935 GRANT        45
    26740 1937 GRANT        43
    26741 1938 GRANT        46
    26743 1940 GRANT        51
    26744 1941 GRANT        67
    26765 1962 GRANT       157
    26771 1968 GRANT       132
    26774 1971 GRANT        93
    26776 1973 GRANT        89
    26783 1980 GRANT        69
    NA      NA  <NA>        NA
    NA.1    NA  <NA>        NA
    NA.2    NA  <NA>        NA
    NA.3    NA  <NA>        NA

    > generate_names("female","mary")
    [1] 83035     3
          Year Name Frequency
    57032 1955 MARY       572
    57060 1983 MARY       579
    57063 1986 MARY       390
    NA      NA <NA>        NA
    NA.1    NA <NA>        NA
    NA.2    NA <NA>        NA
    NA.3    NA <NA>        NA

The rows at the top of each output aren't even the ones with the highest frequency.
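
One likely cause, sketched here on made-up data rather than the real CSV: `order(A$Frequency)` returns row positions for the full data frame `A`, but those positions are then used to index the much smaller `name.df`. Any position beyond `nrow(name.df)` comes back as an all-`NA` row, and the positions that do fall in range pick rows in an order unrelated to their own frequencies. The toy data frame below and its values are assumptions for illustration only; the column names follow the question.

```r
# Toy stand-in for the CSV contents (all values are invented for illustration)
A <- data.frame(
  Year      = rep(1990:1995, each = 2),
  Name      = rep(c("GRANT", "MARY"), times = 6),
  Frequency = c(40L, 500L, 60L, 450L, 20L, 480L,
                80L, 300L, 70L, 520L, 30L, 410L)
)

# Subset to one name, as in the question (6 of the 12 rows)
name.df <- A[A$Name == "GRANT", ]

# Pattern from the question: positions computed on A (12 rows)
# are applied to name.df (6 rows), so positions 7..12 yield NA rows
bad <- name.df[order(A$Frequency, decreasing = FALSE), ]
print(bad)

# Sketch of a fix: order by the subset's own column, sort descending,
# and keep 5 rows
ordered <- name.df[order(name.df$Frequency, decreasing = TRUE), ]
top5 <- head(ordered, 5)
print(top5)
```

Two other details in the posted code may also matter: `decreasing = F` sorts ascending, so the smallest frequencies would come first even with correct indexing, and `head(ordered, 50)` keeps 50 rows rather than 5.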

0 answers:

No answers yet.