SparkR: create a RelativeFrequency table

Time: 2015-12-02 16:05:01

Tags: r frequency sparkr

Hello, I'm using SparkR. I'm trying to compute the relative frequency of my data.

library(data.table)  # provides fread() and setnames()
library(SparkR)      # provides createDataFrame(); assumes sqlContext already exists
SmsInt<-fread("smsCallInt.txt")
setnames(SmsInt,c("V1","V2","V3","V4","V5","V6","V7","V8"),
         c("SquareID","TimeInterval","CountryCode","SmsIn","SmsOut","CallIn","CallOut","Internet"))
SmsInt$TimeInterval<-as.numeric(SmsInt$TimeInterval)
#Also create a SparkR DataFrame from the first 500 rows.
SmsInt.df<-createDataFrame(sqlContext,SmsInt[1:500,])

str(SmsInt)
    Classes ‘data.table’ and 'data.frame':  2459324 obs. of  8 variables:
 $ SquareID    : int  10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 ...
 $ TimeInterval: num  1.38e+12 1.38e+12 1.38e+12 1.38e+12 1.38e+12 ...
 $ CountryCode : int  0 39 49 0 39 0 39 0 39 49 ...
 $ SmsIn       : num  0.109 1.001 NA 0.193 0.648 ...
 $ SmsOut      : num  NA 1.26 NA NA 1.06 ...
 $ CallIn      : num  NA 0.0876 NA NA 0.1751 ...
 $ CallOut     : num  0.0219 0.2196 NA NA 0.1532 ...
 $ Internet    : num  NA 10.1685 0.0219 NA 11.8671 ...
 - attr(*, ".internal.selfref")=<externalptr> 

What I want to do is compute the relative frequency of SmsInt$CountryCode. When I type Country<-table(SmsInt$CountryCode)

I get this error:

Error: class(objId) == "jobj" is not TRUE

What should I do? Is there a way to compute it manually or with some package?
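A likely explanation (an assumption based on the SparkR 1.x API, not confirmed in this thread): attaching SparkR masks base R's `table()` with SparkR's own `table(sqlContext, tableName)`, which hands its first argument to the JVM bridge and fails the `class(objId) == "jobj"` check. Calling `base::table()` explicitly sidesteps the mask, and `prop.table()` turns the counts into relative frequencies. The `codes` vector below is made-up sample data standing in for `SmsInt$CountryCode`.

```r
# Made-up sample standing in for SmsInt$CountryCode
codes <- c(0L, 39L, 49L, 0L, 39L, 0L, 39L, 0L, 39L, 49L)

# base:: forces R's own table(), even if SparkR has masked it
counts <- base::table(codes)

# prop.table() divides each count by the total, so the result sums to 1
rel_freq <- prop.table(counts)
print(rel_freq)
```

On the real data this would be `prop.table(base::table(SmsInt$CountryCode))`. Note that `table()` drops `NA` values by default.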

I wrote an algorithm, but I'm having some trouble with it.

Country5<-SmsInt$CountryCode[1:90]
UniqueCountry<-unique(Country5)
VectorLen<-c()
Parsed<-c()
Freq<-c()
for(i in 1:length(UniqueCountry)){
    CountryCode.i<-UniqueCountry[i]
    if(CountryCode.i %in% Parsed){
        Vector<-0
        VectorLen[i]<-0
        Freq[i]<-0
    }
    else{
        Vector<-grep(CountryCode.i,Country5)
        Parsed[i]<-CountryCode.i
        VectorLen[i]<-length(Vector)
        Freq[i]<-VectorLen[i]/90
        Vector<-0
    }
}
Vector
VectorLen #92, but it should be 90
Freq
sum(Freq) #1.022222, but it should be 1

With 80 it works.

1 answer:

Answer 0 (score: 1)

OK, I solved it. The problem was the grep function: when I searched for the number 1, for example, it also matched it inside 10.
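The pitfall can be reproduced on a small made-up vector: `grep()` converts the number to a character pattern (a regular expression) and returns every element whose text contains it, whereas `%in%` compares whole values. Counting some observations under more than one code in this way is consistent with the lengths adding up to 92 instead of 90.

```r
x <- c(1, 10, 21, 39, 1)

# grep() searches for the pattern "1" inside "1", "10", "21", "39", "1"
substring_hits <- grep(1, x)

# %in% tests each element for exact equality with the value 1
exact_hits <- which(x %in% 1)

print(substring_hits)  # 1 2 3 5
print(exact_hits)      # 1 5
```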

Here is my solution.

RelativeFrequency<-function(DataSet){
  UniqueCountry<-unique(DataSet)
  VectorLen<-c()
  Parsed<-c()
  Freq<-c()
  for(i in 1:length(UniqueCountry)){
    CountryCode.i<-UniqueCountry[i]
    if(CountryCode.i %in% Parsed){
      Vector<-0
      VectorLen[i]<-0
      Freq[i]<-0
    }
    else{
      Vector<-which(DataSet %in% CountryCode.i)  # exact match, unlike grep()
      Parsed[i]<-CountryCode.i
      VectorLen[i]<-length(Vector)
      Freq[i]<-VectorLen[i]/length(DataSet)
    }
  }
  print("Vector of RelativeFrequency")
  print(Freq)
  print("Frequency Sum (Needs to be 1)")
  print(sum(Freq))
  print("Parsed element ")
  print(Parsed)
  barplot(Freq,names.arg=Parsed,space = 0.7,axisnames = TRUE,las=2)
}
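For comparison, the same relative frequencies can be computed without an explicit loop. This is a minimal sketch, not the poster's code: the name `RelativeFrequencyVec` is hypothetical, it returns the frequencies instead of printing and plotting them, and it assumes `DataSet` contains no `NA` values (because `table()` drops `NA` while the division uses the full length).

```r
# Hypothetical name: a vectorized sketch of RelativeFrequency() above.
# Assumes DataSet contains no NA values (table() drops NA by default).
RelativeFrequencyVec <- function(DataSet) {
  # levels = unique(DataSet) keeps the order of first appearance,
  # the same order in which the loop version fills Parsed
  counts <- base::table(factor(DataSet, levels = unique(DataSet)))
  counts / length(DataSet)
}

f <- RelativeFrequencyVec(c(39, 0, 39, 49, 0, 39))
print(f)
print(sum(f))
```

The plot can then be drawn with `barplot(f, space = 0.7, las = 2)`; for a one-dimensional table the bar labels are taken from its names.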