Filter a data table based on conditions in a column

Time: 2016-05-29 19:27:50

Tags: r

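I am trying to download EOD data from the NSE site. The data contains every series type, such as EQ, BE, DR, N1 and so on. I now want to filter the table so that only the EQ, BE and DR rows of the "SERIES" column are kept and all other values are excluded.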

After reading and writing, the data looks like this:

      DATE SERIES     SYMBOL     OPEN     HIGH      LOW    CLOSE   VOLUME
1    2016-05-27     EQ  20MICRONS    28.30    29.20    28.05    28.25    31468
2    2016-05-27     EQ 3IINFOTECH     4.20     4.25     3.90     3.95  2209977
3    2016-05-27     EQ    3MINDIA 13170.00 13300.00 12611.00 12699.00     5511
4    2016-05-27     EQ    8KMILES  1717.00  1770.95  1685.00  1710.45    33558
5    2016-05-27     EQ   A2ZINFRA    24.80    25.65    24.70    25.15   102189
6    2016-05-27     EQ AARTIDRUGS   458.05   473.85   458.05   468.95    11140
7    2016-05-27     EQ   AARTIIND   512.60   519.95   512.20   516.20    13101
8    2016-05-27     EQ  AARVEEDEN    58.00    59.00    57.20    58.55     3436
9    2016-05-27     EQ       ABAN   198.55   202.50   198.50   199.55   999288
10   2016-05-27     EQ        ABB  1241.80  1273.85  1234.40  1253.95    51180
11   2016-05-27     EQ ABBOTINDIA  4703.00  4764.00  4639.70  4751.70     2663
12   2016-05-27     EQ      ABFRL   137.80   141.00   133.50   134.50   541872

I tried using which(), but it only returns the EQ series.

The code I used is:

#28-10-2014: Fix for '403 Forbidden'
## Credit http://stackoverflow.com/questions/26086868/error-downloading-a-csv-in-zip-from-website-with-get-in-r

library(httr)

#Define Working Directory, where files would be saved
setwd('D:/FII Stats/')

#Define start and end dates, and convert them into date format
startDate = as.Date("2016-05-26", format="%Y-%m-%d")
endDate =   as.Date("2016-05-27", format="%Y-%m-%d")

#work with date, month, year for which data has to be extracted
myDate = startDate
zippedFile <- tempfile() 

while (myDate <= endDate){
  filenameDate = paste(as.character(myDate, "%y%m%d"), ".csv", sep = "")
 monthfilename=paste(as.character(myDate, "%y%m"),".csv", sep = "")
 downloadfilename=paste("cm", toupper(as.character(myDate, "%d%b%Y")), "bhav.csv", sep = "")
 temp =""

  #Generate URL
 myURL = paste("http://www.nseindia.com/content/historical/EQUITIES/", as.character(myDate, "%Y"), "/", toupper(as.character(myDate, "%b")), "/", downloadfilename, ".zip", sep = "")

  #retrieve Zipped file
  tryCatch({
  #Download Zipped File

#28-10-2014: Fix for '403 Forbidden'
  #download.file(myURL,zippedFile, quiet=TRUE, mode="wb",cacheOK=TRUE)
  GET(myURL, user_agent("Mozilla/5.0"), write_disk(paste(downloadfilename,".zip",sep="")))


  #Unzip file and save it in temp 
  #28-10-2014: Fix for '403 Forbidden'
  temp <- read.csv(unzip(paste(downloadfilename,".zip",sep="")), sep = ",",as.is=TRUE) 

  #temp <-  temp[which(temp$SERIES=="EQ" | "DR" | "BE"), ]


  #Rename Columns Volume and Date
  colnames(temp)[9] <- "VOLUME"
  colnames(temp)[11] <- "DATE"

  #Define Date format
  temp$DATE <- as.Date(temp$DATE, format="%d-%b-%Y")

  #Reorder Columns and Select relevant columns
   temp<-subset(temp,select=c("DATE","SERIES","SYMBOL","OPEN","HIGH","LOW","CLOSE","VOLUME"))
   #temp<-subset(temp,temp[temp$"SERIES" == "BE & DR & EQ", ],select=c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME"))

  #Write the BHAVCOPY csv - datewise
  write.csv(temp,file=filenameDate,row.names = FALSE)

  #Write the csv in Monthly file
  if (file.exists(monthfilename))
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = FALSE, append=TRUE)
  }else
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = TRUE, append=FALSE)
  }


  #Print Progress
  #print(paste (myDate, "-Done!", endDate-myDate, "left"))
 }, error=function(err){
  #print(paste(myDate, "-No Record"))
 }
 )
  myDate <- myDate+1
  print(paste(myDate, "Next Record"))
}

 #Delete temp file - Bhavcopy
 junk <- dir(pattern="cm")
 file.remove(junk)

How can I get the desired result?

2 answers:

Answer 0 (score: 2)

Use %in% rather than "==". You cannot write x == A | B, but you can write x %in% c("A","B"). And if you are going to use "[", don't wrap it in subset as well. Here are one or two alternatives:

temp <- temp[ temp$"SERIES" %in% c("BE",  "DR", "EQ") ,   # row selection rule
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] #col select
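The row selection above relies on %in%. A small sketch on a made-up series vector (not real NSE data) shows why the question's commented-out line fails and what %in% changes: because == binds tighter than |, R reads series == "EQ" | "DR" | "BE" as (series == "EQ") | "DR" | "BE" and refuses the character operand. Spelling out each comparison works, and %in% gives the same answer except that it returns FALSE rather than NA for missing values.

```r
# Toy values shaped like the SERIES column (made up for illustration)
series <- c("EQ", "BE", "N1", "DR", NA, "EQ")
keep   <- c("EQ", "BE", "DR")

# The attempt from the question: | cannot combine a logical vector with "DR",
# so this raises an error; tryCatch returns the error message instead
attempt <- tryCatch(series == "EQ" | "DR" | "BE",
                    error = function(e) conditionMessage(e))
print(attempt)

# Each comparison written out in full works, but NA stays NA
by_or <- series == "EQ" | series == "BE" | series == "DR"
print(by_or)

# %in% performs the same membership test and maps NA to FALSE
by_in <- series %in% keep
print(by_in)
```

An NA in a logical index makes df[idx, ] return a row of NAs, which is one reason which() is often wrapped around == comparisons; with %in% the which() is unnecessary.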

Or use subset this way:

temp <- subset(temp, SERIES %in% c("BE", "DR", "EQ"),   # NSE, so use the unquoted column name
               select = c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME"))

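It is probably better to use the "[" function if you plan to do any programming in R. In subset, NSE (non-standard evaluation) is a persistent source of errors. Safest is to avoid "$" as well and use "[[":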

temp <- temp[ temp[["SERIES"]] %in% c("BE", "DR", "EQ") ,   # row selection rule
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] # col select
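To illustrate the NSE caveat, here is a sketch with a toy data frame and a column name stored in a variable, as it would be inside a reusable function (df, col and keep are illustrative names, not from the question). subset() evaluates its condition inside the data first and then in the calling environment, so a name that is not a column silently resolves to the outer variable instead of raising an error; "[[" simply takes the column name as a string.

```r
# Made-up data with hypothetical symbols
df <- data.frame(
  SERIES = c("EQ", "BE", "N1", "DR"),
  SYMBOL = c("AAA", "BBB", "CCC", "DDD"),
  stringsAsFactors = FALSE
)
keep <- c("BE", "DR", "EQ")
col  <- "SERIES"  # column name held in a variable

# `col` is not a column of df, so subset() finds the string "SERIES" in the
# calling environment; the test becomes "SERIES" %in% keep, a single FALSE,
# and every row is dropped without any warning
via_subset <- subset(df, col %in% keep)
print(nrow(via_subset))

# [[ looks the column up by its string name, so the same variable works
via_brackets <- df[df[[col]] %in% keep, , drop = FALSE]
print(via_brackets$SYMBOL)
```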

Answer 1 (score: 1)

This will do the job:

library(data.table)

output <- setDT(df)[SERIES %in% c("EQ", "BE", "DR")]  # df: the data frame read from the CSV (temp in the question)
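A minimal, self-contained run of the data.table approach on a made-up df, assuming the data.table package is installed. One detail worth knowing: setDT() converts its argument by reference rather than returning a copy, so df itself is a data.table afterwards.

```r
library(data.table)

# Toy stand-in for the data frame read from the bhavcopy (values are made up)
df <- data.frame(
  SERIES = c("EQ", "N1", "BE", "DR", "EQ"),
  SYMBOL = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  stringsAsFactors = FALSE
)

# setDT() changes df's class in place; inside [ ], SERIES is read as a column
output <- setDT(df)[SERIES %in% c("EQ", "BE", "DR")]

print(class(df))
print(output$SYMBOL)
```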