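I am trying to download EOD data from the NSE site. The data includes every series type: EQ, BE, DR, N1, and so on. I want to filter the table so that it keeps only EQ, BE and DR and drops every other value in the "SERIES" column.
After reading and writing, the data looks like this: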
DATE SERIES SYMBOL OPEN HIGH LOW CLOSE VOLUME
1 2016-05-27 EQ 20MICRONS 28.30 29.20 28.05 28.25 31468
2 2016-05-27 EQ 3IINFOTECH 4.20 4.25 3.90 3.95 2209977
3 2016-05-27 EQ 3MINDIA 13170.00 13300.00 12611.00 12699.00 5511
4 2016-05-27 EQ 8KMILES 1717.00 1770.95 1685.00 1710.45 33558
5 2016-05-27 EQ A2ZINFRA 24.80 25.65 24.70 25.15 102189
6 2016-05-27 EQ AARTIDRUGS 458.05 473.85 458.05 468.95 11140
7 2016-05-27 EQ AARTIIND 512.60 519.95 512.20 516.20 13101
8 2016-05-27 EQ AARVEEDEN 58.00 59.00 57.20 58.55 3436
9 2016-05-27 EQ ABAN 198.55 202.50 198.50 199.55 999288
10 2016-05-27 EQ ABB 1241.80 1273.85 1234.40 1253.95 51180
11 2016-05-27 EQ ABBOTINDIA 4703.00 4764.00 4639.70 4751.70 2663
12 2016-05-27 EQ ABFRL 137.80 141.00 133.50 134.50 541872
I tried the which command, but it returned only the EQ series.
The code I used is:
#28-10-2014: Fix for '403 Forbidden'
## Credit http://stackoverflow.com/questions/26086868/error-downloading-a-csv-in-zip-from-website-with-get-in-r
library(httr)
#Define Working Directory, where files would be saved
setwd('D:/FII Stats/')
#Define start and end dates, and convert them into date format
startDate = as.Date("2016-05-26", order="ymd")
endDate = as.Date("2016-05-27", order="ymd")
#work with date, month, year for which data has to be extracted
myDate = startDate
zippedFile <- tempfile()
while (myDate <= endDate){
  filenameDate = paste(as.character(myDate, "%y%m%d"), ".csv", sep = "")
  monthfilename=paste(as.character(myDate, "%y%m"),".csv", sep = "")
  downloadfilename=paste("cm", toupper(as.character(myDate, "%d%b%Y")), "bhav.csv", sep = "")
  temp =""
  #Generate URL
  myURL = paste("http://www.nseindia.com/content/historical/EQUITIES/", as.character(myDate, "%Y"), "/", toupper(as.character(myDate, "%b")), "/", downloadfilename, ".zip", sep = "")
  #retrieve Zipped file
  tryCatch({
    #Download Zipped File
    #28-10-2014: Fix for '403 Forbidden'
    #download.file(myURL,zippedFile, quiet=TRUE, mode="wb",cacheOK=TRUE)
    GET(myURL, user_agent("Mozilla/5.0"), write_disk(paste(downloadfilename,".zip",sep="")))
    #Unzip file and save it in temp
    #28-10-2014: Fix for '403 Forbidden'
    temp <- read.csv(unzip(paste(downloadfilename,".zip",sep="")), sep = ",",as.is=TRUE)
    #temp <- temp[which(temp$SERIES=="EQ" | "DR" | "BE"), ]
    #Rename Columns Volume and Date
    colnames(temp)[9] <- "VOLUME"
    colnames(temp)[11] <- "DATE"
    #Define Date format
    temp$DATE <- as.Date(temp$DATE, format="%d-%b-%Y")
    #Reorder Columns and Select relevant columns
    temp<-subset(temp,select=c("DATE","SERIES","SYMBOL","OPEN","HIGH","LOW","CLOSE","VOLUME"))
    #temp<-subset(temp,temp[temp$"SERIES" == "BE & DR & EQ", ],select=c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME"))
    #Write the BHAVCOPY csv - datewise
    write.csv(temp,file=filenameDate,row.names = FALSE)
    #Write the csv in Monthly file
    if (file.exists(monthfilename)) {
      write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = FALSE, append=TRUE)
    } else {
      write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = TRUE, append=FALSE)
    }
    #Print Progress
    #print(paste (myDate, "-Done!", endDate-myDate, "left"))
  }, error=function(err){
    #print(paste(myDate, "-No Record"))
  }
  )
  myDate <- myDate+1
  print(paste(myDate, "Next Record"))
}
#Delete temp file - Bhavcopy
junk <- dir(pattern="cm")
file.remove(junk)
How can I get the desired result?
Answer 0 (score: 2)
Use %in% rather than "==". You cannot write x == A | B, but you can write x %in% c("A","B"). And if you choose to use "[", do not also use subset. Here are one or two options:
temp <- temp[ temp$"SERIES" %in% c("BE", "DR", "EQ") , # row selection rule
c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] #col select
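To see why the "==" form fails while %in% works, here is a minimal sketch on a made-up data frame. The name toy and its values are invented for illustration and are not NSE data.

```r
# Toy data with made-up values; only the SERIES column matters here.
toy <- data.frame(
  SERIES = c("EQ", "N1", "BE", "DR", "EQ"),
  SYMBOL = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  stringsAsFactors = FALSE
)

# `|` needs logical operands, so OR-ing a comparison with bare strings
# raises an error; tryCatch captures the message instead of stopping.
bad <- tryCatch(
  toy$SERIES == "EQ" | "DR" | "BE",
  error = function(e) conditionMessage(e)
)
print(bad)

# Spelling out every comparison works, but grows with each extra value.
long_form <- toy$SERIES == "EQ" | toy$SERIES == "DR" | toy$SERIES == "BE"

# %in% tests set membership and returns one TRUE/FALSE per row.
keep <- toy$SERIES %in% c("EQ", "BE", "DR")

print(identical(long_form, keep))
print(toy[keep, ])
```

The logical vector keep can be used directly as the row index, so wrapping it in which() is not required.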
Or use subset this way:
temp<-subset(temp, SERIES %in% c("BE", "DR", "EQ"), # NSE , so use unquoted colname
select=c("DATE","SYMBOL", "OPEN", "HIGH", "LOW", "CLOSE", "LAST", "VOLUME"))
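It is probably better to get used to "[" if you plan to do any programming in R. The NSE in subset (look it up if you don't know what the acronym stands for) is a persistent source of errors. Safest would be to avoid "$" as well: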
temp <- temp[ temp[["SERIES"]] %in% c("BE", "DR", "EQ") , # row selection rule
c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] # col select
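One way the NSE (non-standard evaluation) in subset can go wrong is sketched below. The data frame toy, its wanted column and the wanted variable are all made up for the demonstration. Because subset looks names up inside the data frame first, a column that shares its name with a variable silently takes priority.

```r
# Made-up data: the `wanted` column happens to share its name with a variable.
toy <- data.frame(
  SERIES = c("EQ", "N1", "BE"),
  wanted = c("N1", "N1", "N1"),
  stringsAsFactors = FALSE
)

# The variable we actually mean to filter by.
wanted <- c("EQ", "BE")

# subset() evaluates its condition inside the data frame first,
# so `wanted` here resolves to the column, not the variable above.
via_subset <- subset(toy, SERIES %in% wanted)

# With "[" and "[[", `wanted` is ordinary R code and refers to the variable.
via_bracket <- toy[toy[["SERIES"]] %in% wanted, ]

print(via_subset$SERIES)   # the N1 row is kept
print(via_bracket$SERIES)  # the EQ and BE rows are kept
```

Neither call raises an error or a warning, which is why this kind of mistake is easy to miss inside a longer script.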
Answer 1 (score: 1)
This will do the job:
library(data.table)
output <- setDT(df)[SERIES %in% c("EQ", "BE", "DR") ]
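The same idea on a self-contained, made-up table, assuming the data.table package is installed. The name toy and its values are invented for illustration. Note that setDT() converts the data frame in place rather than returning a copy.

```r
# Assumes the data.table package is installed; the sketch is skipped otherwise.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Made-up rows standing in for the downloaded bhavcopy.
  toy <- data.frame(
    SERIES = c("EQ", "N1", "BE", "DR"),
    CLOSE  = c(28.25, 10.00, 3.95, 12.50),
    stringsAsFactors = FALSE
  )

  # setDT() turns `toy` into a data.table by reference (no copy is made).
  setDT(toy)

  # Inside "[", column names can be used unquoted.
  output <- toy[SERIES %in% c("EQ", "BE", "DR")]
  print(output)
}
```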