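I have the following data frame in a .csv file. The file contains more rows, but to keep things simple I have narrowed it down to the excerpt below. You can also access the csv file here: https://dl.dropboxusercontent.com/u/16277659/filter.csv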
NAME; YEAR; VALUE
SAMPLE1; 1969; 6
SAMPLE1; 1970; -6
SAMPLE1; 1971; -7
SAMPLE1; 1972; =-X
SAMPLE1; 1972; ST
SAMPLE1; 1972; 3
SAMPLE1; 1975; -7
SAMPLE1; 1976; 3
SAMPLE1; 1977; 3
SAMPLE1; 1978; 0
SAMPLE2; 1991; -15
SAMPLE2; 1992; =X
SAMPLE2; 1992; -58
SAMPLE2; 1994; -40
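What I would like to do is the following: I sometimes have qualitative values (such as =-X, ST, etc.) that I don't necessarily want to lose, but if the same year also has a numeric value (as in SAMPLE1 1972, which also has =-X and ST), I want to keep only the numeric value and drop the others.
How would you do that? Thanks for your help.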
Answer 0 (score: 3)
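The approach @akrun provided filters rows based on what we have in VALUE.
The following instead just replaces all the text values with NA (your post suggests you don't want to lose these records, only remove the text values).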
dat <- read.csv("https://dl.dropboxusercontent.com/u/16277659/filter.csv", sep = ";")
# Coerce to numeric; text values such as "=-X" or "ST" become NA (R warns about the coercion)
dat$VALUE <- as.numeric(as.character(dat$VALUE))
> dat
NAME YEAR VALUE
1 SAMPLE1 1969 6
2 SAMPLE1 1970 -6
3 SAMPLE1 1971 -7
4 SAMPLE1 1972 NA
5 SAMPLE1 1972 NA
6 SAMPLE1 1972 3
7 SAMPLE1 1975 -7
8 SAMPLE1 1976 3
9 SAMPLE1 1977 3
10 SAMPLE1 1978 0
11 SAMPLE2 1991 -15
12 SAMPLE2 1992 NA
13 SAMPLE2 1992 -58
14 SAMPLE2 1994 -40
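If the qualitative codes should stay available rather than be overwritten, one option is to copy them into a separate column before converting. This is only a minimal sketch: it uses a few rows typed inline instead of the CSV, and the column name RAW is made up for illustration.

```r
# A few rows from the question, typed inline so the sketch runs without the CSV.
txt <- "NAME; YEAR; VALUE
SAMPLE1; 1971; -7
SAMPLE1; 1972; =-X
SAMPLE1; 1972; ST
SAMPLE1; 1972; 3
SAMPLE2; 1992; =X
SAMPLE2; 1992; -58"
dat <- read.csv(text = txt, sep = ";", strip.white = TRUE,
                stringsAsFactors = FALSE)

# RAW is a hypothetical column that preserves the original text.
dat$RAW <- dat$VALUE
# Text values become NA; the coercion warning is silenced on purpose.
dat$VALUE <- suppressWarnings(as.numeric(dat$RAW))
print(dat)
```

With this layout the numeric column can be used for calculations while RAW still shows which rows held =-X, ST and so on.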
Answer 1 (score: 2)
From your post, it looks like the values should be kept when a particular year has only non-numeric VALUEs. For example, if I change the last entry to "ST":
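dat$VALUE[14] <- "ST"
# Per YEAR: keep the rows containing digits, unless every row is text, in which case keep them all
indx <- with(dat, as.logical(ave(VALUE, YEAR, FUN = function(x) {
  x1 <- grepl("\\d+", x)         # value contains a digit
  x2 <- grepl("[[:alpha:]]", x)  # value contains a letter
  if (all(x2)) x2 else x1
})))
dat[indx, ]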
# NAME YEAR VALUE
#1 SAMPLE1 1969 6
#2 SAMPLE1 1970 -6
#3 SAMPLE1 1971 -7
#6 SAMPLE1 1972 3
#7 SAMPLE1 1975 -7
#8 SAMPLE1 1976 3
#9 SAMPLE1 1977 3
#10 SAMPLE1 1978 0
#11 SAMPLE2 1991 -15
#13 SAMPLE2 1992 -58
#14 SAMPLE2 1994 ST
Answer 2 (score: 2)
I haven't mastered regular expressions yet, so this is where my mind went first:
dat <- read.csv2("filter.csv", as.is=TRUE)
dat$IsNum <- !(is.na(as.numeric(dat$VALUE)))
> dat
NAME YEAR VALUE IsNum
1 SAMPLE1 1969 6 TRUE
2 SAMPLE1 1970 -6 TRUE
3 SAMPLE1 1971 -7 TRUE
4 SAMPLE1 1972 =-X FALSE
5 SAMPLE1 1972 ST FALSE
6 SAMPLE1 1972 3 TRUE
7 SAMPLE1 1975 -7 TRUE
8 SAMPLE1 1976 3 TRUE
9 SAMPLE1 1977 3 TRUE
10 SAMPLE1 1978 0 TRUE
11 SAMPLE2 1991 -15 TRUE
12 SAMPLE2 1992 =X FALSE
13 SAMPLE2 1992 -58 TRUE
14 SAMPLE2 1994 -40 TRUE
From there, you can filter on IsNum == TRUE.
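To get from the IsNum flag to what the question asks for, the flag can be combined with a per-group check. A rough sketch, assuming groups are defined by NAME and YEAR together; the rows are typed inline (with an added 1994 row that holds only text), and the helper names has_num and keep are made up for illustration:

```r
# A few rows typed inline; the 1994 row is an added text-only case.
txt <- "NAME; YEAR; VALUE
SAMPLE1; 1971; -7
SAMPLE1; 1972; =-X
SAMPLE1; 1972; ST
SAMPLE1; 1972; 3
SAMPLE2; 1992; =X
SAMPLE2; 1992; -58
SAMPLE2; 1994; ST"
dat <- read.csv(text = txt, sep = ";", strip.white = TRUE,
                stringsAsFactors = FALSE)

# Flag values that convert cleanly to a number.
dat$IsNum <- !is.na(suppressWarnings(as.numeric(dat$VALUE)))

# TRUE for every row whose NAME/YEAR group has at least one numeric value.
has_num <- as.logical(ave(dat$IsNum, dat$NAME, dat$YEAR, FUN = any))

# Keep numeric rows, plus text rows from groups that have no numeric value.
keep <- dat$IsNum | !has_num
res <- dat[keep, c("NAME", "YEAR", "VALUE")]
print(res)
```

Text rows are dropped only where a number exists for the same sample and year, so a year holding nothing but qualitative codes is left alone.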
Answer 3 (score: 1)
My answer is:
data <- read.csv("filter.csv", sep=";")
data[!grepl("[A-Za-z]", data$VALUE),]
Output:
NAME YEAR VALUE
1 SAMPLE1 1969 6
2 SAMPLE1 1970 -6
3 SAMPLE1 1971 -7
6 SAMPLE1 1972 3
7 SAMPLE1 1975 -7
8 SAMPLE1 1976 3
9 SAMPLE1 1977 3
10 SAMPLE1 1978 0
11 SAMPLE2 1991 -15
13 SAMPLE2 1992 -58
14 SAMPLE2 1994 -40
Answer 4 (score: 1)
Here is a data.table solution:
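dat <- read.csv("https://dl.dropboxusercontent.com/u/16277659/filter.csv", sep = ";")
library(data.table)
# Per YEAR: take the value containing a digit if there is one, otherwise the text value.
# Note: ifelse() with a length-1 test returns a single value per group.
setDT(dat)[, .SD[, ifelse(any(grepl("\\d", VALUE)),
                          grep("\\d", VALUE, value = TRUE),
                          VALUE)],
           by = YEAR]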
## YEAR V1
## 1: 1969 6
## 2: 1970 -6
## 3: 1971 -7
## 4: 1972 3
## 5: 1975 -7
## 6: 1976 3
## 7: 1977 3
## 8: 1978 0
## 9: 1991 -15
##10: 1992 -58
##11: 1994 -40
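Note that the ifelse() version returns a single value per YEAR and drops NAME. If all columns and every numeric row per group should be kept, a variant along these lines might work. It is only a sketch: the inline rows, the grouping by NAME and YEAR, and the column names isnum and keep are assumptions, not part of the original answer.

```r
library(data.table)

# A few rows typed inline; the 1994 row is an added text-only case.
txt <- "NAME; YEAR; VALUE
SAMPLE1; 1971; -7
SAMPLE1; 1972; =-X
SAMPLE1; 1972; ST
SAMPLE1; 1972; 3
SAMPLE2; 1992; =X
SAMPLE2; 1992; -58
SAMPLE2; 1994; ST"
dt <- as.data.table(read.csv(text = txt, sep = ";", strip.white = TRUE,
                             stringsAsFactors = FALSE))

# isnum: TRUE where VALUE converts cleanly to a number.
dt[, isnum := !is.na(suppressWarnings(as.numeric(VALUE)))]

# keep: numeric rows, or every row when the group has no numeric value.
dt[, keep := isnum | !any(isnum), by = .(NAME, YEAR)]

res <- dt[keep == TRUE, .(NAME, YEAR, VALUE)]
print(res)
```

Computing the flag with := by group leaves the original row order untouched, which makes the result easy to compare against the input.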