Question

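I have the following data frame in a .csv file. The real file has many more rows, but to keep things simple I have cut it down to the rows below. You can also access the csv file here: https://dl.dropboxusercontent.com/u/16277659/filter.csv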

NAME;       YEAR;   VALUE
SAMPLE1;    1969;   6
SAMPLE1;    1970;   -6
SAMPLE1;    1971;   -7
SAMPLE1;    1972;   =-X
SAMPLE1;    1972;   ST
SAMPLE1;    1972;   3
SAMPLE1;    1975;   -7
SAMPLE1;    1976;   3
SAMPLE1;    1977;   3
SAMPLE1;    1978;   0
SAMPLE2;    1991;   -15
SAMPLE2;    1992;   =X
SAMPLE2;    1992;   -58
SAMPLE2;    1994;   -40

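What I would like to do is the following: I sometimes have qualitative values (such as =-X, ST, etc.) that I do not necessarily want to lose, but if a numeric value exists for the same year (as for SAMPLE1 in 1972, next to =-X and ST), I want to keep only the numeric value and drop the others.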

How would you do this? Thanks for your help.

Answer 1

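The approach provided by @akrun filters out every record that has a non-numeric entry in VALUE.

The following instead replaces all of those values with NA (your post suggests you do not want to lose those records, only remove the text values).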

dat <- read.csv("https://dl.dropboxusercontent.com/u/16277659/filter.csv", sep = ";")
dat$VALUE <- as.numeric(as.character(dat$VALUE))
dat

> dat
      NAME YEAR VALUE
1  SAMPLE1 1969     6
2  SAMPLE1 1970    -6
3  SAMPLE1 1971    -7
4  SAMPLE1 1972    NA
5  SAMPLE1 1972    NA
6  SAMPLE1 1972     3
7  SAMPLE1 1975    -7
8  SAMPLE1 1976     3
9  SAMPLE1 1977     3
10 SAMPLE1 1978     0
11 SAMPLE2 1991   -15
12 SAMPLE2 1992    NA
13 SAMPLE2 1992   -58
14 SAMPLE2 1994   -40
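
In case the Dropbox link is no longer reachable, here is a minimal self-contained sketch of the same idea. The inline `csv_text` is just a few rows copied from the question (an assumption made for reproducibility, not the full file), and `suppressWarnings()` is added only to silence the "NAs introduced by coercion" warning that `as.numeric()` is expected to raise here.

```r
# A few rows copied from the question, so the example runs without the download.
csv_text <- "NAME;YEAR;VALUE
SAMPLE1;1971;-7
SAMPLE1;1972;=-X
SAMPLE1;1972;ST
SAMPLE1;1972;3
SAMPLE2;1992;=X
SAMPLE2;1992;-58"

dat <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

# as.numeric() turns anything it cannot parse into NA and warns about it;
# suppressWarnings() hides that expected warning.
dat$VALUE <- suppressWarnings(as.numeric(dat$VALUE))

dat
sum(is.na(dat$VALUE))  # 3 qualitative entries became NA
```

Setting `stringsAsFactors = FALSE` keeps VALUE as character, so the `as.character()` step from the answer is not needed in this variant.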

Answer 2

From your post it looks like the non-numeric VALUEs should be kept when a given year has nothing else. For example, if I change the last entry to "ST":

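dat$VALUE[14] <- "ST"
indx <- with(dat, as.logical(ave(VALUE, YEAR, FUN = function(x) {
  x1 <- grepl("\\d+", x)         # entries that contain digits
  x2 <- grepl("[[:alpha:]]", x)  # entries that contain letters
  if (all(x2)) x2 else x1        # only text in this year: keep it; otherwise keep the numbers
})))
dat[indx, ]
#      NAME YEAR  VALUE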
#1  SAMPLE1 1969      6
#2  SAMPLE1 1970     -6
#3  SAMPLE1 1971     -7
#6  SAMPLE1 1972      3
#7  SAMPLE1 1975     -7
#8  SAMPLE1 1976      3
#9  SAMPLE1 1977      3
#10 SAMPLE1 1978      0
#11 SAMPLE2 1991    -15
#13 SAMPLE2 1992    -58
#14 SAMPLE2 1994     ST
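
The code above groups by YEAR only. If the same year could appear under more than one NAME, a variant that groups by both columns may be safer. The following is a sketch under that assumption, not the answerer's method: the small inline data frame and the names `is_num` and `keep` are made up for illustration, and the numeric test uses `as.numeric()` instead of a regular expression.

```r
dat <- data.frame(
  NAME  = c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
  YEAR  = c(1972, 1972, 1972, 1992, 1992, 1994),
  VALUE = c("=-X", "ST", "3", "=X", "-58", "ST"),
  stringsAsFactors = FALSE
)

# TRUE where VALUE parses as a number.
is_num <- !is.na(suppressWarnings(as.numeric(dat$VALUE)))

# Within each NAME/YEAR group: if any value is numeric, keep only the numeric
# rows; otherwise keep every row of the group.
keep <- ave(is_num, dat$NAME, dat$YEAR,
            FUN = function(x) if (any(x)) x else rep(TRUE, length(x)))

dat[keep, ]
```

Because `is_num` is already logical, `ave()` returns a logical vector and no `as.logical()` round trip is needed.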

Answer 3

I have not mastered regular expressions yet, so this is where my mind went first:

dat <- read.csv2("filter.csv", as.is=TRUE)
dat$IsNum <- !(is.na(as.numeric(dat$VALUE)))

> dat
      NAME YEAR VALUE IsNum
1  SAMPLE1 1969     6  TRUE
2  SAMPLE1 1970    -6  TRUE
3  SAMPLE1 1971    -7  TRUE
4  SAMPLE1 1972   =-X FALSE
5  SAMPLE1 1972    ST FALSE
6  SAMPLE1 1972     3  TRUE
7  SAMPLE1 1975    -7  TRUE
8  SAMPLE1 1976     3  TRUE
9  SAMPLE1 1977     3  TRUE
10 SAMPLE1 1978     0  TRUE
11 SAMPLE2 1991   -15  TRUE
12 SAMPLE2 1992    =X FALSE
13 SAMPLE2 1992   -58  TRUE
14 SAMPLE2 1994   -40  TRUE

From there it is a simple matter to check whether IsNum == TRUE.
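
One way that last step might look, as a rough sketch: the inline data frame stands in for the file, and `num_only` is a name chosen here for illustration.

```r
dat <- data.frame(
  NAME  = c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2"),
  YEAR  = c(1971, 1972, 1972, 1992),
  VALUE = c("-7", "=-X", "3", "=X"),
  stringsAsFactors = FALSE
)
dat$IsNum <- !is.na(suppressWarnings(as.numeric(dat$VALUE)))

# Keep only the rows flagged as numeric, then store VALUE as a real number.
num_only <- dat[dat$IsNum, ]
num_only$VALUE <- as.numeric(num_only$VALUE)
num_only
```

Note that this drops every qualitative row, including years that have no numeric value at all (SAMPLE2 1992 in this toy data).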

Answer 4

My answer:

data <- read.csv("filter.csv", sep=";")
data[!grepl("[A-Za-z]", data$VALUE),]

Output:

      NAME YEAR VALUE
1  SAMPLE1 1969     6
2  SAMPLE1 1970    -6
3  SAMPLE1 1971    -7
6  SAMPLE1 1972     3
7  SAMPLE1 1975    -7
8  SAMPLE1 1976     3
9  SAMPLE1 1977     3
10 SAMPLE1 1978     0
11 SAMPLE2 1991   -15
13 SAMPLE2 1992   -58
14 SAMPLE2 1994   -40
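
The pattern `[A-Za-z]` works for this file because every qualitative value happens to contain a letter. If that cannot be guaranteed, a stricter test that matches the whole string as a number may be worth considering. In the sketch below, the entries "=-" and "" are hypothetical values that do not occur in the question, and the decimal part of the pattern is likewise an assumption about what should count as numeric.

```r
values <- c("6", "-7", "=-X", "ST", "=-", "3.5", "")

# Test used in the answer: "contains no ASCII letter".
no_letters <- !grepl("[A-Za-z]", values)

# Stricter test: the whole string is an optionally signed integer or decimal.
looks_numeric <- grepl("^\\s*-?\\d+(\\.\\d+)?\\s*$", values, perl = TRUE)

data.frame(values, no_letters, looks_numeric)
```

The two tests disagree on "=-" and on the empty string, which the letter-based filter would let through.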

Answer 5

Here is a data.table solution:

dat <- read.csv("https://dl.dropboxusercontent.com/u/16277659/filter.csv", sep = ";")

library(data.table)
setDT(dat)[, .SD[, ifelse(any(grepl("\\d", VALUE)), 
                          grep("\\d", VALUE, value = TRUE), 
                          VALUE)], 
                          by = YEAR]
##    YEAR  V1
## 1: 1969   6
## 2: 1970  -6
## 3: 1971  -7
## 4: 1972   3
## 5: 1975  -7
## 6: 1976   3
## 7: 1977   3
## 8: 1978   0
## 9: 1991 -15
##10: 1992 -58
##11: 1994 -40
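
As far as I can tell, `ifelse()` with a single TRUE/FALSE test returns only one element, so the code above yields at most one value per year, and the NAME column is dropped from the result. A possible variant that keeps NAME and every numeric row per group is sketched below. It assumes the data.table package is installed; the inline table and the column name `is_num` are made up for illustration.

```r
library(data.table)

dt <- data.table(
  NAME  = c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
  YEAR  = c(1972, 1972, 1972, 1992, 1992, 1994),
  VALUE = c("=-X", "ST", "3", "=X", "-58", "ST")
)

# Flag values that parse as numbers.
dt[, is_num := !is.na(suppressWarnings(as.numeric(VALUE)))]

# Per NAME/YEAR group: the numeric rows if there are any, otherwise the whole group.
res <- dt[, if (any(is_num)) .SD[is_num] else .SD, by = .(NAME, YEAR)]
res
```

VALUE stays a character column here, since groups with only qualitative entries are kept as they are.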

R: Filtering out non-numeric values in a data frame
