Question

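I have a .txt file read into a table named power, which holds more than 2 million observations of 9 variables. I am trying to subset power down to just the rows containing "01/02/2007" or "02/02/2007". After creating the subset, the RStudio environment says I end up with zero observations, but the same variables.

How can I get a subset of the data containing only the rows with "01/02/2007" and "02/02/2007"?

I have seen similar posts, but I still get errors with my data set. See the link: Select multiple rows conditioning on ID in R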

My data:

power

My code:

#load data
> power <- read.table("textfile.txt", stringsAsFactors = FALSE, head = TRUE)
#subsetted first column called Date
> head(power$Date)
#[1] 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006

> str(power$Date)
 chr [1:2075259] "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...

Subsetting the data:

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))

Answer 1

My guess is that your dataset has trailing/leading spaces, because

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#       Date Val
#1 01/02/2007  14
#8 02/02/2007  28

If I change the rows to

power$Date[1] <- '01/02/2007 '
power$Date[8] <- ' 02/02/2007'

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#[1] Date Val
#<0 rows> (or 0-length row.names)
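
To check whether stray whitespace really is the cause before changing the subset call, one quick diagnostic is to compare each value with its trimmed version. This is a minimal sketch, assuming R 3.2.0 or later (for base trimws) and a small made-up data frame standing in for the real file; has_space is a name introduced here for illustration.

```r
# Toy data standing in for the real file (values are illustrative only)
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# TRUE where a value carries leading or trailing whitespace
has_space <- power$Date != trimws(power$Date)
sum(has_space)

# A clean dd/mm/yyyy string has exactly 10 characters,
# so any other length points at padding or malformed values
table(nchar(power$Date))
```

If the count is above zero on the real data, the padded values will never match an exact string comparison such as %in%.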

You can use str_trim from stringr

library(stringr)
subset(power, str_trim(Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28

Or use gsub

subset(power, gsub("^ +| +$", "", Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28
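
Base R (3.2.0 and later) also provides trimws, which avoids both the extra package and the hand-written regular expression. The following is a sketch rather than part of the original answer; it assumes a small toy data frame with padded values, and dates_wanted is a name introduced here for illustration.

```r
# Toy data with padded values (illustrative only)
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)
dates_wanted <- c("01/02/2007", "02/02/2007")

# Trim only for the comparison; the stored values keep their padding
subpower <- subset(power, trimws(Date) %in% dates_wanted)

# Or clean the column once, so every later comparison behaves as expected
power$Date <- trimws(power$Date)
subpower_clean <- power[power$Date %in% dates_wanted, ]
nrow(subpower_clean)
```

Cleaning the column once is probably the better choice for a 2-million-row table, since the trimming cost is paid a single time.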

Another option, which does not remove the spaces, is to use grepl

subset(power, grepl('01/02/2007|02/02/2007', Date))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28
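
Note that an unanchored pattern matches the date anywhere inside the string, so a longer value that merely contains one of the dates would also be kept. If that matters, the pattern can be anchored while still tolerating surrounding whitespace. This is a sketch under assumptions: the fourth toy row is invented to show the difference, and pattern is a name introduced here.

```r
# Toy data; the last row is a made-up malformed value
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007", "x01/02/2007"),
  Val  = c(14L, 24L, 28L, 99L),
  stringsAsFactors = FALSE
)

# Unanchored: also keeps the malformed fourth row
loose <- subset(power, grepl("01/02/2007|02/02/2007", Date))

# Anchored: optional whitespace, then exactly one of the two dates
pattern <- "^[[:space:]]*(01/02/2007|02/02/2007)[[:space:]]*$"
subpower <- subset(power, grepl(pattern, Date))
subpower$Val
```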

Data

power <- structure(list(Date = c("01/02/2007", "16/12/2006", "16/12/2006", 
"16/12/2006", "16/12/2006", "16/12/2006", "16/12/2006", "02/02/2007"
), Val = c(14L, 24L, 23L, 22L, 23L, 25L, 23L, 28L)), .Names = c("Date", 
"Val"), class = "data.frame", row.names = c(NA, -8L))

Answer 2

Try:

> subpower = power[power$Date %in% c("01/02/2007", "02/02/2007") ,]
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28

(using the power data from @akrun's answer)

Also, your own code works if you use the proper name for the subset: "subpower" rather than "powersub"!

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28
>
> str(subpower)
'data.frame':   2 obs. of  2 variables:
 $ Date: chr  "01/02/2007" "02/02/2007"
 $ Val : int  14 28

Answer 3

Your approach is correct. Try reading in the text file using

power <- read.table("textfile.txt", stringsAsFactors = FALSE)
