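Creating a new data.table based on an ifelse on an old data.table

Date: 2016-09-26 02:31:28

Tags: r if-statement data.table subset

I am trying to subset my data.table using an ifelse statement, but I am not getting the result I am looking for.

My initial data.table looks like this: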

head(Data_copy, n = 18)

    Company       Date       DOW variable value Year Month End_of_Month
 1:   ASXRI 1991-09-06    Friday       RI    NA 1991   Sep            0
 2:   ASXRI 1991-09-09    Monday       RI    NA 1991   Sep            0
 3:   ASXRI 1991-09-10   Tuesday       RI    NA 1991   Sep            0
 4:   ASXRI 1991-09-11 Wednesday       RI    NA 1991   Sep            0
 5:   ASXRI 1991-09-12  Thursday       RI    NA 1991   Sep            0
 6:   ASXRI 1991-09-13    Friday       RI    NA 1991   Sep            0
 7:   ASXRI 1991-09-16    Monday       RI    NA 1991   Sep            0
 8:   ASXRI 1991-09-17   Tuesday       RI    NA 1991   Sep            0
 9:   ASXRI 1991-09-18 Wednesday       RI    NA 1991   Sep            0
10:   ASXRI 1991-09-19  Thursday       RI    NA 1991   Sep            0
11:   ASXRI 1991-09-20    Friday       RI    NA 1991   Sep            0
12:   ASXRI 1991-09-23    Monday       RI    NA 1991   Sep            0
13:   ASXRI 1991-09-24   Tuesday       RI    NA 1991   Sep            0
14:   ASXRI 1991-09-25 Wednesday       RI    NA 1991   Sep            0
15:   ASXRI 1991-09-26  Thursday       RI    NA 1991   Sep            0
16:   ASXRI 1991-09-27    Friday       RI    NA 1991   Sep            0
17:   ASXRI 1991-09-30    Monday       RI    NA 1991   Sep            1
18:   ASXRI 1991-10-01   Tuesday       RI    NA 1991   Oct            0

These are 18 of 250,000 rows.

What I want is to split this data.table based on an ifelse function, like so:

Data1 <- ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy)

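*The "Weekly" == "Weekly" part is a placeholder that will later be used inside a function.

I want Data1 to be a new data.table that contains only the rows where End_of_Month == 1.

When I run the code above, all I get is a list of the company names, and nothing else.

Here is what the output looks like: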

Data1[[1]]
    [1] "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI"

Now if I scroll down, I get:

[1387] "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "APARI" "APARI" "APARI" "APARI" "APARI"
 [1398] "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI"

Each of these entries is just one of the company names.

If I do this instead, I get the result I want:

Data2 <- Data_copy[End_of_Month == 1, ]

   Company       Date      DOW variable value Year Month End_of_Month
1:   ASXRI 1991-09-30   Monday       RI    NA 1991   Sep            1
2:   ASXRI 1991-10-31 Thursday       RI    NA 1991   Oct            1
3:   ASXRI 1991-11-29   Friday       RI    NA 1991   Nov            1
4:   ASXRI 1991-12-31  Tuesday       RI    NA 1991   Dec            1
5:   ASXRI 1992-01-31   Friday       RI    NA 1992   Jan            1
6:   ASXRI 1992-02-28   Friday       RI    NA 1992   Feb            1

Basically I want to reproduce Data2, but using an ifelse statement.

Here are the first 100 rows:

dput(head(Data_copy, n = 100))
structure(list(Company = c("ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI"), Date = structure(c(7918, 
7921, 7922, 7923, 7924, 7925, 7928, 7929, 7930, 7931, 7932, 7935, 
7936, 7937, 7938, 7939, 7942, 7943, 7944, 7945, 7946, 7949, 7950, 
7951, 7952, 7953, 7956, 7957, 7958, 7959, 7960, 7963, 7964, 7965, 
7966, 7967, 7970, 7971, 7972, 7973, 7974, 7977, 7978, 7979, 7980, 
7981, 7984, 7985, 7986, 7987, 7988, 7991, 7992, 7993, 7994, 7995, 
7998, 7999, 8000, 8001, 8002, 8005, 8006, 8007, 8008, 8009, 8012, 
8013, 8014, 8015, 8016, 8019, 8020, 8021, 8022, 8023, 8026, 8027, 
8028, 8029, 8030, 8033, 8034, 8035, 8036, 8037, 8040, 8041, 8042, 
8043, 8044, 8047, 8048, 8049, 8050, 8051, 8054, 8055, 8056, 8057
), class = "Date"), DOW = c("Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday"
), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("RI", 
"VO", "MV", "TD", "ND"), class = "factor"), value = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), Year = c("1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992"), Month = c("Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", 
"Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan"
), End_of_Month = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0)), .Names = c("Company", "Date", "DOW", "variable", "value", 
"Year", "Month", "End_of_Month"), class = c("data.table", "data.frame"
), row.names = c(NA, -100L), .internal.selfref = <pointer: 0x00000000001f0788>)

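1 Answer:

Answer 0 (score: 2)

Other users have noted that ifelse is not suited to your purpose. It may be useful to explain why. From ?ifelse, ifelse(test, yes, no) returns

  A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no.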

In other words, if your test vector has length 1, ifelse(...) will return a vector of length 1. For example,

> ifelse(TRUE, 1:3, 7:9)
[1] 1
> ifelse(c(TRUE, FALSE), 1:3, 7:9)
[1] 1 8

In your case,

ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy)

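will return a vector of length 1. More precisely, since the test evaluates to TRUE, ifelse returns the first element of the yes argument; because that argument is a data frame (a kind of list), ifelse returns the first element of the data frame, i.e. its first column. This is why you get the list of company names. If you really want to use the ifelse construct, try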

ifelse("Weekly" == "Weekly", list(Data_copy[End_of_Month ==1,]), list(Data_copy))
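To see both behaviours side by side, here is a small sketch on a made-up three-row table. The object `dt` and its values are hypothetical stand-ins for Data_copy, and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table standing in for Data_copy
dt <- data.table(Company = c("A", "A", "B"),
                 End_of_Month = c(0, 1, 1))

# test has length 1, so ifelse() keeps only the first element of `yes`,
# which for a data.table (a list of columns) is the first column
res1 <- ifelse(TRUE, dt[End_of_Month == 1, ], dt)
str(res1)  # a length-one list holding the Company values

# Wrapping each branch in list() makes the whole table the first element
res2 <- ifelse(TRUE, list(dt[End_of_Month == 1, ]), list(dt))

# The result is still a length-one list, so extract the table with [[1]]
Data1 <- res2[[1]]
```

The extra `[[1]]` step is the price of forcing a whole table through a function designed for element-wise vector selection.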

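Note that this returns a length-one list, so the table itself is its first element. That said, as others have pointed out, you are better off using if {} else {}.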
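A minimal sketch of the if {} else {} approach, wrapped in a function since the question says the comparison will later live inside one. The names `subset_by_freq` and `freq` are hypothetical, as is the toy table; the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical helper: `freq` plays the role of the "Weekly" placeholder
subset_by_freq <- function(dt, freq) {
  # if/else evaluates exactly one branch and returns that object whole,
  # so the data.table class and all columns are preserved
  if (freq == "Weekly") dt[End_of_Month == 1, ] else dt
}

# Hypothetical toy table standing in for Data_copy
dt <- data.table(Company = c("A", "A", "B"),
                 End_of_Month = c(0, 1, 1))

Data1 <- subset_by_freq(dt, "Weekly")  # only rows with End_of_Month == 1
Data0 <- subset_by_freq(dt, "Daily")   # the full table, unchanged
```

Unlike ifelse, if/else expects a single TRUE or FALSE condition, which matches a scalar comparison such as `freq == "Weekly"`.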