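I'm frustrated. I'm trying to isolate certain df rows based on the values in two columns. As usual, I tried this on practice data first, and my code ran fine.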
data1<-df2[df2$fruit=="kiwi" | df2$fruit=="orange" | df2$fruit=="apple" & (df2$dates>= "2010-04-01" & df2$dates< "2010-10-01"), ]
When I try the same code on my real data, it doesn't work. It collects the "fruits" I need but ignores my date range request.
data1<-lti_first[lti_first$hai_atc=="C10AA01" | lti_first$hai_atc=="C10AA03" | lti_first$hai_atc=="C10AA04" | lti_first$hai_atc=="C10AA05" | lti_first$hai_atc=="C10AA07" | lti_first$hai_atc=="C10AB02" |lti_first$hai_atc=="C10AA04" |lti_first$hai_atc=="C10AB08" | lti_first$hai_atc=="C10AX09" & (lti_first$date_of_claim >= "2010-04-01" & lti_first$date_of_claim<"2010-10-01"), ]
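The variable structures in my practice data and my real data are exactly the same. fruit / hai_atc are factors in both dfs, and the dates in both dfs are as.Dates.
To get around the problem I tried subsetting my data instead, but that didn't work for me either (although it did work on the practice data).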
x<-subset(lti_first, hai_atc=="V07AY03" | hai_atc=="A11JC94" & (date_of_claim>="2010-04-01" & date_of_claim<"2010-10-01"))
What am I doing wrong? To me, my code looks exactly the same!
Example df
names<-c("tom", "mary", "tom", "john", "mary",
"tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates<-as.Date(c("2010-02-01", "2010-05-01", "2010-03-01",
"2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01",
"2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01",
"2010-11-01", "2010-12-01", "2011-01-01"))
fruit<-as.character(c("apple", "orange", "banana", "kiwi",
"apple", "apple", "apple", "orange", "banana", "apple",
"kiwi", "apple", "orange", "apple"))
age<-as.numeric(c(60,55,60,57,55,60,57,55,57,55,60,55, 57,57))
sex<-as.character(c("m","f","m","m","f","m","m",
"f","m","f","m","f","m", "m"))
df2<-data.frame(names,dates, age, sex, fruit)
df2
dput(df2)
structure(list(names = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 2L, 1L, 1L), .Label = c("john", "mary", "tom"
), class = "factor"), dates = structure(c(14641, 14730, 14669,
14791, 14791, 14761, 14853, 14791, 14914, 14853, 14822, 14914,
14944, 14975), class = "Date"), age = c(60, 55, 60, 57, 55, 60,
57, 55, 57, 55, 60, 55, 57, 57), sex = structure(c(2L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("f",
"m"), class = "factor"), fruit = structure(c(1L, 4L, 2L, 3L,
1L, 1L, 1L, 4L, 2L, 1L, 3L, 1L, 4L, 1L), .Label = c("apple",
"banana", "kiwi", "orange"), class = "factor")), .Names = c("names",
"dates", "age", "sex", "fruit"), row.names = c(NA, -14L), class = "data.frame")
The real data is too large to post, so here is the output of str() instead:
str(sample_lti_first)
'data.frame': 20 obs. of 5 variables:
$ hai_dispense_number: Factor w/ 53485 levels "Patient HAI0000017",..: 22260 22260 2527 24311 24311 24311 24311 13674 13674 13674 ...
$ sex : Factor w/ 4 levels "F","M","U","X": 2 2 2 1 1 1 1 1 1 1 ...
$ hai_age : int 18 18 27 40 40 40 40 28 28 28 ...
$ date_of_claim : Date, format: "2009-10-09" "2009-10-09" "2009-10-18" ...
$ hai_atc : Factor w/ 1038 levels "","A01AA01","A01AB03",..: 144 76 859 80 1009 1009 859 81 1008 859 ...
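Answer 0 (score: 3)
I think it's important to expand on @Aaron's comment. The problem you are having is caused by the missing parentheses around all of your OR statements, which using %in% would have avoided. It is not that OR statements don't work inside the extraction function [. Your toy example doesn't actually work the way you want either: the result includes an orange fruit with a date of 2010-12-01. It is only by chance that no other problems showed up.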
The way to read the Boolean logic in this code
df2[df2$fruit=="kiwi" | df2$fruit=="orange" | df2$fruit=="apple" & (df2$dates>= "2010-04-01" & df2$dates< "2010-10-01"), ]
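is:
I want all rows of df2 where fruit is kiwi, all rows where fruit is orange, and all rows where fruit is apple AND the date is on or after 2010-04-01 and before 2010-10-01.
And that is exactly what you get: only the apples are restricted to the proper date range. It just so happens that the toy dataset has no kiwis outside the date range.
Now add one pair of parentheses: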
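A minimal sketch of why the original filter is read that way, using made-up logical values rather than the real data (is_kiwi, is_apple and in_range are illustrative names only). In R, & is evaluated before |, so an unparenthesized chain of ORs followed by & attaches the date test to the last OR term only.

```r
# Illustrative values only: one row that is a kiwi with a date outside the range
is_kiwi  <- TRUE
is_apple <- FALSE
in_range <- FALSE

# Written without parentheses, as in the question
without_parens <- is_kiwi | is_apple & in_range

# How R actually groups it: & binds more tightly than |
as_parsed <- is_kiwi | (is_apple & in_range)

# Both are TRUE, so the out-of-range kiwi row would be kept
print(c(without_parens = without_parens, as_parsed = as_parsed))
```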
df2[(df2$fruit=="kiwi" | df2$fruit=="orange" | df2$fruit=="apple") & (df2$dates >= "2010-04-01" & df2$dates < "2010-10-01"), ]
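This code says:
I want all rows of df2 where fruit is kiwi, orange, or apple AND the date is on or after 2010-04-01 and before 2010-10-01.
That said, %in% is definitely the way to go.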
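As a rough sketch of the %in% approach, here it is on a small made-up data frame (toy, wanted and in_range are illustrative names, and the rows are only loosely modelled on the question's example). Because the membership test is a single expression, there is no chain of ORs for the date condition to attach to incorrectly.

```r
# Small illustrative data frame, not the asker's real data
toy <- data.frame(
  fruit = c("apple", "orange", "kiwi", "orange", "banana"),
  dates = as.Date(c("2010-02-01", "2010-05-01", "2010-07-01",
                    "2010-12-01", "2010-06-01"))
)

# Values to keep, collected in one vector
wanted <- c("kiwi", "orange", "apple")

# Date window: on or after 2010-04-01 and before 2010-10-01
in_range <- toy$dates >= as.Date("2010-04-01") &
  toy$dates < as.Date("2010-10-01")

# Keep rows that match a wanted fruit AND fall inside the window
result <- toy[(toy$fruit %in% wanted) & in_range, ]
print(result)
```

Only the orange dated 2010-05-01 and the kiwi dated 2010-07-01 remain; the out-of-range apple and orange are dropped, as is the banana.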
Answer 1 (score: 2)
Does this work?
data1 <- subset(lti_first,
(hai_atc %in% c("C10AA01", "C10AA03", "C10AA04", "C10AA05", "C10AA07",
"C10AB02", "C10AA04", "C10AB08", "C10AX09")) &
(date_of_claim >= as.Date("2010-04-01") & date_of_claim < as.Date("2010-10-01")))
Note the use of %in% and as.Date.