I'm having trouble extracting rows from an existing data.frame to create a new one.
So we have:
> head(data.raw)
date id contacted contacted_again region
1 2015-11-29 234 CHAT EMAIL APAC
2 2015-11-29 234 EMAIL EMAIL APAC
3 2015-11-27 257 PHONE PHONE EMEA
4 2015-11-27 278 PHONE EMAIL APAC
5 2015-11-27 293 CHAT EMAIL EMEA
6 2015-11-27 243 EMAIL EMAIL EMEA
market
1 AU/NZ
2 SE Asia (English)
3 Spain
4 China Mainland
5 DACH
6 DACH
However, when I write
data.ru <- data.raw[data.raw$market=="Russia",]
I get the following mess:
date id contacted contacted_again region market
67 2015-11-25 334 CHAT EMAIL EMEA Russia
NA <NA> <NA> <NA> <NA> <NA> <NA>
NA.1 <NA> <NA> <NA> <NA> <NA> <NA>
NA.2 <NA> <NA> <NA> <NA> <NA> <NA>
NA.3 <NA> <NA> <NA> <NA> <NA> <NA>
NA.4 <NA> <NA> <NA> <NA> <NA> <NA>
How can I write a command that returns a normal data.frame containing only the rows where $market == "Russia", without any NAs?
Answer 0 (score: 0)
I would just use the subset function.
test <- data.frame(x = c("USA", "USA", "USA", "Russia", "Russia", NA), y = c("Orlando", "Boston", "Memphis", NA, "St. Petersburg", "Mexico City"))
print(test)
x y
1 USA Orlando
2 USA Boston
3 USA Memphis
4 Russia <NA>
5 Russia St. Petersburg
6 <NA> Mexico City
subset(test, x == "Russia")
x y
4 Russia <NA>
5 Russia St. Petersburg
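As a minimal sketch of why this works: subset() treats an NA in the condition as FALSE, whereas logical indexing with [ ] returns a row of NAs for each NA in the condition. The names by_bracket and by_subset are hypothetical, and stringsAsFactors = FALSE is an assumption added only to keep the columns as plain character vectors.

```r
# Same toy data as above; stringsAsFactors = FALSE is an assumption
# so that the columns are character vectors on every R version.
test <- data.frame(
  x = c("USA", "USA", "USA", "Russia", "Russia", NA),
  y = c("Orlando", "Boston", "Memphis", NA, "St. Petersburg", "Mexico City"),
  stringsAsFactors = FALSE
)

# Logical indexing with == yields an all-NA row for every NA in the condition.
by_bracket <- test[test$x == "Russia", ]

# subset() treats NA in the condition as FALSE, so those rows are dropped.
by_subset <- subset(test, x == "Russia")

nrow(by_bracket)  # 3: the two Russia rows plus one all-NA row
nrow(by_subset)   # 2: only the Russia rows
```

Note that an NA in another column (y in row 4) is kept, because only the condition on x decides which rows are returned.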
Answer 1 (score: 0)
You might want to try: data.ru <- data.raw[data.raw$market %in% "Russia", ]
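Explanation: I assume your dataset contains empty rows that are read in as NAs (missing values). Since R cannot know whether a given NA is equal to "Russia" or not, the comparison with == returns NA for those rows, and indexing with NA puts a row of NAs into the resulting data frame. %in% never returns NA, so those rows are left out.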
Illustration in code:
# create sample dataset
example.df <- data.frame(market=c(NA, "Russia", NA), outcome = c(1,2,3))
# match market using ==
example.df$market == "Russia"
example.df[example.df$market == "Russia",]
# match market using %in%
example.df$market %in% "Russia"
example.df[example.df$market %in% "Russia",]
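If you prefer to keep ==, two other common ways to drop the NA rows are which() and an explicit is.na() guard. This is a sketch on the same example.df; the names eq_mask, in_mask, via_which and via_guard are hypothetical.

```r
example.df <- data.frame(market = c(NA, "Russia", NA), outcome = c(1, 2, 3))

# == propagates NA: the result is NA wherever market is NA.
eq_mask <- example.df$market == "Russia"    # NA TRUE NA

# %in% always returns TRUE or FALSE, never NA.
in_mask <- example.df$market %in% "Russia"  # FALSE TRUE FALSE

# which() keeps only the positions that are TRUE, silently dropping NA.
via_which <- example.df[which(eq_mask), ]

# An explicit guard: FALSE & NA evaluates to FALSE, so NA rows are excluded.
via_guard <- example.df[!is.na(example.df$market) & example.df$market == "Russia", ]
```

Both via_which and via_guard contain only the single "Russia" row, the same result as the %in% version.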