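I have data stored as a .csv file (basically: which wine people choose, in relation to the origin of the music playing around them). Reading it into a data frame gives a df that looks like this: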
    Music    Wine
1  French  French
2 Italian  French
3    None Italian
4 Italian Italian
5  French   Other
...
As a table, it looks like this:
         Wine
Music     Other French Italian
  French     35     39       1
  None       43     30      11
  Italian    35     30      19
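Now I want to create a frequency plot that shows only the relative distribution of the wines bought when Music == "None". So basically I would get Other = 0.5119048, French = 0.3571429, Italian = 0.1309524.
My problem is that subsetting this table doesn't work.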
noMusic <- prop.table(table(data[data$Music == "None"]))
geenMuziekTabel <- prop.table(table(data[data$Music == "None"]))
Both result in:
[1] 0.144032922 0.004115226 0.045267490 0.078189300 NA NA NA NA
[9] NA NA NA NA NA NA NA NA
[17] NA NA NA NA NA NA NA NA
[25] NA NA NA NA NA NA NA NA
[33] NA NA NA NA NA NA NA NA
[41] NA NA NA NA NA NA NA NA
[49] NA NA NA NA NA NA NA NA
[57] NA NA NA NA NA NA NA NA
[65] NA NA NA NA NA NA NA NA
[73] NA NA NA NA NA NA NA NA
[81] NA NA NA NA
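I thought: maybe I should subset the data frame first and then build a proportion table from it, but R seems to remember that there was other data and produces this table: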
         Wine
Music     Other French Italian
  French      0      0       0
  None       43     30      11
  Italian     0      0       0
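I have also tried many other approaches but can't figure it out. Does anyone know what I'm doing wrong?
Edit: based on the accepted answer, the solution is as follows: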
noMusicTable <- prop.table(table(musicwine$Wine[musicwine$Music == "None"]))
#noMusicTable <- prop.table(table(subset(musicwine, Music == "None", select = Wine)))
noMusicDF <- as.data.frame(noMusicTable)
# need to declare x and y explicitly; use stat = 'identity' to map bars to y-variable
library(ggplot2)
ggplot(noMusicDF, mapping = aes(x = Var1, y = Freq)) + geom_bar(stat = 'identity', fill='red')
Answer 0 (score: 1)
Here are three ways to subset correctly:
dat <- read.table(text =
"Music Wine
French French
Italian French
None Italian
Italian Italian
French Other", header = TRUE)
# Two different ways to subset
prop.table(table(dat$Wine[dat$Music == "None"]))
prop.table(table(subset(dat, Music == "None", select = Wine)))
# With dplyr and piping
library(dplyr)
dat %>%
  filter(Music == "None") %>%
  select(Wine) %>%
  table() %>%
  prop.table()
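As for the table in the question that still shows all-zero rows for French and Italian: that is what you would expect if Music is a factor, because subsetting keeps unused factor levels and table() counts them. The following is only a sketch under that assumption (read.csv() converted strings to factors by default before R 4.0, but I can't tell whether that applies to your data). The object names `counts`, `wine_df`, `no_music` and `no_music_prop` are made up for illustration; the counts are taken from the cross table in the question.

```r
# Rebuild a data frame with the same counts as the cross table in the question.
# Column names follow the question; the object names are hypothetical.
counts <- data.frame(
  Music = rep(c("French", "None", "Italian"), each = 3),
  Wine  = rep(c("Other", "French", "Italian"), times = 3),
  n     = c(35, 39, 1,  43, 30, 11,  35, 30, 19)
)
# Repeat each Music/Wine combination n times to get one row per purchase
wine_df <- counts[rep(seq_len(nrow(counts)), counts$n), c("Music", "Wine")]

# Assumption: both columns are factors, as in the question's zero-row table
wine_df$Music <- factor(wine_df$Music, levels = c("French", "None", "Italian"))
wine_df$Wine  <- factor(wine_df$Wine,  levels = c("Other", "French", "Italian"))

# Row subsetting needs the trailing comma: wine_df[rows, ]
no_music <- wine_df[wine_df$Music == "None", ]

# The unused levels "French" and "Italian" survive the subset,
# so they still appear in the table as rows of zeros
table(no_music)

# droplevels() discards unused levels, leaving only the "None" row
no_music <- droplevels(no_music)
table(no_music)

# Relative distribution of wine choices when no music was playing
no_music_prop <- prop.table(table(no_music$Wine))
round(no_music_prop, 7)
#     Other    French   Italian
# 0.5119048 0.3571429 0.1309524
```

Tabulating only the Wine column, as in the approaches above, sidesteps the problem because the Music levels never enter the table; droplevels() is mainly useful when you want to keep working with the subsetted data frame as a whole.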