I've recently been using R to draw some charts, and I have the following data in a CSV file:
ID,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,5a,6a,7a,9a,12a,15a
1,435,269,994,832,637,2931,3275,1690,5228,1951,2312,2336,2029,3796,2698,22814,618,752,888,1810,927
2,805,522,862,972,970,1332,1409,2236,1710,3130,2096,2775,4325,4462,8057,3358,826,1118,1181,1542,1681
3,702,656,755,1393,1881,1433,3700,2163,2849,2143,3958,3529,4171,4152,12918,1528,1051,2377,1988,2173,3904
4,833,791,2398,920,0,3200,1850,5038,2626,3854,6144,5505,6861,6860,5002,5383,53,1398,1473,2422,161
5,1635,1783,4765,1768,2130,5761,2114,10518,2732,5109,8508,7307,5910,6825,6605,4430,2020,1879,1663,6087,2735
.
.
.
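How can I draw a boxplot in which the x-axis labels are every column except ID (i.e. 1, 2, 3, 4, 5, ..., 15a), and each "box" shows that column's values divided by 1000?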
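I found a way in the documentation, but it only draws a boxplot when my data has two dimensions (a single x and a single y variable), whereas here I have 21 value columns.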
I could do it by transposing the dataset, but this CSV file is updated every day, so doing that by hand would quickly get tedious.
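For comparison, base R's `boxplot()` already draws one box per column when given a data frame, so no manual transposing is needed. A minimal sketch, assuming the file is read with `check.names = FALSE` so the headers stay as `5`, `5a`, etc.; the inline sample below is a subset of the question's columns, and the real file path is a placeholder you would substitute:

```r
# Subset of the question's data (columns 5, 6, 5a, 6a; first three rows).
# For the daily file, use read.csv("your_file.csv", check.names = FALSE) (placeholder path).
df <- read.csv(text = "ID,5,6,5a,6a
1,637,2931,22814,618
2,970,1332,3358,826
3,1881,1433,1528,1051", check.names = FALSE)

# Drop the ID column and scale every value by 1/1000
vals <- df[, names(df) != "ID"] / 1000

# Order columns by their numeric part; order() is stable, so "5" stays before "5a"
num <- as.numeric(gsub("[^0-9]", "", names(vals)))
vals <- vals[, order(num), drop = FALSE]

# boxplot() on a data frame draws one box per column, labelled by column name
boxplot(vals, xlab = "column", ylab = "value / 1000")
```

Because the columns are selected by name rather than position, the same code keeps working as new rows are appended to the file each day.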
I did this in Python (x labels 1, 2, 3, 4, ..., only up to 15) and got the following result (although the value at label "4" is wrong):
Answer (score: 1)
You need to melt the data into long form so that you have a single x variable and a single y variable. Using Hadleyverse packages:
library(dplyr)
library(tidyr)
library(readr)    # parse_number(); tidyr::extract_numeric() is deprecated
library(ggplot2)

# sample data; for the daily file use read.csv("your_file.csv") (placeholder path)
df <- read.csv(text = 'ID,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,5a,6a,7a,9a,12a,15a
1,435,269,994,832,637,2931,3275,1690,5228,1951,2312,2336,2029,3796,2698,22814,618,752,888,1810,927
2,805,522,862,972,970,1332,1409,2236,1710,3130,2096,2775,4325,4462,8057,3358,826,1118,1181,1542,1681
3,702,656,755,1393,1881,1433,3700,2163,2849,2143,3958,3529,4171,4152,12918,1528,1051,2377,1988,2173,3904
4,833,791,2398,920,0,3200,1850,5038,2626,3854,6144,5505,6861,6860,5002,5383,53,1398,1473,2422,161
5,1635,1783,4765,1768,2130,5761,2114,10518,2732,5109,8508,7307,5910,6825,6605,4430,2020,1879,1663,6087,2735')

# melt from wide to long
df %>% gather(x, y, -ID) %>%
    # scale y as described
    mutate(y = y / 1000,
           # strip the "X" that read.csv prepends to numeric column names
           x = substr(x, 2, nchar(x)),
           # fix factor level order so the x axis reads 1, 2, ..., 5, 5a, 6, 6a, ...
           x = factor(x, levels = unique(x[order(parse_number(x))]))) %>%
    # plot
    ggplot(aes(x, y)) +
    geom_boxplot()
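If you prefer tidyr's newer `pivot_longer()`, which supersedes `gather()` in tidyr 1.0 and later, the same reshaping might look like the sketch below. It assumes `check.names = FALSE` so the `X` prefix never appears and the `substr()` step can be dropped; the inline data is a subset of the question's columns:

```r
library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)

# Subset of the question's data; keep headers as-is (no "X" prefix)
df <- read.csv(text = "ID,5,6,5a,6a
1,637,2931,22814,618
2,970,1332,3358,826
3,1881,1433,1528,1051", check.names = FALSE)

long <- df %>%
  # one row per (ID, column) pair: x holds the old column name, y its value
  pivot_longer(-ID, names_to = "x", values_to = "y") %>%
  mutate(y = y / 1000,
         # order levels by numeric part; ties keep first-appearance order, so "5" precedes "5a"
         x = factor(x, levels = unique(x[order(parse_number(x))])))

p <- ggplot(long, aes(x, y)) + geom_boxplot()
print(p)
```

Reading with `check.names = FALSE` keeps the labels exactly as they appear in the CSV header, which avoids relying on `read.csv` always prepending exactly one character.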