I have a large data frame (my_DF) with 4 important columns: ID (1 -> 100), YEAR (2000, 2001, 2002, 2003, 2004, 2005), MONTH (january -> december) and LENGHT (continuous values from 0.1 to 1.0). Something like this:
YEAR MONTH ID LENGHT
1 2000 january S1 0.2
2 2000 january S1 0.3
3 2000 january S1 0.1
4 2000 january S2 0.5
5 2000 january S2 0.3
6 2000 february S1 0.9
7 2000 february S1 0.4
8 2000 february S1 0.6
9 2000 february S3 0.4
10 2000 february S3 0.3
11 2000 march S1 0.7
...
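I need to add a new column to the data frame, filled with the median of LENGHT for each unique case, that is, for each combination of ID, YEAR and MONTH.
I managed to get the values I need using aggregate: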
agg <- aggregate(my_DF["LENGHT"], by = list(my_DF$YEAR, my_DF$MONTH, my_DF$ID), median)
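This gives me the values I want but, of course, it only creates a new data frame. I cannot find a quick way to paste the values of the data frame "agg" into a new column of "my_DF", matching rows on YEAR, MONTH and ID.
For example, I would like to get something like this: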
YEAR MONTH ID LENGHT MONTHLY_LENGHT_MEDIAN
1 2000 january S1 0.2 0.2
2 2000 january S1 0.3 0.2
3 2000 january S1 0.1 0.2
4 2000 january S2 0.5 0.4
5 2000 january S2 0.3 0.4
6 2000 february S1 0.9 0.6
7 2000 february S1 0.4 0.6
8 2000 february S1 0.6 0.6
9 2000 february S3 0.4 0.35
10 2000 february S3 0.3 0.35
11 2000 march S1 0.7 0.7
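So I am wondering whether a conditional command (if, ifelse ...) would be suitable for my case.
Unfortunately, I am not comfortable with these commands. What should I do?
Thanks for your help!
Answer 0: (score: 2)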
Instead of first summarising with aggregate and then doing a merge, use ave from base R to create the column directly:
my_df$MONTHLY_LENGHT_MEDIAN <- with(my_df, ave(LENGHT, YEAR,
MONTH, ID, FUN = median))
my_df$MONTHLY_LENGHT_MEDIAN
#[1] 0.20 0.20 0.20 0.40 0.40 0.60 0.60 0.60 0.35 0.35
Or with tidyverse:
library(tidyverse)
my_df %>%
group_by(YEAR, MONTH, ID) %>%
mutate(MONTHLY_LENGHT_MEDIAN = median(LENGHT))
my_df <- structure(list(YEAR = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L,
2000L, 2000L, 2000L, 2000L), MONTH = c("january", "january",
"january", "january", "january", "february", "february", "february",
"february", "february"), ID = c("S1", "S1", "S1", "S2", "S2",
"S1", "S1", "S1", "S3", "S3"), LENGHT = c(0.2, 0.3, 0.1, 0.5,
0.3, 0.9, 0.4, 0.6, 0.4, 0.3)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
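For comparison, the aggregate-then-merge route that the question was attempting could be sketched roughly as follows. This is only an illustration under the assumption that the data look like the sample above; the names agg and merged are placeholders. Note that merge does not preserve the original row order, which is one reason ave is the more convenient choice here.

```r
# Sample data mirroring the rows shown in the question
my_df <- data.frame(
  YEAR   = rep(2000L, 10),
  MONTH  = rep(c("january", "february"), each = 5),
  ID     = c("S1", "S1", "S1", "S2", "S2", "S1", "S1", "S1", "S3", "S3"),
  LENGHT = c(0.2, 0.3, 0.1, 0.5, 0.3, 0.9, 0.4, 0.6, 0.4, 0.3)
)

# One row per YEAR/MONTH/ID group, holding that group's median
agg <- aggregate(LENGHT ~ YEAR + MONTH + ID, data = my_df, FUN = median)
names(agg)[names(agg) == "LENGHT"] <- "MONTHLY_LENGHT_MEDIAN"

# Join the medians back onto the original rows using the three key columns
# (merge may reorder the rows)
merged <- merge(my_df, agg, by = c("YEAR", "MONTH", "ID"))
merged
```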
Answer 1: (score: 1)
You can use data.table.
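A minimal sketch of how a data.table version might look, assuming the same sample data as in the first answer and that the data.table package is installed. The name my_dt and the choice of grouped assignment with := are illustrative, not necessarily what this answer originally showed.

```r
library(data.table)

# Sample data mirroring the rows shown in the question (my_dt is a placeholder name)
my_dt <- data.table(
  YEAR   = rep(2000L, 10),
  MONTH  = rep(c("january", "february"), each = 5),
  ID     = c("S1", "S1", "S1", "S2", "S2", "S1", "S1", "S1", "S3", "S3"),
  LENGHT = c(0.2, 0.3, 0.1, 0.5, 0.3, 0.9, 0.4, 0.6, 0.4, 0.3)
)

# Add the group median as a new column by reference;
# every row keeps its position and receives its group's median
my_dt[, MONTHLY_LENGHT_MEDIAN := median(LENGHT), by = .(YEAR, MONTH, ID)]
my_dt
```

Because := modifies the table in place, no merge step is needed and the row order is unchanged. An existing data frame can be converted first with setDT(my_df).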