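I'm fairly new to R, and I'm having trouble merging rows based on matching values across several columns. I have the following dataset: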
LAST_NAME FIRST_NAME INTERVAL VISIT_DATE MFQ_1 MFQ_2 MFQ_3 Handedness ARI_1 ARI_2 ARI_4 ARI_COMPLETED_BY
Doe Jane Interval 1 1/1/99 4 6 2 Na Na Na Na Na
Doe Jane Interval 1 1/1/99 Na Na Na Right-Handed Na Na Na Na
Doe Jane Interval 1 1/1/99 Na Na Na Na 4 2 2 Dad
Doe Jane Interval 2 2/4/04 Na Na Na Right-Handed Na Na Na Na
Doe Jane Interval 2 2/4/04 5 6 3 Na Na Na Na Na
Doe Jane Interval 2 2/4/04 Na Na Na Na 4 5 5 Mom
Smith Joe Interval 1 3/1/01 5 1 7 Na Na Na Na Na
Smith Joe Interval 1 3/1/01 Na Na Na Left-Handed Na Na Na Na
Smith Joe Interval 1 3/1/01 Na Na Na Na 8 8 2 Dad
Smith Joe Interval 2 5/4/09 Na Na Na Na 8 5 4 Dad
Smith Joe Interval 2 5/4/09 7 2 8 Na Na Na Na Na
Smith Joe Interval 2 5/4/09 Na Na Na Left-Handed Na Na Na Na
I want to merge rows based on Name / Interval / Date so that the result looks like this:
LAST_NAME FIRST_NAME INTERVAL VISIT_DATE MFQ_1 MFQ_2 MFQ_3 Handedness ARI_1 ARI_2 ARI_4 ARI_COMPLETED_BY
Doe Jane Interval 1 1/1/99 4 6 2 Right-Handed 4 2 2 Dad
Doe Jane Interval 2 2/4/04 5 6 3 Right-Handed 4 5 5 Mom
Smith Joe Interval 1 3/1/01 5 1 7 Left-Handed 8 8 2 Dad
Smith Joe Interval 2 5/4/09 7 2 8 Left-Handed 8 5 4 Dad
I tried the following code:
CTDB %>% group_by(LAST_NAME:VISIT_DATE) %>% summarise_all(funs(na.omit(.)))
But I get the following error:
Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.
In addition: Warning messages:
1: In LAST_NAME:VISIT_DATE :
numerical expression has 3326 elements: only the first used
2: In LAST_NAME:VISIT_DATE :
numerical expression has 3326 elements: only the first used
3: In evalq(LAST_NAME:VISIT_DATE, <environment>) :
NAs introduced by coercion
4: In evalq(LAST_NAME:VISIT_DATE, <environment>) :
NAs introduced by coercion
I don't know how to fix this to get the result I want. Any help would be greatly appreciated!
Answer 0 (score: 1)
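You can use group_by_at() with vars(...) to select the grouping columns as a range. (Note that na.omit, like na.exclude, may not do what you expect inside summarise_all(); i[!is.na(i)] is closer to what you want.) If your missing values are actually the string "Na" rather than a real NA, you can drop them directly instead:
library(tidyverse)
df %>%
  group_by_at(vars(LAST_NAME:VISIT_DATE)) %>%
  summarise_all(function(i) { i[i != "Na"] })
Data (Interval 1 is written as Interval_1 so that read.table() keeps each value in a single column):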
df <- read.table(text="LAST_NAME FIRST_NAME INTERVAL VISIT_DATE MFQ_1 MFQ_2 MFQ_3 Handedness ARI_1 ARI_2 ARI_4 ARI_COMPLETED_BY
Doe Jane Interval_1 1/1/99 4 6 2 Na Na Na Na Na
Doe Jane Interval_1 1/1/99 Na Na Na Right-Handed Na Na Na Na
Doe Jane Interval_1 1/1/99 Na Na Na Na 4 2 2 Dad
Doe Jane Interval_2 2/4/04 Na Na Na Right-Handed Na Na Na Na
Doe Jane Interval_2 2/4/04 5 6 3 Na Na Na Na Na
Doe Jane Interval_2 2/4/04 Na Na Na Na 4 5 5 Mom
Smith Joe Interval_1 3/1/01 5 1 7 Na Na Na Na Na
Smith Joe Interval_1 3/1/01 Na Na Na Left-Handed Na Na Na Na
Smith Joe Interval_1 3/1/01 Na Na Na Na 8 8 2 Dad
Smith Joe Interval_2 5/4/09 Na Na Na Na 8 5 4 Dad
Smith Joe Interval_2 5/4/09 7 2 8 Na Na Na Na Na
Smith Joe Interval_2 5/4/09 Na Na Na Left-Handed Na Na Na Na", header=TRUE, stringsAsFactors=FALSE)
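To see why the "Na" values matter, here is a minimal sketch on a hypothetical three-row excerpt in the same layout. With read.table()'s default na.strings = "NA", the literal "Na" is read as an ordinary string, so na.omit() has nothing to drop, while comparing against "Na" works. Passing na.strings = "Na" (a standard read.table() argument) is an alternative that produces real NA values at read time.

```r
# Hypothetical three-row excerpt in the same layout as the question's data.
txt <- "LAST_NAME MFQ_1 Handedness
Doe 4 Na
Doe Na Right-Handed
Doe Na Na"

# Default na.strings = "NA": the literal "Na" stays an ordinary string.
raw <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
sum(is.na(raw))                          # 0: no cell counts as missing

# na.omit() therefore drops nothing, while filtering on "Na" does.
length(na.omit(raw$Handedness))          # 3
raw$Handedness[raw$Handedness != "Na"]   # "Right-Handed"

# Declaring "Na" as the missing-value marker yields real NA values instead.
fixed <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE,
                    na.strings = "Na")
sum(is.na(fixed))                        # 4
```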
Answer 1 (score: 0)
First, you need to replace the "Na" strings with explicit NA values:
CTDB[CTDB == "Na"] <- NA
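You also can't use : inside group_by(). There it is evaluated as the ordinary sequence operator on the column values, which is what produces the coercion warnings and the "NA/NaN argument" error. So we list the grouping columns explicitly. Then wrap na.omit() in first(), because na.omit() on its own is not a summary function and does not tell dplyr how to collapse each group to a single value.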
CTDB %>% group_by(LAST_NAME, FIRST_NAME, INTERVAL, VISIT_DATE) %>%
summarize_all(funs(first(na.omit(.))))
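The failure inside group_by() can be reproduced with plain base R. This sketch uses two hypothetical single strings standing in for the first LAST_NAME and VISIT_DATE values: ":" coerces both to numeric (giving NA, hence "NAs introduced by coercion") and then stops on the NA. With whole columns, R additionally warns that only the first element is used, matching the warnings in the question.

```r
# Hypothetical stand-ins for the first LAST_NAME and VISIT_DATE values.
last_name  <- "Doe"
visit_date <- "1/1/99"

# ':' converts both strings to numeric, gets NA, and raises an error.
# The coercion warnings are suppressed so only the error message is kept.
msg <- tryCatch(
  suppressWarnings(last_name:visit_date),
  error = function(e) conditionMessage(e)
)
msg  # "NA/NaN argument"
```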
Answer 2 (score: 0)
Using base R:
df[df == "Na"] <- NA
# Columns 5:8 of the result are the duplicated key columns aggregated from df itself
aggregate(df, df[1:4], na.omit)[-(5:8)]
LAST_NAME FIRST_NAME INTERVAL VISIT_DATE MFQ_1 MFQ_2 MFQ_3 Handedness ARI_1 ARI_2 ARI_4 ARI_COMPLETED_BY
1 Doe Jane Interval_1 1/1/99 4 6 2 Right-Handed 4 2 2 Dad
2 Doe Jane Interval_2 2/4/04 5 6 3 Right-Handed 4 5 5 Mom
3 Smith Joe Interval_1 3/1/01 5 1 7 Left-Handed 8 8 2 Dad
4 Smith Joe Interval_2 5/4/09 7 2 8 Left-Handed 8 5 4 Dad
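A variant of the base R approach, sketched on a hypothetical two-visit excerpt where "Na" has already been replaced by NA. Instead of na.omit it uses a hypothetical helper, first_non_na, that returns the first non-missing value (or NA if there is none), so each group always collapses to exactly one value. Passing only the non-key columns as the first argument also makes the [-(5:8)] step unnecessary.

```r
# Hypothetical two-visit excerpt after "Na" has been replaced by NA.
df <- data.frame(
  LAST_NAME  = rep("Doe", 6),
  INTERVAL   = rep(c("Interval_1", "Interval_2"), each = 3),
  MFQ_1      = c("4", NA, NA, NA, "5", NA),
  Handedness = c(NA, "Right-Handed", NA, "Right-Handed", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value of a group, or NA if none.
first_non_na <- function(x) x[!is.na(x)][1]

# Group by the key columns and collapse each remaining column to one value.
res <- aggregate(df[-(1:2)], by = df[1:2], FUN = first_non_na)
res
```

Unlike na.omit, the helper does not fail to simplify when a group has zero or several non-missing values in a column; it silently keeps the first one, so check your data if duplicates are possible.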