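I have the data table below and want to produce the output shown further down: add a row named "Percentage" that holds, for each year, the share of "S" in that year's total. See the output table below.
How can I do this using the R data.table approach?
Thanks for your help.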
Table:
Category 1998 1999 2000 2001 2002 ..... 2018
No_History 10 15 2 22 15 ..... 16
NS 17 20 15 23 10 ..... 21
S 15 14 85 25 47 ...... 15
Output:
Category 1998 1999 2000 2001 2002 ..... 2018
No_History 10 15 2 22 15 ..... 16
NS 17 20 15 23 10 ..... 21
S 15 14 85 25 47 ..... 15
Percentage 35.7 28.5 83.3 35.7 65.2 ..... 28.8
Simply calculate percentage = S/(No_History+NS+S)*100
Answer 0 (score: 0)
Perhaps something like this. First, I create a data frame.
# Create data frame
df <- read.table(text ="Category 1998 1999 2000 2001 2002 2018
No_History 10 15 2 22 15 16
NS 17 20 15 23 10 21
S 15 14 85 25 47 15", header = FALSE)
Next, I restructure it into a more useful shape. Working in a tidy format (one row per year, one column per category) makes life easier.
# Restructure data:
library(data.table)
# Transpose so that each year becomes a row
df <- t(df)
# Use first row (the category labels) as column names
colnames(df) <- df[1, ]
# Remove first row
df <- df[-1, ]
# Convert to data table
dt <- data.table(df)
# Convert columns to numerics
dt[, names(dt) := lapply(.SD, as.numeric)]
Finally, I do the calculation:
# Do calculation
dt[, Percentage := 100 * S/(No_History + NS + S)]
which gives
# Category No_History NS S Percentage
# 1: 1998 10 17 15 35.71429
# 2: 1999 15 20 14 28.57143
# 3: 2000 2 15 85 83.33333
# 4: 2001 22 23 25 35.71429
# 5: 2002 15 10 47 65.27778
# 6: 2018 16 21 15 28.84615
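If the set of categories might change, the denominator can be computed from all count columns instead of naming them one by one. A minimal sketch, assuming the same layout as dt above, with only three of the years typed in by hand for brevity; count_cols is a name introduced here for illustration:

```r
library(data.table)

# Same layout as dt above: one row per year, one column per category.
# Only three years are included to keep the example short.
dt <- data.table(
  Category   = c(1998, 1999, 2000),
  No_History = c(10, 15, 2),
  NS         = c(17, 20, 15),
  S          = c(15, 14, 85)
)

# Every column except the year column holds counts (illustrative helper name)
count_cols <- setdiff(names(dt), "Category")

# rowSums(.SD) totals the count columns for each year
dt[, Percentage := 100 * S / rowSums(.SD), .SDcols = count_cols]
```

With this form, adding another category column changes the total automatically, while the numerator stays tied to S.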
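To get the data back into the format you specified, I have to transpose the data table. Note that t() returns a numeric matrix rather than a data.table.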
# Transpose back to desired format
t(dt)
# [,1] [,2] [,3] [,4] [,5] [,6]
# Category 1998.00000 1999.00000 2000.00000 2001.00000 2002.00000 2018.00000
# No_History 10.00000 15.00000 2.00000 22.00000 15.00000 16.00000
# NS 17.00000 20.00000 15.00000 23.00000 10.00000 21.00000
# S 15.00000 14.00000 85.00000 25.00000 47.00000 15.00000
# Percentage 35.71429 28.57143 83.33333 35.71429 65.27778 28.84615
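It may also be possible to skip both transposes and append the row directly to the wide table. The following is only a sketch under assumptions: the wide table is typed in by hand with three of the years, the counts are stored as doubles so the percentage row does not force a type change, and the names wide, year_cols, s_row, totals and pct_row are made up for illustration.

```r
library(data.table)

# Wide table as in the question (years as columns), values stored as doubles
wide <- data.table(
  Category = c("No_History", "NS", "S"),
  `1998` = c(10, 17, 15),
  `1999` = c(15, 20, 14),
  `2000` = c(2, 15, 85)
)

year_cols <- setdiff(names(wide), "Category")

# Named vectors: the S row and the per-year column totals
s_row  <- unlist(wide[Category == "S", .SD, .SDcols = year_cols])
totals <- unlist(wide[, lapply(.SD, sum), .SDcols = year_cols])

# Build the new one-row table and append it to the original
pct_row <- as.data.table(c(list(Category = "Percentage"),
                           as.list(100 * s_row / totals)))
out <- rbind(wide, pct_row)
```

The result keeps Category as a character column, but each year column then mixes counts with a percentage, so the long layout above is probably still the better choice for further analysis.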
If you don't have to use data.table, you can also use dplyr.
# Create data frame
df <- read.table(text ="Category 1998 1999 2000 2001 2002 2018
No_History 10 15 2 22 15 16
NS 17 20 15 23 10 21
S 15 14 85 25 47 15", header = FALSE)
# Restructure data:
# Transpose
# Use first row as column names
# Remove first row
df <- t(df)
colnames(df) <- df[1, ]
df <- df[-1,]
library(dplyr)
# Convert to data frame
# Convert all to numeric
# Perform calculation
# Transpose result
df %>%
  data.frame %>%
  mutate_all(function(x) as.numeric(as.character(x))) %>%
  mutate(Percentage = 100 * S / (No_History + NS + S)) %>%
  t
# [,1] [,2] [,3] [,4] [,5] [,6]
# Category 1998.00000 1999.00000 2000.00000 2001.00000 2002.00000 2018.00000
# No_History 10.00000 15.00000 2.00000 22.00000 15.00000 16.00000
# NS 17.00000 20.00000 15.00000 23.00000 10.00000 21.00000
# S 15.00000 14.00000 85.00000 25.00000 47.00000 15.00000
# Percentage 35.71429 28.57143 83.33333 35.71429 65.27778 28.84615
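A similar result can probably be reached in dplyr without transposing, by summarising each year column and binding the result back on. This is a hedged sketch, not the method used above: it assumes dplyr 1.0.0 or later (for across() and the .before argument of mutate()), a hand-typed wide tibble with three of the years, and illustrative names wide, pct_row and out.

```r
library(dplyr)

# Wide table as in the question (years as columns)
wide <- tibble(
  Category = c("No_History", "NS", "S"),
  `1998` = c(10, 17, 15),
  `1999` = c(15, 20, 14),
  `2000` = c(2, 15, 85)
)

# For every year column: S value divided by the column total, times 100
pct_row <- wide %>%
  summarise(across(-Category, ~ 100 * .x[Category == "S"] / sum(.x))) %>%
  mutate(Category = "Percentage", .before = 1)

# Append the percentage row to the original table
out <- bind_rows(wide, pct_row)
```

Here the output stays a tibble with a character Category column, rather than the numeric matrix that t() produces.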