R - .csv file - extracting a variable

Time: 2015-06-03 15:50:36

Tags: r csv

I have read in a large .csv file with columns such as "Paid" and "Description".

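I am trying to figure out how to pull only the "Paid" column when "Description" is Bronchitis or another ailment in that column.

This would be like building a pivot table in Excel, filtering on one specific description, and getting back all the individual Paid rows.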

 Paid Description  val 
 $500 Bronchitis   1.5
 $3,250 'Complication of Pregnancy/Childbirth' 2.2
 $5,400 Burns 3.3
 $20.50 Bronchitis 4.4
 $24  Ashtma 1.2

2 answers:

Answer 0: (score: 1)

If your data is

paid <- c(300,200,150)
desc <- c("bronchitis","headache","broken.leg")
df <- data.frame(paid, desc)

Try

df[df$desc == "bronchitis", "paid"]

# the argument before the comma filters the rows,
# the argument after the comma selects the column

# > df[df$desc == "bronchitis", "paid"]
# [1] 300

library(dplyr)
df %>% filter(desc=="bronchitis") %>% select(paid)

# filter refers to the row condition
# select filters the output column(s)


# > df %>% filter(desc=="bronchitis") %>% select(paid)
#   paid
# 1  300
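
The question also asks about matching other ailments in the column. A minimal sketch in base R, assuming a small hypothetical data frame `claims` shaped like the sample in the question (the names `claims`, `wanted` and `paid_subset` are illustrative, not from the original post); `%in%` keeps rows whose description matches any of several values:

```r
# Hypothetical data shaped like the sample in the question
claims <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  stringsAsFactors = FALSE
)

# Descriptions of interest (an illustrative choice)
wanted <- c("Bronchitis", "Burns")

# Keep rows whose Description is any of `wanted`, and only the Paid column
paid_subset <- claims[claims$Description %in% wanted, "Paid"]
print(paid_subset)
```

With a single value in `wanted` this behaves like the `==` comparison above, so the same line covers both cases.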

Answer 1: (score: 1)

Using data.table

library(data.table)#v1.9.5+
setkey(setDT(df1), Description)[.('Bronchitis'),'Paid', with=FALSE]
#    Paid
#1:   $500
#2: $20.50

Data

# Sample data with proper column names; the header row is not stored as data
df1 <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  val = c(1.5, 2.2, 3.3, 4.4, 1.2),
  stringsAsFactors = FALSE
)
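
The Paid values in the sample are strings such as "$3,250", so they may need converting before any arithmetic such as a pivot-table style total. A rough sketch in base R, assuming the currency format seen in the sample (a leading dollar sign and comma thousands separators); the vector names here are hypothetical:

```r
# Hypothetical vectors copied from the sample rows in the question
paid_chr <- c("$500", "$3,250", "$5,400", "$20.50", "$24")
desc_chr <- c("Bronchitis", "Complication of Pregnancy/Childbirth",
              "Burns", "Bronchitis", "Ashtma")

# Strip dollar signs and thousands separators, then convert to numeric
paid_num <- as.numeric(gsub("[$,]", "", paid_chr))

# Total paid for one description, similar to a filtered pivot-table sum
total_bronchitis <- sum(paid_num[desc_chr == "Bronchitis"])
print(total_bronchitis)
```

If the real file uses a different currency format, the pattern passed to `gsub` would need adjusting.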