I have a data frame in R:
id name class x101 x202 x303
76978 phil 2 0.407034783 0.001 0.192229687
59911 jose 2 0.327173661 0.004 0.227843273
46537 matt 3 0.590337464 0.005 0.057271545
77345 benn 4 0.293847569 0.002 0.170405643
53180 crai 2 0.844581456 0.003 0.253665748
21063 lour 4 0.080756674 0.002 0.902143356
35456 moni 4 0.445965164 0.004 0.531952568
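I need to drop the columns that start with "x" (x101, x202 and x303) and have a mean below 0.1. This would drop column x202. My final output should look like this: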
id name class x101 x303
76978 phil 2 0.407034783 0.192229687
59911 jose 2 0.327173661 0.227843273
46537 matt 3 0.590337464 0.057271545
77345 benn 4 0.293847569 0.170405643
53180 crai 2 0.844581456 0.253665748
21063 lour 4 0.080756674 0.902143356
35456 moni 4 0.445965164 0.531952568
How can I do this in R?

Answer 0 (score: 1)
I suggest using the dplyr package for this.
tmp<-read.table(text="id name class x101 x202 x303
76978 phil 2 0.407034783 0.001 0.192229687
59911 jose 2 0.327173661 0.004 0.227843273
46537 matt 3 0.590337464 0.005 0.057271545
77345 benn 4 0.293847569 0.002 0.170405643
53180 crai 2 0.844581456 0.003 0.253665748
21063 lour 4 0.080756674 0.002 0.902143356
35456 moni 4 0.445965164 0.004 0.531952568",header=TRUE)
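library(dplyr)
# Keep non-numeric columns and numeric columns whose mean exceeds 0.1.
# `||` short-circuits, so mean() is never called on a non-numeric column.
select_if(tmp, function(x) !is.numeric(x) || mean(x) > 0.1)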
This gives you the desired output:
id name class x101 x303
1 76978 phil 2 0.40703478 0.19222969
2 59911 jose 2 0.32717366 0.22784327
3 46537 matt 3 0.59033746 0.05727155
4 77345 benn 4 0.29384757 0.17040564
5 53180 crai 2 0.84458146 0.25366575
6 21063 lour 4 0.08075667 0.90214336
7 35456 moni 4 0.44596516 0.53195257
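Note that the call above tests every numeric column, so id and class are kept only because their means happen to exceed 0.1. A minimal sketch that limits the test to columns starting with "x", assuming dplyr 1.0.0 or later (which provides where()); the name result is only illustrative:

```r
library(dplyr)

tmp <- read.table(text = "id name class x101 x202 x303
76978 phil 2 0.407034783 0.001 0.192229687
59911 jose 2 0.327173661 0.004 0.227843273
46537 matt 3 0.590337464 0.005 0.057271545
77345 benn 4 0.293847569 0.002 0.170405643
53180 crai 2 0.844581456 0.003 0.253665748
21063 lour 4 0.080756674 0.002 0.902143356
35456 moni 4 0.445965164 0.004 0.531952568", header = TRUE)

# Drop only the columns that start with "x" AND have a mean below 0.1;
# all other columns are kept regardless of their mean.
result <- select(
  tmp,
  !(starts_with("x") & where(function(x) is.numeric(x) && mean(x) < 0.1))
)
names(result)
```

With the sample data this should keep id, name, class, x101 and x303.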
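Answer 1 (score: 0)
This needs to be adapted to the data frame you actually have (for example, if you have more columns), but it works for your example: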
cbind(df[,1:3], df[,4:6][colMeans(df[,4:6]) > 0.1])
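However, with grepl you can automatically select the columns that start with "x". A base R one-liner:
cbind(df[, !grepl("^x", colnames(df))], df[, grepl("^x", colnames(df))][colMeans(df[, grepl("^x", colnames(df))]) > 0.1])
Anatomy of this one-liner: cbind joins the columns that do not start with "x" (df[, !grepl("^x", colnames(df))]) with the columns that start with "x" and have a mean above 0.1 (df[, grepl("^x", colnames(df))][colMeans(df[, grepl("^x", colnames(df))]) > 0.1]). The pattern "^x" anchors the match to the start of the name; a plain "x" would also match names that merely contain an x.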
Answer 2 (score: 0)
In base R, you can do the following.
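# Positions of the numeric (double) columns
inx <- which(sapply(dat, inherits, "numeric"))
# Names of those columns that start with "x" and have a mean below 0.1
inx <- names(dat[inx])[grepl("^x", names(dat[inx])) & colMeans(dat[inx]) < 0.1]
# Logical indexing still works when nothing is dropped; dat[-which(...)] would return zero columns in that case
result <- dat[!names(dat) %in% inx]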
result
# id name class x101 x303
#1 76978 phil 2 0.40703478 0.19222969
#2 59911 jose 2 0.32717366 0.22784327
#3 46537 matt 3 0.59033746 0.05727155
#4 77345 benn 4 0.29384757 0.17040564
#5 53180 crai 2 0.84458146 0.25366575
#6 21063 lour 4 0.08075667 0.90214336
#7 35456 moni 4 0.44596516 0.53195257
Data:
dat <- read.table(text = "
id name class x101 x202 x303
76978 phil 2 0.407034783 0.001 0.192229687
59911 jose 2 0.327173661 0.004 0.227843273
46537 matt 3 0.590337464 0.005 0.057271545
77345 benn 4 0.293847569 0.002 0.170405643
53180 crai 2 0.844581456 0.003 0.253665748
21063 lour 4 0.080756674 0.002 0.902143356
35456 moni 4 0.445965164 0.004 0.531952568
", header = TRUE)
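The same idea can be wrapped in a small reusable function. This is only a sketch: drop_low_mean_cols, prefix and threshold are hypothetical names that do not come from the answers, and the code assumes the candidate columns contain no missing values.

```r
# Hypothetical helper (not part of base R): drop the columns whose name
# starts with `prefix` and whose mean is below `threshold`.
# Assumes the candidate columns contain no NA values.
drop_low_mean_cols <- function(data, prefix = "x", threshold = 0.1) {
  # Candidate columns: name starts with the prefix and the column is numeric
  is_candidate <- startsWith(names(data), prefix) &
    vapply(data, is.numeric, logical(1))
  # Mark candidates whose mean is below the threshold
  low <- rep(FALSE, ncol(data))
  low[is_candidate] <- colMeans(data[is_candidate]) < threshold
  # Logical indexing keeps every column when nothing is marked
  data[!low]
}

dat <- read.table(text = "
id name class x101 x202 x303
76978 phil 2 0.407034783 0.001 0.192229687
59911 jose 2 0.327173661 0.004 0.227843273
46537 matt 3 0.590337464 0.005 0.057271545
77345 benn 4 0.293847569 0.002 0.170405643
53180 crai 2 0.844581456 0.003 0.253665748
21063 lour 4 0.080756674 0.002 0.902143356
35456 moni 4 0.445965164 0.004 0.531952568
", header = TRUE)

result <- drop_low_mean_cols(dat)
unchanged <- drop_low_mean_cols(dat, threshold = 0)
names(result)
```

Using a logical mask rather than negative positions means a threshold that no column falls below (as in unchanged) simply returns the data frame with all of its columns.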
Answer 3 (score: 0)
You can also do this:
keep <- !colnames(df) %in% names(which(sapply(df[startsWith(colnames(df), 'x')], mean) < 0.1))
(df <- df[keep])
Yielding
id name class x101 x303
1 76978 phil 2 0.40703478 0.19222969
2 59911 jose 2 0.32717366 0.22784327
3 46537 matt 3 0.59033746 0.05727155
4 77345 benn 4 0.29384757 0.17040564
5 53180 crai 2 0.84458146 0.25366575
6 21063 lour 4 0.08075667 0.90214336
7 35456 moni 4 0.44596516 0.53195257

This is a multi-step approach:
Find the columns that start with x:
startsWith(colnames(df), 'x')
Compute the mean of each of those columns with sapply:
sapply(df[startsWith(colnames(df), 'x')], mean)
Check the means and get the names of the columns below 0.1 with which:
names(which(sapply(df[startsWith(colnames(df), 'x')], mean) < 0.1))