In my dataset:
# A tibble: 240 x 1,415
matchcode S001 S002 S002EVS S003 S003A S004 S006 S007 S007_01 S008 S009 S009A S010 S010_01 S010_02 S010_03 S010_04 S011 S012 S013 S013B S014 S015 S016 S017 S017A
<fct> <dbl> <dbl> <dbl+l> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl> <fct> <fct> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl>
1 "JPN 198~ 2 1 -4 392 392 -4 324 324 3920120324 -4 JP JP -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 0.6789805 0.6789805
2 "MEX 198~ 2 1 -4 484 484 -4 933 2130 4840120926 -4 MX MX -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.1378840 1.1378840
3 "HUN 198~ 2 1 -4 348 348 -4 1280 4321 3480121280 -4 HU HU -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.0635516 1.0635516
4 "AUS 198~ 2 1 -4 36 36 -4 973 5478 360120973 -4 AU AU -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 0.9616138 0.9616138
5 "ARG 198~ 2 1 -4 32 32 -4 874 6607 320120874 -4 AR AR -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 0.9266260 0.9266260
6 "FIN 198~ 2 1 -4 246 246 -4 385 7123 2460120385 -4 FI FI -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.0000000 1.0000000
7 "KOR 198~ 2 1 -4 410 410 -4 3 7744 4100120003 -4 KR KR -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.0000000 1.0000000
8 "ZAF 198~ 2 1 -4 710 710 -4 5420 10260 7100121549 -4 ZA ZA -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.0000000 1.0000000
9 "ARG 199~ 2 2 -4 32 32 -4 856 11163 320240856 -4 AR AR 125 -4 -4 -4 -4 1210 -4 1 -4 -4 -4 -4 1.0000000 1.0000000
10 "BLR 199~ 2 2 -4 112 112 -4 106 11415 1120240106 -4 BY BY -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1.0000000 1.0000000
To replace all negative values with NA, I use the following code:
df[df < 0] <- NA
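However, I only want to do this for the columns that are not character/factor columns (I want to get rid of the warning messages, not suppress them). The variable charcol contains the names of the columns that should be skipped. I tried:
df[-charcol][df[-charcol] < 0] <- NA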
This gives me the error:
Error: cannot allocate vector of size 1.8 Gb
and it still gives me the warning:
In addition: Warning messages:
1: In Ops.factor(left, right) : ‘<’ not meaningful for factors
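I may well have the syntax wrong, but I'd like to know what the most efficient solution to this kind of problem is for large datasets. I have been reading the data.table vignette for a while, but I really can't work out the syntax.
Any suggestions?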
str(WVSsample)
Classes ‘data.table’ and 'data.frame': 240 obs. of 1415 variables:
$ matchcode : Factor w/ 240 levels "ALB 1998 ","ALB 2002 ",..: 108 134 88 12 4 73 117 232 5 25 ...
$ S001 :Class 'labelled' atomic [1:240] 2 2 2 2 2 2 2 2 2 2 ...
.. ..- attr(*, "label")= chr "Study"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:7] -5 -4 -3 -2 -1 1 2
.. .. ..- attr(*, "names")= chr [1:7] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
$ S002 :Class 'labelled' atomic [1:240] 1 1 1 1 1 1 1 1 2 2 ...
.. ..- attr(*, "label")= chr "Wave"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:11] -5 -4 -3 -2 -1 1 2 3 4 5 ...
.. .. ..- attr(*, "names")= chr [1:11] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
$ S002EVS :Class 'labelled' atomic [1:240] -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
.. ..- attr(*, "label")= chr "EVS-wave"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:9] -5 -4 -3 -2 -1 1 2 3 4
.. .. ..- attr(*, "names")= chr [1:9] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
$ S003 :Class 'labelled' atomic [1:240] 392 484 348 36 32 246 410 710 32 112 ...
.. ..- attr(*, "label")= chr "Country/region"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:199] -5 -4 -3 -2 -1 4 8 12 16 20 ...
.. .. ..- attr(*, "names")= chr [1:199] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
$ S003A :Class 'labelled' atomic [1:240] 392 484 348 36 32 246 410 710 32 112 ...
.. ..- attr(*, "label")= chr "Country/regions [with split ups]"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:199] -5 -4 -3 -2 -1 4 8 12 16 20 ...
.. .. ..- attr(*, "names")= chr [1:199] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
$ S004 :Class 'labelled' atomic [1:240] -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
.. ..- attr(*, "label")= chr "Set"
.. ..- attr(*, "format.stata")= chr "%8.0g"
.. ..- attr(*, "labels")= Named num [1:7] -5 -4 -3 -2 -1 1 2
.. .. ..- attr(*, "names")= chr [1:7] "Missing; Unknown" "Not asked in survey" "Not applicable" "No answer" ...
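A base R sketch that avoids both problems, assuming charcol holds column names as described above and using made-up toy data in place of WVSsample. Negative indexing such as -charcol works with column positions, not names (unary minus on a character vector is an error), so the complement is taken with setdiff(). And instead of comparing the whole data frame with < at once, which builds a logical matrix covering every selected cell, each column is updated separately, so only one column-length logical vector exists at a time.

```r
# Made-up toy data standing in for the real survey data
df <- data.frame(
  matchcode = factor(c("JPN 1981", "MEX 1981", "HUN 1982")),
  S003 = c(392, -4, 348),
  S004 = c(-4, -4, 1)
)
charcol <- "matchcode"  # assumed: names of the columns to skip

# -charcol would fail for a character vector, so select the complement by name
keep <- setdiff(names(df), charcol)

# Update one column at a time instead of building a full-size logical matrix
for (nm in keep) {
  v <- df[[nm]]
  v[v < 0] <- NA
  df[[nm]] <- v
}
```

In the tibble header above, S009 and S009A are also <fct> columns, so unless charcol lists them too, the comparison will still warn for those; choosing columns with is.numeric() instead of a hand-maintained charcol sidesteps that.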
Edit: @chinsoon12 pointed me to the following snippet:
f_dowle3 = function(DT) {
  # for every column j, set the NA cells to 0 by reference
  for (j in seq_len(ncol(DT)))
    set(DT,which(is.na(DT[[j]])),j,0)
}
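However, this code differs from what I need in two ways:
1. It replaces NAs with zero, whereas I want to replace negative values with NA. I need to change which(is.na(DT[[j]])) to which(DT[[j]] < 0).
2. It doesn't skip the character columns.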
I changed the code to:
f_dowle3 = function(DT) {
  # or by number (slightly faster than by name) :
  for (j in seq_len(ncol(DT)))
    set(DT,which(DT[[j]]<0),j,NA)
}
But this turns the dataset into NULL. Can anyone help me modify the code correctly?
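The NULL most likely comes from how the function is called rather than from set() itself (an assumption, since the calling code isn't shown): set() changes DT by reference, and a function whose last expression is a for loop returns NULL, so writing df <- f_dowle3(df) would overwrite df with NULL. Below is a minimal sketch of the modified function under the hypothetical name replace_negatives, run on made-up toy data, that also skips non-numeric columns so factors never reach the < comparison.

```r
library(data.table)

# Made-up toy data standing in for the real survey data
DT <- data.table(
  matchcode = factor(c("JPN 1981", "MEX 1981")),
  S003 = c(392, -4),
  S004 = c(-4L, 1L)
)

# Hypothetical helper: negatives -> NA in numeric columns only, by reference
replace_negatives <- function(DT) {
  for (j in seq_len(ncol(DT))) {
    col <- DT[[j]]
    if (!is.numeric(col)) next  # skip factor/character columns, so no '<' warning
    set(DT, i = which(col < 0), j = j, value = NA)  # logical NA fits any column type
  }
  invisible(DT)  # return DT explicitly; a bare for loop would return NULL
}

replace_negatives(DT)  # updates DT in place; no assignment needed
```

Because the update happens in place, calling replace_negatives(DT) on its own is enough; returning invisible(DT) only makes an accidental reassignment harmless.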
Answer (score: 1):
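Since this is a duplicate question, I'll delete this soon; I'm posting it as an answer only because it doesn't fit in a comment.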
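setDT(df)                                   # convert df to a data.table by reference
cols <- names(df)[sapply(df, is.numeric)]   # names of the numeric columns only
for (x in cols) {
  set(df, which(df[[x]] < 0), x, NA_real_)  # negatives -> NA, updated in place
}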
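For comparison, the same update can be written with := and .SDcols, which some find easier to read. This is a sketch on made-up toy data; it rebuilds each selected column with replace() rather than writing only the matching cells, so for very large tables the set() loop above is probably lighter on memory.

```r
library(data.table)

# Made-up toy data standing in for the survey data
df <- data.table(
  matchcode = factor(c("JPN 1981", "MEX 1981")),
  S003 = c(392, -4),
  S004 = c(-4, 1)
)

# Names of the numeric columns only (factors are left untouched)
cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Overwrite each numeric column with a copy in which negatives are NA
df[, (cols) := lapply(.SD, function(v) replace(v, v < 0, NA)), .SDcols = cols]
```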