Reading a complex dataset in R

Time: 2017-09-04 01:55:31

Tags: r

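My dataset looks like the sample below. Each entry starts with a feature number, followed by a colon and then the value associated with that feature. I don't know how to import this dataset into R. Any ideas?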

236:24 500:163 732:234 869:117 885:106 1249:103 1280:158 1889:119 2015:55 2718:126 3307:137 3578:25 3770:26 4139:128 4723:114 4957:82 5128:50 5420:124 5603:135 5897:34 5946:117 6069:154 6153:55 6347:87 6372:77 6666:109 6866:223 6984:39 7709:253 7950:87 8078:38 8945:141 9316:111 9948:103 9989:68 10276:43 10530:76 10532:55 10799:15 10802:20 10848:82 11347:16 11871:51 11883:105 12534:133 12601:13 12781:178 12798:116 12842:106 12916:7 12935:51 12968:154 13028:58 13330:105 13384:2 13568:47 13641:632 13829:18 13964:62 14385:93 14392:272 15280:140 15424:119 15492:52 15523:31 16311:23 16464:69 16478:94 16584:102 16586:107 16705:272 17138:108 17181:150 17526:280 17540:163 18007:114 18050:53 18180:2 18806:160 18943:73 19055:41 19255:88 19774:59 19889:72 19921:45
101:68 572:57 732:63 962:120 1304:61 1831:60 1889:58 1973:105 2518:161 2629:228 2990:158 3147:75 3578:11 3860:88 4011:18 4623:141 4684:411 4758:69 4820:120 6149:102 6234:134 6306:118 6866:147 6927:89 6988:51 7048:178 7193:31 7257:61 7709:229 8061:125 8202:188 8272:17 8759:165 9104:77 9325:135 9860:97 10055:684 10532:180 10735:64 10744:267 10820:120 10848:186 10923:128 10936:129 11203:160 11303:144 11668:87 11867:97 11871:207 12191:83 12238:193 12380:51 12968:164 13369:58 13929:39 14531:102 14800:130 14931:99 15314:91 15632:62 16165:7 16353:120 16584:137 17216:172 18372:31 18893:75 19133:93 19154:101 19165:133 19607:20 19784:141 19889:97 19921:60

1 answer:

Answer 0 (score: 1)

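Assuming your data is stored in input.txt:

input <- scan('input.txt', what = 'character')
# Split every "feature:value" token at the colon and fill the matrix row by row,
# so that each row holds one feature and its value
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of  2 variables:
#  $ Feature: num  236 500 732 869 885 ...
#  $ Value  : num  24 163 234 117 106 103 158 119 55 126 ...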
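
The order in which matrix() fills its cells matters here. strsplit() returns each feature next to its value, so the flattened vector alternates feature, value, feature, value. matrix() fills column by column by default, which would put the first half of that vector into the first column instead. A minimal check on three hand-written tokens (toy data assumed for illustration, not read from the file):

```r
# Toy input (hypothetical): three tokens as scan() would return them
input <- c("236:24", "500:163", "732:234")

# strsplit() yields c("236", "24"), c("500", "163"), ...;
# unlist() flattens this to 236, 24, 500, 163, 732, 234
flat <- as.numeric(unlist(strsplit(input, ":")))

# Default column-wise fill: column 1 becomes 236, 24, 500 (features and values mixed)
by_col <- matrix(flat, ncol = 2)

# Row-wise fill: each row is one (feature, value) pair
by_row <- matrix(flat, ncol = 2, byrow = TRUE)

print(by_col)
print(by_row)
```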

Alternatively, you can let read.table parse the input instead of splitting the strings manually. This is slightly slower but easier to read:

data <- read.table(text = input, sep = ':')
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of  2 variables:
#  $ Feature: int  236 500 732 869 885 1249 1280 1889 2015 2718 ...
#  $ Value  : int  24 163 234 117 106 103 158 119 55 126 ...
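
read.table can also name and type the columns in the same call, which replaces the separate colnames() step. A minimal sketch, again on hypothetical toy tokens standing in for the output of scan(); col.names and colClasses are standard read.table arguments:

```r
# Toy tokens (hypothetical) standing in for the output of scan()
input <- c("236:24", "500:163", "732:234")

# Each element of `text` is read as one line; sep = ":" splits it into two fields
data <- read.table(text = input, sep = ":",
                   col.names = c("Feature", "Value"),
                   colClasses = c("integer", "integer"))
str(data)
```

Declaring colClasses up front lets read.table skip guessing the column types, which can help on large inputs; drop it if some values are not whole numbers.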

Edit: applied to your dataset. This reads all of your feature/value pairs into a data frame:

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/dexter/DEXTER/dexter_test.data'
input <- scan(url, what = 'character')
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature','Value')
str(data)
# 'data.frame': 192449 obs. of  2 variables:
#  $ Feature: num  236 500 732 869 885 ...
#  $ Value  : num  24 163 234 117 106 103 158 119 55 126 ...
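
Reading the file with scan() flattens it into one long run of pairs, so the line (sample) each pair came from is lost. If you need it, one option is to read the file line by line and record the line number next to each pair. A minimal sketch, assuming each line holds one sample; the `lines` vector is a hypothetical stand-in for readLines(url):

```r
# Toy stand-in (hypothetical) for readLines(url); assumes one sample per line
lines <- c("236:24 500:163 732:234",
           "101:68 572:57")

# Split each line on whitespace into its feature:value tokens
tokens <- strsplit(trimws(lines), "[[:space:]]+")

# Repeat each line number once per token found on that line
row_id <- rep(seq_along(tokens), lengths(tokens))

# Split every token at the colon and fill a two-column matrix row by row
pairs <- matrix(as.integer(unlist(strsplit(unlist(tokens), ":"))),
                ncol = 2, byrow = TRUE)

data <- data.frame(Row = row_id, Feature = pairs[, 1], Value = pairs[, 2])
print(data)
```

The Row, Feature and Value columns are the coordinate (triplet) form of a sparse matrix, so they can be handed to a constructor such as Matrix::sparseMatrix(i = data$Row, j = data$Feature, x = data$Value) if a samples-by-features matrix is needed.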
