Question

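My data set looks like the following. The first number is the feature number, followed by a colon, followed by the value associated with that feature. I don't know how to import this data set into R. Any ideas?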

236:24 500:163 732:234 869:117 885:106 1249:103 1280:158 1889:119 2015:55 2718:126 3307:137 3578:25 3770:26 4139:128 4723:114 4957:82 5128:50 5420:124 5603:135 5897:34 5946:117 6069:154 6153:55 6347:87 6372:77 6666:109 6866:223 6984:39 7709:253 7950:87 8078:38 8945:141 9316:111 9948:103 9989:68 10276:43 10530:76 10532:55 10799:15 10802:20 10848:82 11347:16 11871:51 11883:105 12534:133 12601:13 12781:178 12798:116 12842:106 12916:7 12935:51 12968:154 13028:58 13330:105 13384:2 13568:47 13641:632 13829:18 13964:62 14385:93 14392:272 15280:140 15424:119 15492:52 15523:31 16311:23 16464:69 16478:94 16584:102 16586:107 16705:272 17138:108 17181:150 17526:280 17540:163 18007:114 18050:53 18180:2 18806:160 18943:73 19055:41 19255:88 19774:59 19889:72 19921:45 101:68 572:57 732:63 962:120 1304:61 1831:60 1889:58 1973:105 2518:161 2629:228 2990:158 3147:75 3578:11 3860:88 4011:18 4623:141 4684:411 4758:69 4820:120 6149:102 6234:134 6306:118 6866:147 6927:89 6988:51 7048:178 7193:31 7257:61 7709:229 8061:125 8202:188 8272:17 8759:165 9104:77 9325:135 9860:97 10055:684 10532:180 10735:64 10744:267 10820:120 10848:186 10923:128 10936:129 11203:160 11303:144 11668:87 11867:97 11871:207 12191:83 12238:193 12380:51 12968:164 13369:58 13929:39 14531:102 14800:130 14931:99 15314:91 15632:62 16165:7 16353:120 16584:137 17216:172 18372:31 18893:75 19133:93 19154:101 19165:133 19607:20 19784:141 19889:97 19921:60

Answer 1

Assuming your data is stored in input.txt:

input <- scan('input.txt', what = 'character')
# Each token is "feature:value", so splitting on ':' yields
# feature, value, feature, value, ... and the matrix must be filled row by row.
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of  2 variables:
#  $ Feature: num  236 500 732 869 885 ...
#  $ Value  : num  24 163 234 117 106 ...

Alternatively, you can use read.table to parse the input instead of splitting the strings manually. This is a little slower but more readable.

# Each element of input is treated as one line with two ':'-separated fields.
data <- read.table(text = input, sep = ':')
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of  2 variables:
#  $ Feature: int  236 500 732 869 885 ...
#  $ Value  : int  24 163 234 117 106 ...

Edit: adapted to your data set. This reads your feature/value pairs into a data frame.

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/dexter/DEXTER/dexter_test.data'
input <- scan(url, what = 'character')
# Fill row by row so that each "feature:value" token becomes one row.
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 192449 obs. of  2 variables:
#  $ Feature: num  236 500 732 869 885 ...
#  $ Value  : num  24 163 234 117 106 ...
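The approaches above flatten the whole file into one long list of pairs, so the information about which line a pair came from is lost. If each line of the file is one sample (an assumption about the data layout, not something stated in the question), a minimal sketch that keeps a per-line index could look like this. The helper name read_sparse_pairs and the Sample column are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: parse lines of whitespace-separated "feature:value"
# tokens and record the line number each pair came from.
read_sparse_pairs <- function(lines) {
  # One character vector of tokens per input line
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  # Split every token into its feature and value parts (n x 2 character matrix)
  parts <- do.call(rbind, strsplit(unlist(tokens), ":", fixed = TRUE))
  data.frame(
    Sample  = rep(seq_along(tokens), lengths(tokens)),  # line index per pair
    Feature = as.integer(parts[, 1]),
    Value   = as.integer(parts[, 2])
  )
}

# Small inline example; for a file, pass readLines('input.txt') instead
lines <- c("236:24 500:163 732:234", "101:68 572:57")
data <- read_sparse_pairs(lines)
str(data)
```

The long format (one row per pair) stays compact for sparse data, and the Sample column makes it possible to rebuild a sample-by-feature matrix later if needed.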

Reading a complicated data set in R
