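I'm trying to convert a data.frame into a daisy dissimilarity matrix with the cluster package from CRAN in R. I have a dataset of 13109 observations with 13 categorical variables.
I get two kinds of warnings: NAs introduced by coercion, and no non-missing arguments to min/max. Why am I getting them? I don't have any NA values in my data.frame. Here is information about my dataset: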
> str(df4)
'data.frame': 13109 obs. of 9 variables:
$ Age : chr "55-64" "55-64" "55-64" "55-64" ...
$ Gender : chr "Female" "Female" "Male" "Male" ...
$ HouseholdIncome : chr "50k-75k" "150k-175k" "150k-175k" "150k-175k" ...
$ MaritalStatus : chr "Single" "Married" "Married" "Married" ...
$ PresenceofChildren: chr "No" "Yes" "Yes" "Yes" ...
$ HomeOwnerStatus : chr "Own" "Rent" "Rent" "Rent" ...
$ HomeMarketValue : chr "350k-500k" "500k-1mm" "500k-1mm" "500k-1mm" ...
$ Occupation : chr "White Collar Worker" "Professional" "Professional" "Professional" ...
$ Education : chr "Completed High School" "Completed College" "Completed College" "Completed College" ...
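Here is evidence of the NA values being forced in: when I try to run the PAM clustering function, I get an error that NA values are not allowed.
>library(cluster)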
>#Create dissimilarity matrix
>#Gower coefficient for finding distance between mixed variable
>daisy4 <- daisy(df4, metric = "gower", type = list(ordratio = c(1:9)))
> warnings()
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In data.matrix(x) : NAs introduced by coercion
4: In data.matrix(x) : NAs introduced by coercion
5: In data.matrix(x) : NAs introduced by coercion
6: In data.matrix(x) : NAs introduced by coercion
7: In data.matrix(x) : NAs introduced by coercion
8: In data.matrix(x) : NAs introduced by coercion
9: In data.matrix(x) : NAs introduced by coercion
10: In min(x) : no non-missing arguments to min; returning Inf
11: In max(x) : no non-missing arguments to max; returning -Inf
12: In min(x) : no non-missing arguments to min; returning Inf
13: In max(x) : no non-missing arguments to max; returning -Inf
14: In min(x) : no non-missing arguments to min; returning Inf
15: In max(x) : no non-missing arguments to max; returning -Inf
16: In min(x) : no non-missing arguments to min; returning Inf
17: In max(x) : no non-missing arguments to max; returning -Inf
18: In min(x) : no non-missing arguments to min; returning Inf
19: In max(x) : no non-missing arguments to max; returning -Inf
20: In min(x) : no non-missing arguments to min; returning Inf
21: In max(x) : no non-missing arguments to max; returning -Inf
22: In min(x) : no non-missing arguments to min; returning Inf
23: In max(x) : no non-missing arguments to max; returning -Inf
24: In min(x) : no non-missing arguments to min; returning Inf
25: In max(x) : no non-missing arguments to max; returning -Inf
26: In min(x) : no non-missing arguments to min; returning Inf
27: In max(x) : no non-missing arguments to max; returning -Inf
28: In min(x) : no non-missing arguments to min; returning Inf
29: In max(x) : no non-missing arguments to max; returning -Inf
> k4answers <- pam(daisy4, 3, diss = TRUE)
Error in pam(daisy4, 3, diss = TRUE) :
NA values in the dissimilarity matrix not allowed.
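Judging by the warning log, daisy() passes the data through data.matrix() and then takes min() and max() of each column. A minimal sketch with hypothetical toy data (df, age_as_number, age_min and age_max are illustrative names only) reproduces both kinds of warning outside daisy(). Here as.numeric() stands in for the coercion step; how data.matrix() itself treats character columns may depend on the R version. The data frame contains no NA, yet converting its strings to numbers yields nothing but NA, and min()/max() of an all-NA column fall back to Inf/-Inf.

```r
# Hypothetical toy data shaped like df4: character columns, no missing values.
df <- data.frame(
  Age    = c("55-64", "55-64", "35-44"),
  Gender = c("Female", "Female", "Male"),
  stringsAsFactors = FALSE
)

# The data itself has no NA in any column ...
na_per_column <- colSums(is.na(df))

# ... but strings such as "55-64" cannot be parsed as numbers, so coercion
# returns NA for every value ("NAs introduced by coercion").
age_as_number <- suppressWarnings(as.numeric(df$Age))

# Once the NAs are dropped nothing is left, so min()/max() return Inf/-Inf
# ("no non-missing arguments to min/max").
age_min <- suppressWarnings(min(age_as_number, na.rm = TRUE))
age_max <- suppressWarnings(max(age_as_number, na.rm = TRUE))
```

An infinite range means the scaled differences for that column cannot be computed, which is consistent with pam() then finding NA values in the dissimilarity matrix.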
Let me know if I can provide more information.
EDIT: I solved my error. I had read the .csv file in with every column as character, and this is where I went wrong:
#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
na.strings = "", stringsAsFactors=FALSE, head = TRUE)
Solution, reading the columns as factors (this is how it worked with my other datasets):
#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
na.strings = "", head = TRUE)
Answer (score: 1)
Read the data in as factor variables instead of character.
#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
na.strings = "", head = TRUE)
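Since R 4.0.0 the default for stringsAsFactors in read.csv() is FALSE, so on current R the call above also returns character columns. A minimal sketch that asks for factors explicitly; the inline CSV text is a hypothetical stand-in for Client4.csv so the snippet runs on its own (with the real file, pass its path as the first argument instead of text =).

```r
# Hypothetical stand-in for the contents of Client4.csv.
csv_text <- "Age,Gender,HouseholdIncome
55-64,Female,50k-75k
55-64,Male,150k-175k
35-44,Male,150k-175k"

# stringsAsFactors = TRUE is spelled out because the default became FALSE in R 4.0.0.
Store4 <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = TRUE,
                   header = TRUE)

# Every column should now be a factor rather than chr.
str(Store4)
```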
Previously I used the following, which caused the error:
#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
na.strings = "", stringsAsFactors=FALSE, head = TRUE)
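If the data are already loaded with character columns (as str(df4) shows), they can be converted in place instead of re-reading the file. A minimal sketch, assuming a small hypothetical stand-in for df4 and made-up income levels (for example "75k-100k"): it turns every character column into a factor, gives HouseholdIncome an explicit order, and checks the dissimilarity for NA before calling pam().

```r
library(cluster)

# Hypothetical stand-in for df4 as read with stringsAsFactors = FALSE.
df4 <- data.frame(
  Age             = c("55-64", "55-64", "35-44", "25-34", "35-44", "25-34"),
  Gender          = c("Female", "Female", "Male", "Male", "Female", "Male"),
  HouseholdIncome = c("50k-75k", "150k-175k", "150k-175k", "50k-75k",
                      "75k-100k", "50k-75k"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor (levels sorted alphabetically).
chr_cols <- vapply(df4, is.character, logical(1))
df4[chr_cols] <- lapply(df4[chr_cols], factor)

# Alphabetical order would put "150k-175k" before "50k-75k", so give the
# ordinal column its levels explicitly and mark it as ordered.
df4$HouseholdIncome <- factor(df4$HouseholdIncome,
                              levels = c("50k-75k", "75k-100k", "150k-175k"),
                              ordered = TRUE)

# Gower: plain factors are treated as nominal, ordered factors as ordinal.
daisy4 <- daisy(df4, metric = "gower")

# pam() rejects a dissimilarity containing NA, so check before clustering.
stopifnot(!anyNA(as.matrix(daisy4)))
k4answers <- pam(daisy4, 2, diss = TRUE)
```

Because daisy() reads the variable type from the column class, this sketch drops the question's type = list(ordratio = ...) argument; the explicit level order only matters for columns that are treated as ordinal.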