Question

I have a dataset saved in a csv file named 'extremes' (30 columns and 2000 rows). I run a cluster analysis and use capture.output to save the output in a csv file. Specifically, I do this:

    capture.output(inf,file="Clusters.csv", append=TRUE)

where 'inf' is a function that returns the output of the analysis. 'inf' is a list.

The output I save in the csv file (called 'Clusters.csv') looks like this (as it appears in the R console):

$assign
 [1] 1 2 3 1 1 1 1 2 1 4 1 4 1 2 4 2 3 5 4 1 2 2 2 1 1 1 1 1 1 1

$list
$list$cluster.1
 [1]  1  4  5  6  7  9 11 13 20 24 25 26 27 28 29 30

$list$cluster.2
[1]  2  8 14 16 21 22 23

$list$cluster.3
[1]  3 17

$list$cluster.4
[1] 10 12 15 19

$list$cluster.5
[1] 18


$num
cluster.1 cluster.2 cluster.3 cluster.4 cluster.5 
   16         7         2         4         1

From the analysis I also get a variable named 'NumberClusters', which gives the optimal number of clusters (for this particular dataset it takes the value 2).

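What I want to achieve is to read from the csv file 'extremes' the specific columns that make up the first cluster (i.e. 1 4 5 6 7 9 11 13 20 24 25 26 27 28 29 30) and save them in a data.frame (and possibly store them in a csv file named 'Cluster1'), then read from the csv file 'extremes' the specific columns that make up the second cluster (i.e. 2 8 14 16 21 22 23) and save them in a data.frame (possibly also in a csv file named 'Cluster2'). I can then use the two datasets 'Cluster1' and 'Cluster2' to continue my analysis. I think my main problem is finding a way to read, from the file 'Clusters.csv', the columns that make up each cluster (e.g. for cluster 1, columns 1 4 5 6 7 9 11 13 20 24 25 26 27 28 29 30). I believe I will then be able to read the data contained in those columns from the file 'extremes.csv' using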

read.xls("extremes.csv")[c(1, 4, 5, 6, 7, 9, 11, 13, 20, 24, 25, 26, 27, 28, 29, 30)]

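I have also tried using the package 'xlsx', but without any success.

Any help would be greatly appreciated, as I have been stuck on this for a while.

My data look like this (this is a small sample; in reality I have 30 columns (financial indices) and 2019 rows (daily returns)). I hope this helps.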

Food    Beer    Smoke   Games   Books   Hshld   Clths
0.57    1.23    1.19    0.54    -0.19   0.31    0.52
0.48    0.57    -0.89   -0.23   -0.25   0.29    -0.26
-0.55   -0.75   -0.8    -0.41   -0.2    -0.29   -0.61
 0.6    -0.1    0.31    1.16    1.14    0.74    0.72
-0.44   -1.34   -1.73   -0.16   0.22    -0.97   -0.96
-0.25   -0.21   -0.07   -0.73   -0.4    -0.56   -0.8
0.11    -0.94   -0.3    -0.38   -0.07   -0.38   -0.24
-1.34   -2.12   -1.54   -1.52   -0.68   -1.72   -1.91

I ran your code (your mockup example) and I got

> cluster1
Null data.table (0 rows and 0 cols)

The same happens for cluster2.

Then I ran the following commands on my own dataset and got the same message (i.e. Null data.table (0 rows and 0 cols)).

output <- read.csv("Clusters.csv", header = TRUE)
output <- list()
cluster.data <- matrix(extremes, nrow = 2019, ncol = 30, byrow = TRUE) 
DT <- as.data.table(cluster.data)
cluster1 <- DT[, c(output$list$cluster1), with = FALSE]
cluster1
cluster2 <- DT[, c(output$list$cluster2), with = FALSE]
cluster2

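I suspect I have got this completely wrong.

I ran the code without output <- list(). That is:

EDIT: I think this is because we were not using the right name with output$list$cluster2. Try output$list$cluster.2. I have made the change below. Please try: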

output <- read.csv("Clusters.csv", header = TRUE)
# take a look at output
output

cluster.data <- matrix(extremes, nrow = 2019, ncol = 30, byrow = TRUE) 
DT <- as.data.table(cluster.data)
cluster1 <- DT[, c(output$list$cluster.1), with = FALSE]
cluster1
cluster2 <- DT[, c(output$list$cluster.2), with = FALSE]
cluster2

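EDIT: We are nearly there! Please try printing output and output$list$cluster.1, and also run str(output$list$cluster.2), to see how it is classed. Finally, if that does not work, use dput on output to write it to a file and look at it in Notepad or another text editor. dput writes the data out as the R command needed to recreate it. Post it so we can check the output.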
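A minimal sketch of the dput round trip mentioned above, using a made-up stand-in for 'inf' and a hypothetical file name (Clusters.R). The likely reason the selection comes back empty is that capture.output stores only the printed text, so read.csv returns a data frame of text lines and output$list is NULL. dput and dget keep the list structure instead:

```r
# Hypothetical stand-in for the list returned by the cluster analysis
inf <- list(
  assign = c(1, 2, 3, 1, 1),
  list = list(cluster.1 = c(1, 4, 5), cluster.2 = 2, cluster.3 = 3)
)

# dput() writes R code that recreates the object, so the structure survives
path <- file.path(tempdir(), "Clusters.R")  # hypothetical file name
dput(inf, file = path)

# dget() evaluates that file and returns the list again
output <- dget(path)

# The nested element is available again under its original name
str(output$list$cluster.1)
```

With the list restored this way, output$list$cluster.1 can be used directly as a vector of column positions.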

Answer 1

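It is a bit tricky without a chunk of your data. If you are not familiar with this package, take a look at the data.table cheatsheet.

I am assuming your columns are standard ones without names, so they get the default names V1, V2, and so on. Let's isolate your two chunks so you can save them.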

library(data.table)

# mini mockup example using just first 5 columns
output <- list()
output$list$cluster.1 <- c(1,4,5)
output$list$cluster.2 <- c(2)
# EDIT: Kostas you would do this with your data
#  "output I save in the csv file (called 'Clusters.csv')"
# get the output structure back
# output <- read.csv("Clusters.csv", header = TRUE)
# Then the code will read your list results

# mockup of your data using a to e so we can see which columns are selected
#   it's simply two rows of repeated a b c d e
cluster.data <- matrix(letters[1:5], nrow = 2, ncol = 5, byrow = TRUE) 

# assuming the column names are just the defaults V1, V2, ...
#  cluster 1 we would expect it to look like this
#  headings     V1 V4 V5
#  data         a d e 
#  data line 2  a d e 


# turn it into a data.table
#   you would read your data in as csv 
#   data <- as.data.table(read.csv("yourfile.csv")) etc.
DT <- as.data.table(cluster.data)

# subset data to cluster 1
cluster1 <- DT[, c(output$list$cluster.1), with = FALSE]

   V1 V4 V5
1:  a  d  e
2:  a  d  e

# likewise for 2
cluster2 <- DT[, c(output$list$cluster.2), with = FALSE]

   V2
1:  b
2:  b

Note that I use with = FALSE in data.table so that column number 4 is selected, not a column named 4.
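To make the effect of with = FALSE concrete, here is a small sketch on the same kind of toy table. The variable name idx is made up for illustration, and the `..` prefix assumes a reasonably recent data.table version:

```r
library(data.table)

# Toy table whose default column names are V1..V5
DT <- as.data.table(matrix(letters[1:5], nrow = 2, ncol = 5, byrow = TRUE))
idx <- c(1, 4, 5)  # hypothetical vector of column positions

# with = FALSE: idx is treated as column positions, not as an expression
by_position <- DT[, idx, with = FALSE]

# Alternative spelling: the .. prefix looks idx up outside the table
by_prefix <- DT[, ..idx]

names(by_position)
```

Both forms select columns by the positions stored in idx.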

You would then save these chunks. See 'write.table' or 'write.csv'. Type ?write.table at the prompt for help.
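As a rough sketch of the saving step, assuming a small made-up cluster1 table and a temporary directory as the destination (swap in your own object and path):

```r
# Hypothetical cluster subset; in practice this is the cluster1 table from above
cluster1 <- data.frame(
  V1 = c(0.57, 0.48),
  V4 = c(0.54, -0.23),
  V5 = c(-0.19, -0.25)
)

# row.names = FALSE stops R from writing an extra column of row numbers
out_file <- file.path(tempdir(), "Cluster1.csv")
write.csv(cluster1, out_file, row.names = FALSE)

# Read the file back to confirm the round trip
check <- read.csv(out_file)
```

Leaving row.names at its default would add an unnamed first column to the csv, which then shows up as an extra column when the file is read again.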

You can "parameterise" this for a varying number of clusters using as.name(paste0("cluster.", as.character(i))), which gives cluster.3 when i = 3.
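One way this could look in practice is sketched below. Note that it indexes the list with [[ and a string built by paste0 rather than using as.name, which is a swapped-in approach; the objects output and extremes are small made-up stand-ins:

```r
# Hypothetical cluster list and data, standing in for the real objects
output <- list(list = list(cluster.1 = c(1, 4, 5), cluster.2 = 2))
extremes <- as.data.frame(matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE))

n_clusters <- length(output$list)

clusters <- lapply(seq_len(n_clusters), function(i) {
  # Build the element name, e.g. "cluster.2" when i = 2
  cols <- output$list[[paste0("cluster.", i)]]
  # drop = FALSE keeps a data.frame even when a cluster has one column
  extremes[, cols, drop = FALSE]
})
names(clusters) <- names(output$list)
```

Each element of clusters is then a data.frame holding the columns of one cluster, for example clusters$cluster.1.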

Hope this helps!

LATER EDIT: Kostas, I see that your output data is called cluster.1 rather than cluster1 as I originally had it, so I have edited the code above to use $list$cluster.1
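If re-running the analysis is not an option, the column numbers can also be recovered from the text that capture.output wrote. This is only a rough sketch: it assumes the file looks exactly like the listing in the question, and the helper parse_block is made up for illustration. A small stand-in file is created first so the example is self-contained:

```r
# Recreate a small file in the format shown in the question (stand-in data)
path <- file.path(tempdir(), "Clusters.csv")
writeLines(c(
  "$list",
  "$list$cluster.1",
  " [1]  1  4  5  6  7  9 11 13 20 24 25 26 27 28 29 30",
  "",
  "$list$cluster.2",
  "[1]  2  8 14 16 21 22 23",
  "",
  "",
  "$num"
), path)

lines <- readLines(path)
headers <- grep("^\\$list\\$cluster\\.", lines)

# Hypothetical helper: collect the numbers printed under one header line
parse_block <- function(start) {
  # Values run from the line after the header up to the next blank line
  rest <- lines[(start + 1):length(lines)]
  blank <- which(trimws(rest) == "")
  if (length(blank) > 0) rest <- rest[seq_len(blank[1] - 1)]
  # Drop the "[n]" index that print() puts at the start of each line
  rest <- sub("^ *\\[[0-9]+\\]", "", rest)
  as.integer(scan(text = paste(rest, collapse = " "), quiet = TRUE))
}

cluster_cols <- lapply(headers, parse_block)
names(cluster_cols) <- sub("^\\$list\\$", "", lines[headers])
```

The result is a named list of integer vectors, so cluster_cols$cluster.1 can be used as column positions. Because the file was written with append = TRUE, repeated runs would add further blocks with the same headers, which this sketch does not handle.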

Read numbers from a list and create csv files with the columns corresponding to the numbers read
