Splitting data in R

Time: 2015-10-14 03:35:53

Tags: r split binary multiple-columns

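I am facing the following problem. I have a large dataset, and I used split to make the data more manageable. I ended up with about 250 splits. As a result, every column in each split is named "binary code + original name". Is there a way to write the new datasets without R automatically adding the binary code?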

Here is a reproducible example:

df_NA <- data.frame(Size= c(800, 850, NA, 1200, NA),
Price =     c(900, NA, 1300, 1100, 1200),
Location =  c(NA, 'Downtown', 'Uptown', NA, 'Lakeview'),
Rooms =     c(1, 2, NA, 4, NA),
Bathrooms = c(1, 2, 1, 2, 2),
Rent =      c('Yes', 'Yes', 'No','Yes', 'No'))

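Below I split the data (ending up with three different sets), write the sets into my Splits folder, then remove the empty columns and write the results into my Updated Splits folder.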

# Splitting
index <- apply(is.na(df_NA)*1, 1,paste, collapse = "")
s <- split(df_NA, index)
# Writing splits into csv files and removing empty columns
for (i in 1:length(s))
{
write.csv(s[i], file = paste0("Splits/", i, "splits.csv"),
row.names=FALSE, na = "")
sdf <- data.frame(s[i])
updated_split <- sdf[,colSums(is.na(sdf))<nrow(sdf)]
write.csv(updated_split, file = paste0("Updated Splits/","updated", i, "split.csv"), row.names=FALSE)
}

Now, when I open any one of the three files, I get this:

data <- read.csv("Updated Splits/updated1split.csv")
data
  X001000.Size X001000.Price X001000.Rooms X001000.Bathrooms X001000.Rent
1          800           900             1                 1          Yes
2         1200          1100             4                 2          Yes

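I tried col.names=F, but it did not change anything. Any idea how to get around this? Or is there a way to remove all the binary names after I have written the files?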

1 answer:

Answer 0: (score: 2)
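A likely cause, judging only from the code in the question: `s[i]` (single brackets) returns a one-element list whose name is the binary key, so both `write.csv(s[i], ...)` and `data.frame(s[i])` prepend that name to every column (and `X` is added because the name starts with a digit). Extracting the element with `s[[i]]` yields the data frame itself, with its original column names. The following is a minimal sketch under that assumption; the `out_dir` temporary directory is a stand-in for the question's `Updated Splits` folder and is not part of the original code.

```r
# Same example data as in the question.
df_NA <- data.frame(Size = c(800, 850, NA, 1200, NA),
                    Price = c(900, NA, 1300, 1100, 1200),
                    Location = c(NA, "Downtown", "Uptown", NA, "Lakeview"),
                    Rooms = c(1, 2, NA, 4, NA),
                    Bathrooms = c(1, 2, 1, 2, 2),
                    Rent = c("Yes", "Yes", "No", "Yes", "No"))

# One 0/1 string per row marking which columns are NA.
index <- apply(is.na(df_NA) * 1, 1, paste, collapse = "")
s <- split(df_NA, index)

# Assumption: write to a temporary directory instead of the question's folder.
out_dir <- file.path(tempdir(), "Updated Splits")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (i in seq_along(s)) {
  sdf <- s[[i]]  # [[ ]] extracts the data frame; s[i] would be a named list
  # Keep only the columns that are not entirely NA.
  updated_split <- sdf[, colSums(is.na(sdf)) < nrow(sdf), drop = FALSE]
  write.csv(updated_split,
            file = file.path(out_dir, paste0("updated", i, "split.csv")),
            row.names = FALSE)
}

data <- read.csv(file.path(out_dir, "updated1split.csv"))
names(data)
```

For files that were already written with the prefixed names, one option is to strip the prefix after reading them back, for example `names(data) <- sub("^X[01]+\\.", "", names(data))`. This assumes every prefix has the form `X`, a run of 0s and 1s, and a dot.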
