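I'm new to R, so I may not be using the right search terms, and I would appreciate any direction.
I need to parse a very irregular, very large .csv. A single column contains 10 categories of categorical data, followed by 14 columns that may each contain one or more numeric record values. The .csv was formatted by hand to look like a pivot table, so the number of rows between my records varies. There are also many inconsistencies in how the data were entered. This is just a very small snippet.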
Categories x1 x2 x3
12123 222.0 206.7 236.7
Novartis Seeds 222.0
N67-T4 220.8
4/19/2000 220.8
32000 220.8
Soybean 220.8
Y 220.8
No-Till 220.8
N7070BT 223.2
4/19/2000 223.2
32000 223.2
Soybeans 223.2
Y 223.2
No-Till 223.2
Syngenta 206.7 236.7
N68-K7 236.7
4/24/2002 236.7
36500 236.7
Soybeans 236.7
Y 236.7
No-Till 236.7
NX7210 206.7
5/8/2001 206.7
38000 206.7
Corn 206.7
Y 206.7
No-Till 206.7
I think I've come up with a system for this (although I'm reading row by row, which I've seen described as the least efficient way to code in R):
#yc is my data table.
#This function was designed to identify character strings in the only category of data (tillage record)
#which would be consistently associated with a single record (yield). I create a new column of 0's and 1's, where
#1's are associated with a single record
tillTFfn <- function(yc) {
  yc$tillTF <- rep(NA, length(yc$Categories))
  for (i in seq_along(yc$Categories)) {
    if (grepl("till", yc$Categories[i], ignore.case = TRUE)) {
      yc$tillTF[i] <- 1
    } else if (grepl("Minimum-Till", yc$Categories[i], ignore.case = TRUE)) {
      #already caught by the "till" test above; kept for readability
      yc$tillTF[i] <- 1
    } else if (grepl("Conv", yc$Categories[i], ignore.case = TRUE)) {
      yc$tillTF[i] <- 1
    } else if (grepl("Not", yc$Categories[i], ignore.case = TRUE)) {
      yc$tillTF[i] <- 1
    } else {
      yc$tillTF[i] <- 0
    }
  }
  return(yc)
}
YC <- tillTFfn(yc)
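For comparison, the same flag can probably be built without a loop, because grepl() is vectorized over its input. This is only a minimal sketch: `yc_demo` and `pattern` are hypothetical names, and the toy values stand in for the real Categories column.

```r
# Hypothetical toy data standing in for the first column of the real file
yc_demo <- data.frame(
  Categories = c("Novartis Seeds", "N67-T4", "Soybean", "No-Till",
                 "Conventional", "Not reported", NA),
  stringsAsFactors = FALSE
)

# One pattern with alternation replaces the chain of else-if branches;
# "till" already matches "No-Till" and "Minimum-Till"
pattern <- "till|conv|not"

# grepl() returns FALSE for NA input, so NA rows are flagged 0
yc_demo$tillTF <- as.integer(grepl(pattern, yc_demo$Categories, ignore.case = TRUE))

yc_demo$tillTF
# 0 0 0 1 1 1 0
```

One caveat with either version: a short pattern such as "not" will also match any variety or company name that happens to contain those letters, so the flag is worth spot-checking against the real file.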
#I then create another new column with the sum of all records reported in each row. Values in "colsums" on rows where
#"tillTF" is 1 are my record
YC$colsums <- rowSums(YC[, 2:16], na.rm = TRUE)
#now I'm attempting to write code that reads YC row by row, and returns a row value with each categorical variable
array <- rep(NA, 12)
#The assumption here is that the first 10 rows of column 1 will contain a single value of each category
for (i in 1:12) {
  #i > 10 keeps the look-back indices (down to i-10) at 1 or above
  if (i > 10 && YC$tillTF[i] == 1) {
    array[12] <- YC$colsums[i]
    array[11] <- YC$Categories[i]
    array[10] <- YC$Categories[i-1]
    array[9] <- YC$Categories[i-2]
    array[8] <- YC$Categories[i-3]
    array[7] <- YC$Categories[i-4]
    array[6] <- YC$Categories[i-5]
    array[5] <- YC$Categories[i-6]
    array[4] <- YC$Categories[i-7]
    array[3] <- YC$Categories[i-8]
    array[2] <- YC$Categories[i-9]
    array[1] <- YC$Categories[i-10]
  }
}
#This is my imaginary way to create a data table where the array created above is the first row of a new data table YC_NT
YC_NT<-rbind(array)
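As a side note on what rbind() returns for a single vector, here is a minimal sketch; `rec`, `m`, and `df` are hypothetical names and the values are taken from the snippet above. Because a vector holds one type, mixing text and numbers coerces everything to character, which is worth knowing before the yields are summed later.

```r
# A record that mixes text categories with a numeric yield
rec <- c("Soybean", "Y", "No-Till", 220.8)

# rbind() on one vector gives a 1-row matrix
m <- rbind(rec)
dim(m)       # 1 4
typeof(m)    # "character": the yield is now the string "220.8"

# A one-row data frame keeps each column's own type instead
df <- data.frame(crop = "Soybean", tillage = "No-Till", yield = 220.8,
                 stringsAsFactors = FALSE)
is.numeric(df$yield)   # TRUE
```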
#This is my imaginary loop for the remainder of YC. The idea is that the loop will run through each row of YC, stop
#when YC$tillTF == 1, rewrite values of the array by reading back up through the column until YC$tillTF == 1 again, and then
#add that array as a row on the new data table YC_NT
for (i in 13:length(YC$tillTF)) {
  if (YC$tillTF[i] == 1) {
    array[12] <- YC$colsums[i]
    array[11] <- YC$Categories[i]
    if (YC$tillTF[i-1] == 0) {
      array[10] <- YC$Categories[i-1]
      if (YC$tillTF[i-2] == 0) {
        array[9] <- YC$Categories[i-2]
        if (YC$tillTF[i-3] == 0) {
          array[8] <- YC$Categories[i-3]
          if (YC$tillTF[i-4] == 0) {
            array[7] <- YC$Categories[i-4]
            if (YC$tillTF[i-5] == 0) {
              array[6] <- YC$Categories[i-5]
              if (YC$tillTF[i-6] == 0) {
                array[5] <- YC$Categories[i-6]
                if (YC$tillTF[i-7] == 0) {
                  array[4] <- YC$Categories[i-7]
                  if (YC$tillTF[i-8] == 0) {
                    array[3] <- YC$Categories[i-8]
                    if (YC$tillTF[i-9] == 0) {
                      array[2] <- YC$Categories[i-9]
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    #the look-back has stopped (previous record reached, or 9 rows read), so store the row;
    #positions that were not rewritten keep their values from the previous record
    YC_NT <- rbind(YC_NT, array)
  }
}
#I recognize that this much nesting is clumsy, and I doubt that calling rbind() on every pass is how it is meant to be used.
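1) Can I use nested conditional statements in R, like I have done here? 2) Is there a function that can add a vector as a row to a data table without first naming each individual vector and typing it into rbind()? (apply() didn't seem to work.)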
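To make the second question concrete, here is a minimal sketch of the kind of thing I am after: collect one row per record in a list and bind them all once at the end. Everything named here (`YC_demo`, `ends`, `starts`, `rows`, `YC_NT_demo`, and the column names) is hypothetical. It also assumes every record has exactly three category rows ending in the tillage row, which the real file does not guarantee.

```r
# Hypothetical toy input: two records, each ending in a tillage row
YC_demo <- data.frame(
  Categories = c("N67-T4", "Soybean", "No-Till",
                 "N7070BT", "Soybeans", "No-Till"),
  tillTF     = c(0, 0, 1, 0, 0, 1),
  colsums    = c(220.8, 220.8, 220.8, 223.2, 223.2, 223.2),
  stringsAsFactors = FALSE
)

# Rows where each record ends, and the row where each record starts
ends   <- which(YC_demo$tillTF == 1)
starts <- c(1, head(ends, -1) + 1)

# Build a one-row data frame per record and collect them in a list
rows <- lapply(seq_along(ends), function(k) {
  cats <- YC_demo$Categories[starts[k]:ends[k]]
  data.frame(variety = cats[1], crop = cats[2], tillage = cats[3],
             yield = YC_demo$colsums[ends[k]],
             stringsAsFactors = FALSE)
})

# Bind all rows in a single call instead of growing the table inside a loop
YC_NT_demo <- do.call(rbind, rows)
YC_NT_demo
```

Here do.call(rbind, rows) passes every element of the list to rbind() at once, so no row has to be named or typed in by hand. Records with a varying number of category rows would need extra handling that this sketch does not attempt.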