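I have an xlsx file in which some cells contain the value -3. Some of these are isolated single cells, and some are runs of consecutive cells that all hold -3. I am trying to write an R script that finds the indexes of these cells, so that for an isolated -3 I get a single index, and for a run of consecutive -3 values I get the start and end indexes.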
Below is the matrix in the xlsx file, which has 20 columns and 2 rows:
3.203 3.204 3.205 -3 3.207 3.207 -3 -3 -3 3.206 3.208 3.207 -3 3.264 3.207 3.208 -3 -3 3.209 -3
3.205 3.205 3.205 3.21 3.208 3.208 3.209 -3 -3 3.209 3.211 3.21 3.211 3.211 3.21 -3 3.213 3.211 3.212 3.212
I would like the result to look like this (I treat -3 as a missing value):
1 missing value at: ( 1 , 4 )
3 missing values starting from: ( 1 , 7 ) to ( 1 , 9 )
1 missing value at ( 1 , 13 )
2 missing values starting from: ( 1 , 17) to ( 1 , 18 )
1 missing value at: ( 1, 20 )
2 missing values starting from: ( 2 , 8 ) to ( 2 , 9 )
1 missing value at: ( 2, 16 )
Here is my R script, but it gives me wrong results. I am confused about how to use the indexes correctly.
fileData <- read.xlsx(filePath, 1, header = FALSE, sep = ",")
dataMatrix <- data.matrix(fileData)
## Find the number of rows and columns in the matrix
numberOfRows <- nrow(dataMatrix)
numberOfColumns <- ncol(dataMatrix)
## Access each value of the dataMatrix, check if it -3
for (i in 1:numberOfRows) # for each row
{
  # Get indexes for -3 value
  missingValueList = which(dataMatrix[i,] == -3);
  # Find the index after which there is a break (so no consecutive value)
  consecutiveBreaks = which(diff(missingValueList) != 1);
  print(missingValueList)
  print(consecutiveBreaks)
  j=0;
  for(k in 1:length(consecutiveBreaks))
  {
    if(k == 1)
    {
      cat(consecutiveBreaks[k], " missing value at: (",i,",",missingValueList[j+k],")","\n");
    }
    else
    {
      cat("Value of k: ", k, "\n");
      cat(abs(consecutiveBreaks[k]-consecutiveBreaks[k-1]), " missing values starting from: (",i,",",missingValueList[j],")","\n");
    }
    j=j+1;
  }
}
Can someone help me find a good solution?
Answer 0: (score: 1)
Here you go. I think this should work for your data:
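# mdata is assumed to be the numeric matrix from the question (dataMatrix there); -3 marks a missing value.
# Each run of -3 values is filled in place: every missing cell gets the mean of the value to its left
# (already filled, if it was missing) and the first valid value to the right of the run.
# A row that consists only of -3 values has no neighbours to average and is not handled.
val = 1;
counter = 1;
temp = matrix();
for (i in 1:nrow(mdata))
{
  for (j in 1:ncol(mdata))
  {
    if (mdata[i,j] == -3)
    {
      # Count the -3 values that follow column j, without reading past the last column
      while ((j + val) <= ncol(mdata))
      {
        if (mdata[i,j + val] == -3)
        {
          counter = counter + 1;
          val = val + 1;
          next;
        }
        else
        {
          break;
        }
      }
      # Neighbours of the run; NA when the run touches the first or last column
      left <- if (j > 1) mdata[i,j - 1] else NA
      right <- if ((j + counter) <= ncol(mdata)) mdata[i,j + counter] else NA
      temp <- t(as.matrix(c(left, mdata[i,j:(j + counter - 1)], right)))
      if (counter == 1)
      {
        cat("\n This is with counter 1 \n")
        print(temp)
        cat("\n matrix: temp-1", temp[,1],"temp-2", temp[,3],"\n");
        to.avg <- c(temp[,1], temp[,3]);
        avg<-mean(to.avg, na.rm = TRUE)
        mdata[i,j] = avg;
      }
      else
      {
        cat("\n This is with multiple count \n")
        cat(counter,"consecutive values were found, processing accordingly \n")
        print(temp);
        for (k in 0:(counter-1))
        {
          cat("\n K is ",(k+1), "and array is",length(temp),"long \n")
          to.avg <- c(temp[,(k+1)], temp[,length(temp)]);
          cat("averaging", temp[,(k+1)],"and", temp[,length(temp)]);
          avg<-mean(to.avg, na.rm = TRUE)
          cat("\n average =",avg);
          # Store the new value so the next missing cell averages against it
          temp[,(k+2)] = avg;
          mdata[i,j+k]=avg
        }
      }
    }
    else
    {
      # Not a missing value: leave the cell unchanged
      mdata[i,j] = mdata[i,j];
    }
    # Reset the run counters before moving to the next column
    val = 1;
    counter = 1;
  }
}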
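For the report format requested in the question (a single index for an isolated -3, start and end indexes for a run), a shorter route may be run-length encoding with base R's rle(). The following is only a sketch under some assumptions: the matrix is typed in by hand from the two rows posted in the question instead of being read from the xlsx file, it contains no NA values, and find_missing_runs is a hypothetical helper name, not a function from any package.

```r
# The two rows posted in the question; -3 marks a missing value.
dataMatrix <- rbind(
  c(3.203, 3.204, 3.205, -3, 3.207, 3.207, -3, -3, -3, 3.206,
    3.208, 3.207, -3, 3.264, 3.207, 3.208, -3, -3, 3.209, -3),
  c(3.205, 3.205, 3.205, 3.21, 3.208, 3.208, 3.209, -3, -3, 3.209,
    3.211, 3.21, 3.211, 3.211, 3.21, -3, 3.213, 3.211, 3.212, 3.212)
)

# Hypothetical helper: returns one row per run of `missing` values,
# with the matrix row, the first and last column of the run, and its length.
# Assumes the matrix holds no NA values.
find_missing_runs <- function(m, missing = -3) {
  runs <- lapply(seq_len(nrow(m)), function(i) {
    r <- rle(m[i, ] == missing)      # runs of TRUE/FALSE along row i
    end <- cumsum(r$lengths)         # last column of each run
    start <- end - r$lengths + 1     # first column of each run
    keep <- r$values                 # TRUE only for runs of missing values
    data.frame(row = rep(i, sum(keep)), start = start[keep],
               end = end[keep], n = r$lengths[keep])
  })
  do.call(rbind, runs)
}

runs <- find_missing_runs(dataMatrix)

# Print the report in the format requested in the question
for (k in seq_len(nrow(runs))) {
  if (runs$n[k] == 1) {
    cat(runs$n[k], "missing value at: (", runs$row[k], ",", runs$start[k], ")\n")
  } else {
    cat(runs$n[k], "missing values starting from: (", runs$row[k], ",",
        runs$start[k], ") to (", runs$row[k], ",", runs$end[k], ")\n")
  }
}
```

Keeping the runs in a data frame separates finding them from printing them, so the same table could also drive a fill-in step like the loop above.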