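Parsing Excel formulas

Time: 2014-01-10 09:44:31

Tags: python r excel

Apologies in advance if this is long-winded. I need to replicate the functionality of a specific set of Excel spreadsheets, and it has to be done in R. Getting the data and so on is relatively straightforward. What I need now is to implement the formulas in a given worksheet. Packages such as XLConnect can extract the formulas as strings. For one particular worksheet the data looks like this: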

exForm <- structure(list(r = c("A2", "B2", "A3", "B3", "A4", "B4", "A5", 
"B5", "A6", "B6", "A7", "B7"), formulae2 = c("1", "A2", "A2+1", 
"SUM(A$2:A3)", "A3+1", "SUM(A$2:A4)", "A4+1", "SUM(A$2:A5)", 
"A5+1", "SUM(A$2:A6)", "A6+1", "SUM(A$2:A7)"), x = c("2", "2", 
"3", "3", "4", "4", "5", "5", "6", "6", "7", "7"), y = c("1", 
"2", "1", "2", "1", "2", "1", "2", "1", "2", "1", "2")), .Names = c("r", 
"formulae2", "x", "y"), class = "data.frame", row.names = c(NA, 
-12L))

#> exForm
#    r   formulae2 x y
#1  A2           1 2 1
#2  B2          A2 2 2
#3  A3        A2+1 3 1
#4  B3 SUM(A$2:A3) 3 2
#5  A4        A3+1 4 1
#6  B4 SUM(A$2:A4) 4 2
#7  A5        A4+1 5 1
#8  B5 SUM(A$2:A5) 5 2
#9  A6        A5+1 6 1
#10 B6 SUM(A$2:A6) 6 2
#11 A7        A6+1 7 1
#12 B7 SUM(A$2:A7) 7 2

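There is some neat Python code that ports the original JavaScript formula parser (the jsport_nonEAT.py file downloaded below). I have used rPython to pass the data to this set of functions.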

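library(rPython)
# fetch the Python port of the parser and load it into the embedded interpreter
excelURL <- "http://www.ewbi.com/ewbi.develop/samples/jsport_nonEAT.py"
download.file(excelURL, "excel.py")
# execfile() exists only in Python 2
python.exec("execfile('excel.py')")
# hand the formula strings to Python, parse each one, collect the pretty-printed tokens
python.assign("test", exForm$formulae2)
python.exec('t=[]
for i in range(len(test)):
\t p.parse(test[i])
\t t.append(p.prettyprint())
')
parseForm <- python.get('t')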

For anyone who does not have rPython, here is the output of parseForm:

c("1 <operand> <number>\n", "A2 <operand> <range>\n", "A2 <operand> <range>\n+ <operator-infix> <math>\n1 <operand> <number>\n", 
"SUM <function> <start>\n    A$2:A3 <operand> <range>\n <function> <stop>\n", 
"A3 <operand> <range>\n+ <operator-infix> <math>\n1 <operand> <number>\n", 
"SUM <function> <start>\n    A$2:A4 <operand> <range>\n <function> <stop>\n", 
"A4 <operand> <range>\n+ <operator-infix> <math>\n1 <operand> <number>\n", 
"SUM <function> <start>\n    A$2:A5 <operand> <range>\n <function> <stop>\n", 
"A5 <operand> <range>\n+ <operator-infix> <math>\n1 <operand> <number>\n", 
"SUM <function> <start>\n    A$2:A6 <operand> <range>\n <function> <stop>\n", 
"A6 <operand> <range>\n+ <operator-infix> <math>\n1 <operand> <number>\n", 
"SUM <function> <start>\n    A$2:A7 <operand> <range>\n <function> <stop>\n"
)

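So, as I understand it, parseForm now contains a tokenized version of the original formulas. How would I turn the forms in parseForm into R expressions that I can evaluate, like this?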

> ex1 <- expression(1 + A1)
> A1 <- 10
> eval(ex1)
[1] 11

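Has anyone come across something similar, or can someone point me in the right direction? For additional information, I represent the data in each worksheet as a data frame, so for example sheet1, which the formulas above refer to, would be:

wData <- structure(c(NA, 1, 2, 3, 4, 5, 6, NA, 1, 3, 6, 10, 15, 21), .Dim = c(7L, 2L))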

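1 answer:

Answer 0 (score: 1)

I think the idea is that the input is exForm and the output should be wData.

The following could be generalized further, but it is sufficient for the example in the question. Note that it assumes any cell in exForm refers only to cells defined before it (which is the case here and is probably the case most of the time), so that we can simply work through the rows of exForm in order.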

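ss removes the dollar signs, giving x1; then converts strings such as A2:A4 into wData[2:4, "A"], giving x2; then converts the remaining single-cell strings such as A2 into wData[2, "A"], giving x3. What is left can be parsed as R, so we do that. The for loop walks through exForm from the top and applies ss to each cell defined there. No add-on packages are needed.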
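To make the three rewriting steps concrete, here is a small trace that only builds the strings and does not evaluate them. The helper name `rewriteSteps` is made up for this illustration; the regular expressions are the ones used in ss. Note that the range pattern only matches ranges within a single column, because the column letters are matched with a backreference.

```r
# Hypothetical helper: return the intermediate strings x1, x2, x3 for one formula
rewriteSteps <- function(f) {
  # step 1: drop the dollar signs of absolute references (A$2 -> A2)
  x1 <- gsub("$", "", f, fixed = TRUE)
  # step 2: single-column ranges become matrix slices (A2:A4 -> wData[2:4, 'A'])
  x2 <- gsub("([[:alpha:]]+)([[:digit:]]+):\\1([[:digit:]]+)",
             "wData[\\2:\\3, '\\1']", x1, perl = TRUE)
  # step 3: remaining single cells become element lookups (A3 -> wData[3, 'A'])
  x3 <- gsub("([[:alpha:]]+)([[:digit:]]+)", "wData[\\2, '\\1']", x2)
  c(x1 = x1, x2 = x2, x3 = x3)
}

sumSteps  <- rewriteSteps("SUM(A$2:A4)")
cellSteps <- rewriteSteps("A3+1")
print(sumSteps)
print(cellSteps)
```

For the SUM formula the third step changes nothing, because the range has already been replaced; for A3+1 only the third step does any work.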

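# Excel's SUM maps directly onto R's sum
SUM <- sum

# rewrite one Excel formula string as R code that indexes wData, then evaluate it
ss <- function(x) {
    # drop the dollar signs of absolute references: A$2 -> A2
    x1 <- gsub("$", "", x, fixed = TRUE)
    # single-column ranges: A2:A4 -> wData[2:4, 'A']
    x2 <- gsub("([[:alpha:]]+)([[:digit:]]+):\\1([[:digit:]]+)",
               "wData[\\2:\\3, '\\1']", x1, perl = TRUE)
    # remaining single cells: A2 -> wData[2, 'A']
    x3 <- gsub("([[:alpha:]]+)([[:digit:]]+)", "wData[\\2, '\\1']", x2)
    # evaluate in the caller's frame, where wData lives
    eval.parent(parse(text = x3))
}

# empty sheet sized by the largest row and column index in exForm
wData <- matrix(NA, nrow = max(as.numeric(exForm$x)), ncol = max(as.numeric(exForm$y)))
colnames(wData) <- LETTERS[1:ncol(wData)]

# fill the cells in the order they appear in exForm
for (i in seq_len(nrow(exForm))) {
    x <- as.numeric(exForm$x[i])
    y <- as.numeric(exForm$y[i])
    wData[x, y] <- ss(exForm$formulae2[i])
}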

which gives:

> wData
      A  B
[1,] NA NA
[2,]  1  1
[3,]  2  3
[4,]  3  6
[5,]  4 10
[6,]  5 15
[7,]  6 21
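The ordering assumption mentioned earlier (every formula only refers to cells that are defined before it) can be checked before running the loop. The following is only a sketch: `ex` is a shortened stand-in for exForm, and `cellRefs` and `definedEarlier` are names invented for this illustration. It looks only at the cell names that literally appear in a formula, so for a range it checks the two endpoints and not the cells in between, and a function name that ends in digits (such as LOG10) would be mistaken for a cell.

```r
# Shortened stand-in for exForm from the question (first four cells only)
ex <- data.frame(
  r = c("A2", "B2", "A3", "B3"),
  formulae2 = c("1", "A2", "A2+1", "SUM(A$2:A3)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cell names mentioned in a formula, dollar signs removed
cellRefs <- function(f) {
  f <- gsub("$", "", f, fixed = TRUE)
  regmatches(f, gregexpr("[[:alpha:]]+[[:digit:]]+", f))[[1]]
}

# TRUE for a row when every cell it mentions is defined in an earlier row of ex
definedEarlier <- vapply(seq_len(nrow(ex)), function(i) {
  all(cellRefs(ex$formulae2[i]) %in% ex$r[seq_len(i - 1)])
}, logical(1))

print(definedEarlier)
```

If any element comes back FALSE, a single top-to-bottom pass is not enough for that sheet and the rows would need to be reordered first.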

Revised: made several corrections and simplifications.