Extracting all rows and columns (and their rowspans and colspans) of an HTML table with XQuery

Time: 2017-03-11 10:40:42

Tags: xml xquery saxon

I'm trying to extract all the values in the cells of an HTML table using XQuery. The query I'm using (you can find it below) gives the following result:

Warning on line 11 column 22 of queryExtractTable.xq:
  The child axis starting at an attribute node node will never select anything
Warning on line 11 column 63 of queryExtractTable.xq:
  The child axis starting at an attribute node node will never select anything
<?xml version="1.0" encoding="UTF-8"?>hello colspan rowspan

I don't understand why I get the warning "The child axis starting at an attribute node node will never select anything".

I'm using Saxon.

Here is the query:

declare default element namespace "http://www.w3.org/1999/xhtml";


declare function local:analyzeTable(
$table as element(table))
{
    for $r in $table//tr
        return
            for $c in $r//td
                    return (normalize-space($c), string("colspan"),
$c/@colspan//text() , string("rowspan"), $c/@rowspan//text() )

};


for $t in //table
    return
        local:analyzeTable($t)

1 answer:

Answer 0 (score: 1)

The warnings are raised by expressions such as:

$c/@colspan//text()

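@colspan is an attribute node, and attribute nodes have no children. In $c/@colspan//text(), the // is shorthand for /descendant-or-self::node()/. Starting from an attribute, that step selects only the attribute itself, so the final text() step is a child-axis step starting at an attribute node. Because such a step can never select anything, Saxon warns you about it.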
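To see this in isolation, here is a minimal sketch using a made-up td element (not the table from the question). Both expressions that step to children of the attribute return the empty sequence, while string() returns the attribute's value. The query returns 0, 0 and "2".

```xquery
(: Hypothetical cell, used only for illustration. :)
let $c := <td colspan="2">hello</td>
return (
  (: Child axis starting at an attribute: always empty, Saxon warns here. :)
  count($c/@colspan/node()),
  (: Same problem: the // path ends in a child step from the attribute. :)
  count($c/@colspan//text()),
  (: The string value of the attribute itself. :)
  string($c/@colspan)
)
```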

To get the string values of these attributes, change these expressions to:

string($c/@colspan)

I see you are already familiar with the string() function, e.g. string("colspan"). Note, though, that string() is redundant there: the literal "colspan" on its own is already a string.
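Putting both suggestions together, the query from the question could look roughly like the sketch below. It has not been run against the original table, which is not shown. It keeps the XHTML default namespace from the question and merges the two nested for clauses into a single FLWOR expression, which returns the same sequence.

```xquery
declare default element namespace "http://www.w3.org/1999/xhtml";

(: The question's function with both fixes applied:
   string() on the attributes and plain literals for the labels. :)
declare function local:analyzeTable($table as element(table))
{
  for $r in $table//tr
  for $c in $r//td
  return (
    normalize-space($c),
    "colspan", string($c/@colspan),
    "rowspan", string($c/@rowspan)
  )
};

(: Requires an XHTML document as the context item, as in the question. :)
for $t in //table
return local:analyzeTable($t)
```

If a cell has no colspan or rowspan attribute, string() of the empty sequence returns a zero-length string. HTML treats a missing colspan or rowspan as 1, so an expression such as ($c/@colspan/string(), "1")[1] is one way to substitute that default.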

For more on text(), string() and data(), see https://developer.marklogic.com/blog/text-is-a-code-smell
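As a rough illustration of the difference, here is a sketch with a made-up element (not taken from the linked article):

- text() selects only the text-node children.
- string() returns the string value of the whole node, which is all descendant text concatenated.
- data() returns the typed value. For an attribute created by a direct element constructor, the typed value is xs:untypedAtomic.

The query returns 1, "hello world", true and "2".

```xquery
(: Hypothetical element, for illustration only. :)
let $td := <td colspan="2">hello <b>world</b></td>
return (
  (: text() selects only direct text-node children: just "hello ". :)
  count($td/text()),
  (: string() concatenates all descendant text: "hello world". :)
  string($td),
  (: data() atomizes; an attribute from a direct constructor is untyped. :)
  data($td/@colspan) instance of xs:untypedAtomic,
  (: string() of the attribute gives its value as xs:string. :)
  string($td/@colspan)
)
```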