Data Schema Recognition Rule File

The Data Schema Recognition Rule File, also called a DSD file, verifies whether the structure of a target page matches any data schema belonging to the current theme. Files of this type are stored on the DataStore server in the folder $CATALINE/work/DataStore/context/extraction/config/<theme-name>/, and their names end with the suffix .dsd.xml. Their structure is as follows:

<?xml version="1.0" encoding="gbk"?>
<geometa-data-schema>
<theme>food_industry_category</theme> <!-- theme name -->
<gem>food_industry_category.gem.xml</gem> <!-- GEM file -->
<sce>food_industry_category.sce.xml</sce> <!-- SCE file -->
<exist> <!-- a validation rule expressed in XPath; exist denotes an existence check -->
<path from="HTML"><![CDATA[count(//*[@id='ul-id-blue']/li/p[count(./a/text())>0])>0]]>
<context>//*[@id='blueFrame']</context>
</path>
</exist>
</geometa-data-schema>

Where:

  • The from attribute of the path element can take one of the following values:
    • HTML: the validation is performed directly against the HTML page.
    • transDOM_xxx: the validation is performed against an intermediate DOM. xxx is the serial number of the DOM, which corresponds to the serial number of a MAP file belonging to the current theme.
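
As a minimal sketch of the transDOM_xxx case, an existence rule validated against an intermediate DOM might look like the fragment below. The serial number 1, the element names record and title, and the XPath expression are hypothetical and serve only as illustration; the real element names depend on the intermediate DOM that corresponds to the theme's MAP file.

<exist> <!-- hypothetical rule: passes if at least one record has a non-empty title -->
<path from="transDOM_1"><![CDATA[count(//record[count(./title/text())>0])>0]]>
</path>
</exist>

As in the HTML example, the XPath expression is wrapped in a CDATA section so that characters such as > do not need to be escaped.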