Map clues

After selecting a DOM node in the DOM Tree Viewer, choose Clue Mapping from the right-click pop-up menu to map the node to a clue or to one of its attributes.

There are two ways to select a DOM node in the DOM Tree Viewer:

  • Ordinary selection: expand the DOM tree level by level until you reach the target node.
  • Reverse selection: click the target data snippet in the browser window of the output region; the DOM tree expands and the corresponding DOM node is selected. This function is disabled by default. To enable it, tick the Reverse Selection checkbox on the toolbar. From then on, the default mouse-click handler is overridden by a custom one that locates the clicked HTML node in the tree.
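Conceptually, reverse selection starts from the node under the mouse pointer and expands every ancestor from the root down so that the node becomes visible and selected. The Python sketch below illustrates only that idea under simple assumptions: the clicked snippet is matched against text nodes, and `Node`, `txt`, `find_text_node` and `expand_path` are hypothetical names, not MetaStudio's actual click handler or API.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag          # element name, or "#text" for a text node
        self.text = text        # value of a text node
        self.attrs = attrs      # element attributes such as href
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    """Create a text node."""
    return Node("#text", text=value)


def find_text_node(root, snippet):
    """Depth-first search for the first text node containing the clicked snippet."""
    if root.tag == "#text" and snippet in root.text:
        return root
    for child in root.children:
        found = find_text_node(child, snippet)
        if found is not None:
            return found
    return None


def expand_path(node):
    """List the nodes from the root down to `node`, i.e. the tree levels to expand."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()
    return path


page = Node("html", Node("body", Node("div", Node("span", txt("Price: 42")))))
clicked = find_text_node(page, "Price")
print([n.tag for n in expand_path(clicked)])
# prints ['html', 'body', 'div', 'span', '#text']
```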


Mapping clues

All clues except those of type Info Clue must be mapped before their attributes (for example, the mark) are mapped. Mapping has a different meaning for each type of clue.
For a clue of type Single Clue, mapping a DOM node to the clue means extracting a URL from that fixed position.
For clues of other types, mapping a DOM node to a clue means extracting URLs, according to certain rules, from the part of the HTML document delimited by that DOM node. The mapping operation for Marker Clue is described in the following sections. The other types are described in the chapter MetaStudio Senior User's Handbook#Clue Types.
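The difference between a fixed position and a delimited scope can be sketched in Python under simple assumptions: a Single Clue reads the `href` of the one mapped `<A>` element, while a scope-based clue collects the `href` of every `<A>` element below the mapped node, in document order. The real extraction rules depend on the clue type; `Node`, `single_clue_url` and `scope_urls` are hypothetical names used only for illustration.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag          # element name, or "#text" for a text node
        self.text = text
        self.attrs = attrs
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    return Node("#text", text=value)


def single_clue_url(node):
    """Single Clue: the URL comes from one fixed position, the mapped <a> element."""
    return node.attrs.get("href")


def scope_urls(scope):
    """Scope-based clue: hrefs of all <a> elements within the mapped node."""
    urls = []
    if scope.tag == "a" and "href" in scope.attrs:
        urls.append(scope.attrs["href"])
    for child in scope.children:
        urls.extend(scope_urls(child))
    return urls


home = Node("a", txt("Home"), href="index.html")
pager = Node("div",
             Node("a", txt("1"), href="p1.html"),
             Node("span", Node("a", txt("2"), href="p2.html")),
             txt("of 2"))
print(single_clue_url(home))  # index.html
print(scope_urls(pager))      # ['p1.html', 'p2.html']
```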

After selecting a DOM node in the DOM Tree Viewer, choose Clue Mapping>>Clue Mapping from the right-click pop-up menu. A third menu level opens; it is generated automatically from the clue numbers, e.g. Clue 1. Click one of them to map the current DOM node to that clue. The mapping status, shown on the second line of the Clue Operation region on the Clue Editor work board, changes from Node: unmapped to Node:xxx, where xxx is the serial number of the DOM node.
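The menu entries and the status line can be modelled with a small Python sketch. This handbook does not say how serial numbers are assigned, so the sketch assumes they follow document order (a pre-order traversal of the tree); `Node`, `number_nodes` and `Clue` are hypothetical names, not MetaStudio's implementation.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag
        self.text = text
        self.attrs = attrs
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    return Node("#text", text=value)


def number_nodes(root, start=1):
    """Assign serial numbers in document (pre-order) order; the scheme is an assumption."""
    counter = start
    stack = [root]
    while stack:
        node = stack.pop()
        node.serial = counter
        counter += 1
        stack.extend(reversed(node.children))  # so children are visited left to right


class Clue:
    """Hypothetical record of a clue and the DOM node mapped to it."""

    def __init__(self, number):
        self.number = number
        self.node = None

    def menu_label(self):
        return f"Clue {self.number}"

    def status(self):
        if self.node is None:
            return "Node: unmapped"
        return f"Node:{self.node.serial}"


div = Node("div", Node("a", txt("Next Page"), href="p2.html"))
page = Node("html", Node("body", div))
number_nodes(page)

clues = [Clue(1), Clue(2)]
print([c.menu_label() for c in clues])  # ['Clue 1', 'Clue 2']
print(clues[0].status())                # Node: unmapped
clues[0].node = div                     # map the selected node to Clue 1
print(clues[0].status())                # Node:3
```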

Not every DOM node can be mapped to every type of clue; which nodes are allowed depends on the clue type. For example, only an HTML <A> element can be mapped to a clue of type Single Clue, whereas any HTML element can be mapped to a clue of the other types. If the operator attempts an invalid mapping, MetaStudio pops up an alert window.
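The rule above can be expressed as a small check. In this Python sketch, returning a message stands in for MetaStudio's alert window. The Single Clue rule comes from this section; rejecting text nodes for the other types is an assumption based on the wording "any HTML element". `Node` and `mapping_error` are hypothetical names.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag
        self.text = text
        self.attrs = attrs
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    return Node("#text", text=value)


def mapping_error(clue_type, node):
    """Return the alert text for an invalid clue mapping, or None if it is allowed."""
    if node.tag == "#text":
        # Assumption: only HTML elements, not text nodes, can be mapped to a clue.
        return "A clue must be mapped to an HTML element, not a text node."
    if clue_type == "Single Clue" and node.tag != "a":
        return "A Single Clue can only be mapped to an <A> element."
    return None


link = Node("a", txt("Next Page"), href="p2.html")
pager = Node("div", link)
for clue_type, node in [("Single Clue", link), ("Single Clue", pager), ("Marker Clue", pager)]:
    error = mapping_error(clue_type, node)
    print(clue_type, node.tag, "->", error or "mapped")
```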



Mapping marks

For a clue of type Marker Clue, the mark value does not have to be typed in manually. Instead, use the right-click pop-up menu Clue Mapping>>Marker Mapping. After you select a text node in the DOM Tree Viewer, clicking this menu item fills the input box with the text node's value. The mark can then be changed as described in the previous section, and the mark matching rule can be adjusted accordingly. At this point, the marker clue is fully mapped.

Only a text node enclosed in an HTML <A> tag can be mapped to a mark.
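Marker mapping, as described above, can be sketched in Python: the selected node must be a text node inside an <A> element, and its value becomes the mark. Whether "enclosed in" means the direct parent or any ancestor is not stated here; the sketch assumes the nearest <A> ancestor. `Node`, `enclosing_anchor` and `map_mark` are hypothetical names, and the raised error stands in for the alert window.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag
        self.text = text
        self.attrs = attrs
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    return Node("#text", text=value)


def enclosing_anchor(node):
    """Nearest <a> ancestor of `node`, or None (nearest-ancestor rule is assumed)."""
    parent = node.parent
    while parent is not None:
        if parent.tag == "a":
            return parent
        parent = parent.parent
    return None


def map_mark(node):
    """Return the value for the Marker Value box, or raise if the node is not allowed."""
    if node.tag != "#text":
        raise ValueError("A mark must be mapped to a text node.")
    if enclosing_anchor(node) is None:
        raise ValueError("The text node must be enclosed in an <A> tag.")
    return node.text


next_text = txt("Next Page")
info_text = txt("Page 2 of 9")
pager = Node("div", Node("span", info_text), Node("a", next_text, href="p3.html"))

print(map_mark(next_text))  # Next Page
try:
    map_mark(info_text)
except ValueError as err:
    print("alert:", err)
```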



Exercises

Following the procedure in the previous sections, map the clue and its attributes as follows:

  1. Select DOM node No. 8545, which delimits the part of the HTML page in which all hyperlinks are page-turning links.
  2. Map this node to Clue 1.
  3. Select DOM node No. 8630, a text node enclosed in an HTML <A> tag, with the value Next Page.
  4. Map this node to the mark. 8630 appears as the value of Marker Row No, and "Next Page" is filled into the Marker Value input box.
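What the mapped marker clue then does with the page can be sketched in Python: within the scope node from step 1, pick the link whose text matches the mark from step 4 and take its URL. This is an illustration under stated assumptions, not MetaStudio's extraction engine: exact text equality is assumed as the matching rule, the sample page is invented, and `Node`, `links`, `text_of` and `follow_mark` are hypothetical names. Nodes are referenced directly rather than by serial numbers such as 8545, which depend on the page structure.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, not MetaStudio's API)."""

    def __init__(self, tag, *children, text=None, **attrs):
        self.tag = tag
        self.text = text
        self.attrs = attrs
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def txt(value):
    return Node("#text", text=value)


def text_of(node):
    """Concatenated text of all text nodes under `node`."""
    if node.tag == "#text":
        return node.text
    return "".join(text_of(child) for child in node.children)


def links(scope):
    """Yield the <a> elements within the scope, in document order."""
    if scope.tag == "a":
        yield scope
    for child in scope.children:
        yield from links(child)


def follow_mark(scope, mark):
    """Return the href of the first link in the scope whose text matches the mark."""
    for a in links(scope):
        if text_of(a).strip() == mark:  # exact match: an assumed matching rule
            return a.attrs.get("href")
    return None


pager = Node("div",
             Node("a", txt("Previous Page"), href="list?page=1"),
             Node("a", txt("1"), href="list?page=1"),
             txt("2"),
             Node("a", txt("3"), href="list?page=3"),
             Node("a", txt("Next Page"), href="list?page=3"))
page = Node("html", Node("body", Node("div", txt("Results")), pager))

# Steps 1-2: Clue 1 is mapped to the pager node (the scope of page-turning links).
# Steps 3-4: the mark is "Next Page".
print(follow_mark(pager, "Next Page"))  # list?page=3
```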

Note: The DOM node serial numbers you see when doing this exercise may differ from those shown here, because the structure of the target HTML page may have changed.

Note: Changes in serial numbers are usually not a concern. Serial numbers are not recorded in the data schema or in the data and clue extraction rules, so such changes generally do not affect the validity of those rules. In most cases, if you download a data schema and edit it again in MetaStudio long after uploading it to the MetaCamp server, you will get different serial numbers, and MetaStudio handles this normally. This does not always hold, however: if the structure of the page has changed significantly, MetaStudio reports that it cannot locate some properties. In that case, you must re-map those properties.