Map property

To define data extraction rules, you must tell MetaStudio which data snippets map to which properties. MetaStudio provides a set of tools to make this easier. This page introduces the basic ones.

Right-clicking in the DOM Tree Viewer opens a popup menu whose items are all mapping operations. They fall into two categories: operations for mapping properties and operations for clue extraction.

This chapter describes how to map properties only. Mapping operations for clue extraction are described in the next chapter.

To extract a whole list from a target page, define two replicas and perform the mapping operations twice, once for each replica. Note that the two rows of the list whose data snippets are mapped to the two replicas must be the FIRST and the SECOND rows. Otherwise, the defined extraction rules will not cover all rows in the list. This section describes how to map data snippets to the primary replica. The next section describes mapping the second replica.
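The reason for choosing the first and second rows can be illustrated with a toy model. The Python sketch below is only one plausible way to think about it, not MetaStudio's documented algorithm: it assumes the rules start at the first sample row and repeat the gap between the two sample rows. The function name covered_rows and its parameters are hypothetical.

```python
def covered_rows(first_sample, second_sample, total_rows):
    """Rows reached by starting at first_sample and repeating the gap
    between the two sample rows (ASSUMED model, for illustration only)."""
    step = second_sample - first_sample
    if step <= 0:
        raise ValueError("the second sample must come after the first")
    return list(range(first_sample, total_rows + 1, step))


print(covered_rows(1, 2, 6))  # [1, 2, 3, 4, 5, 6]  every row is covered
print(covered_rows(1, 3, 6))  # [1, 3, 5]           even rows are skipped
print(covered_rows(2, 3, 6))  # [2, 3, 4, 5, 6]     row 1 is missed
```

Under this model, only the pair (first row, second row) reaches every row of the list.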

In the DOM Tree Viewer, first select a DOM node. Then right-click to open the popup menu and choose Property. In the second-level menu, click the item named after the property. The value of the DOM node is then mapped to that property. As a result, the node column of that property's row in the property mapping table is filled with the serial number of the DOM node.
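The effect of a mapping operation on the property mapping table can be pictured with a small Python sketch. It is a simplified model, not MetaStudio's implementation: the page fragment, the function names number_nodes and map_property, and the assumption that serial numbers follow document (pre-order) order are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; real target pages are HTML and far larger.
PAGE = """
<html><body>
  <div class="company">
    <a href="/c/1">Acme Ltd</a>
    <p>Makes anvils</p>
  </div>
</body></html>
""".strip()


def number_nodes(root):
    """Assign serial numbers to nodes in document order (ASSUMED scheme)."""
    return {serial: node for serial, node in enumerate(root.iter(), start=1)}


def map_property(table, prop, serial):
    """Fill the node column of the row that belongs to property `prop`."""
    table[prop]["node"] = serial


root = ET.fromstring(PAGE)
nodes = number_nodes(root)  # 1=html, 2=body, 3=div, 4=a, 5=p

# One row per property; the node column is empty until a mapping is made.
table = {"name": {"node": None}, "introduction": {"node": None}}
map_property(table, "name", 4)
map_property(table, "introduction", 5)

for prop, row in table.items():
    print(prop, row["node"], nodes[row["node"]].text)
```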

There are two ways to select a DOM node in the DOM Tree Viewer:

  • Ordinary selection: Expand the DOM tree level by level until the target node is found.
  • Reverse selection: In the browser window of the output region, click the target data snippet. The DOM tree is expanded and the corresponding DOM node is selected. This function is disabled by default. To enable it, tick the Reverse Selection checkbox on the tool bar. From then on, the handler for the mouse click event is overridden by a customized one that positions the target HTML node in the tree.
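Reverse selection amounts to going from a clicked element back to its position in the tree. The Python sketch below illustrates that idea only and is not MetaStudio's code: the page fragment and the helper path_to are hypothetical, and the "click" is simulated by picking an element directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment with two list rows.
PAGE = "<html><body><ul><li><a>Acme</a></li><li><a>Bolt</a></li></ul></body></html>"


def path_to(root, target):
    """Return the chain of nodes from root down to target, i.e. the nodes
    a tree viewer would have to expand to reveal the target."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    chain = [target]
    while chain[-1] is not root:
        chain.append(parent[chain[-1]])
    return list(reversed(chain))


root = ET.fromstring(PAGE)
clicked = root.findall(".//a")[1]  # stands in for the element under the mouse click
print(" > ".join(node.tag for node in path_to(root, clicked)))
# html > body > ul > li > a
```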

Note: Nesting of HTML nodes may reduce the positioning precision of reverse selection. In some cases the node found may be an ancestor of the target node, so you must make sure that it is the node you want. MetaStudio provides a convenient aid for this: watch which element's border flashes red three times.

Note: If another DOM node is selected in the DOM Tree Viewer while a border is still flashing, the border of the former node might stay frozen in red. This is a known bug, which you can avoid by waiting until the former node has finished flashing. If it does happen by accident, it does not prevent MetaStudio from working properly.



Exercises

Following the steps in the previous chapter, map the company information in the first row of the company list to the corresponding properties. The following node serial numbers will be filled into the node column.

Node  Property Name     Key                Clue  Url  Block  Null
1713  name              Validation & Data  No    No   No     No
1712  company page      Validation & Data  Yes   Yes  No     No
1717  introduction      Validation & Data  No    No   No     No
1726  business type     No                 No    No   No     No
1893  register date     No                 No    No   No     No
1881  register capital  No                 No    No   No     No
1867  credit            No                 No    No   No     No

Note: The serial numbers of the DOM nodes you get when you do this exercise might differ from those shown here, because the structure of the target HTML page may have changed.

Note: Do not worry too much about changes in the serial numbers. The serial numbers are not recorded in the data schema or in the data and clue extraction rules, so such changes usually do not affect the validity of the schema or rules. In most cases, when you download and edit a data schema with MetaStudio long after uploading it to the MetaCamp server, you get different serial numbers, and MetaStudio handles the changes normally. This is not always the case, however. If the structure of the page has changed greatly, MetaStudio will complain that it cannot position some properties. In that case, you must re-map the properties.
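Why shifted serial numbers are usually harmless can be seen in a small Python sketch. It assumes, purely for illustration, that a rule locates a node by its structural path rather than by its serial number. MetaStudio's actual rule format is not described here, so the RULE string, the page fragments, and the helpers serial_of and locate are all hypothetical.

```python
import xml.etree.ElementTree as ET


def serial_of(root, target):
    """Serial number of target in document order (ASSUMED numbering)."""
    for serial, node in enumerate(root.iter(), start=1):
        if node is target:
            return serial
    raise LookupError("node is not in this tree")


def locate(root, rule):
    """Follow a structural rule; return None when the structure no longer matches."""
    return root.find(rule)


RULE = "./body/div/h1"  # hypothetical structural rule for the "name" property

old = ET.fromstring("<html><body><div><h1>Acme</h1></div></body></html>")
new = ET.fromstring(
    "<html><head><title>t</title></head><body><div><h1>Acme</h1></div></body></html>"
)
changed = ET.fromstring("<html><body><table><tr><td>Acme</td></tr></table></body></html>")

print(serial_of(old, locate(old, RULE)))  # 4
print(serial_of(new, locate(new, RULE)))  # 6: the serial number moved, the rule still works
print(locate(changed, RULE))              # None: the property must be re-mapped
```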