Phase 2: extract catalog

This phase follows Phase 1 or the optional phase described in Appendix A. This chapter defines a data schema for the theme ComList_mic_en. The theme actually has multiple HTML page structures; Appendix B defines one more data schema to resolve the "unknown data schema" problem encountered while extracting commodity catalogs.

To extract commodity catalogs, take the following steps:

  1. Define a data schema, together with the data and clue extraction rules, in MetaStudio against a sample page;
  2. Upload the data schema specification file and the data and clue extraction instruction files;
  3. Extract data and clues with DataScraper.
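
As a conceptual illustration only, the sketch below shows what data extraction and clue extraction amount to on a sample page. It is not MetaStudio's or DataScraper's actual mechanism or file format. The sample markup, the class names (item, name, next), the field names and the reading of a "clue" as a link to be visited later are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the markup and class names are assumptions.
SAMPLE_PAGE = """
<ul>
  <li class="item"><a class="name" href="/p/1">Widget A</a></li>
  <li class="item"><a class="name" href="/p/2">Widget B</a></li>
</ul>
<a class="next" href="/list?page=2">Next</a>
"""


class CatalogParser(HTMLParser):
    """Collects commodity records (data) and follow-up links (clues)."""

    def __init__(self):
        super().__init__()
        self.records = []      # extracted data: one dict per commodity
        self.clues = []        # extracted clues: URLs to visit later
        self._in_name = False  # True while inside an <a class="name"> element
        self._href = None      # href of the commodity link being read

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        css_class = attributes.get("class")
        if css_class == "name":
            # Data rule (assumed): a commodity name lives in <a class="name">.
            self._in_name = True
            self._href = attributes.get("href")
        elif css_class == "next":
            # Clue rule (assumed): the "next" link leads to another page.
            self.clues.append(attributes.get("href"))

    def handle_data(self, data):
        text = data.strip()
        if self._in_name and text:
            self.records.append({"name": text, "url": self._href})

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_name = False


def extract(page):
    """Return (records, clues) extracted from one HTML page."""
    parser = CatalogParser()
    parser.feed(page)
    return parser.records, parser.clues


records, clues = extract(SAMPLE_PAGE)
```

Here the record fields stand in for the data schema, and the two branches in handle_starttag stand in for the data and clue extraction rules. In the real workflow these are defined in MetaStudio and executed by DataScraper rather than written by hand.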

Because this phase follows the one described in Appendix A, the sample page need not be loaded manually. Instead, right-click the theme list on the Theme List work board and choose the pop-up menu item recognize to load a sample page that MetaStudio selects automatically.