MetaStudio Overview

MetaStudio is a tool for describing the data schemas of target Web pages. Through the control elements on its GUI, users can model a page's data schema (also called its data structure), validate the schema, generate Data and Clue Extraction Rules (DCERs) from the valid schema, and store the rules in Data and Clue Extraction Instruction Files (DCEIFs), which are fed into the data extraction engine, DataScraper.
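The model, validate, extract workflow described above can be illustrated with a small conceptual sketch. It does not show MetaStudio's actual rule format or any MetaSeeker API: the sample page, the `SCHEMA` dictionary, and the functions `validate_schema` and `extract` are all hypothetical, and the rules are plain path expressions understood by Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (kept well-formed so the stdlib XML parser can read it).
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Kettle</span><span class="price">19.90</span></div>
  <div class="product"><span class="name">Toaster</span><span class="price">24.50</span></div>
</body></html>
"""

# Hypothetical data schema: one repeating record plus its fields.
# Each field maps to a path rule relative to the record node.
SCHEMA = {
    "record": ".//div[@class='product']",
    "fields": {
        "name": "./span[@class='name']",
        "price": "./span[@class='price']",
    },
}

def validate_schema(schema, root):
    """Return a list of problems found when the rules are tried on the sample page."""
    records = root.findall(schema["record"])
    if not records:
        return ["record rule matches nothing"]
    problems = []
    for field, rule in schema["fields"].items():
        if any(rec.find(rule) is None for rec in records):
            problems.append(f"field '{field}' is missing in some records")
    return problems

def extract(schema, root):
    """Apply the rules and return one dict per record."""
    return [
        {field: rec.find(rule).text for field, rule in schema["fields"].items()}
        for rec in root.findall(schema["record"])
    ]

root = ET.fromstring(SAMPLE_PAGE)
problems = validate_schema(SCHEMA, root)
rows = extract(SCHEMA, root) if not problems else []
print(problems, rows)
```

The sketch validates before extracting, which mirrors the order of the steps above: rules are only applied once the schema has been checked against a sample page.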

To use its full power, MetaStudio should work with a MetaCamp server, which is deployed as an application in Apache Tomcat. MetaStudio can also run standalone, but all online features, e.g. uploading data schemas to the server and sharing data schemas with other users, are then disabled. How MetaStudio is deployed is shown on the page MetaSeeker's Networking.

MetaStudio is one of the four tools in the MetaSeeker toolkit.



Advantages

  • MetaStudio is transparent to how Web pages are authored. That is, it handles all Web pages in a consistent manner, whether the pages are authored with HTML, PHP, JSP, ASP, ASPX etc.
  • MetaStudio adapts well to defining data schemas and generating data extraction rules for most forums, blogs, yellow pages, product or business lists etc. Without MetaStudio, users would have to code many HTML wrappers for every site, even for every channel or column within a site.
  • MetaStudio generates data extraction rules automatically, following the directions users give via a friendly GUI. Users are spared the pain of coding many page-specific HTML wrappers.
  • Operating MetaStudio is straightforward: it takes a user only minutes to define the data schema of a group of Web pages, not counting the time the user needs to understand the data structure of a specific sample Web page.
  • MetaStudio provides many validation facilities that help users check whether the defined data schema is precise and whether the generated data extraction rules work as expected. As a result, defining a data schema and validating it can be interleaved, which shortens the time needed to finish the work.
  • MetaStudio provides many monitoring facilities, so the data schema definition procedure stays fully under the user's control.


Resources

  1. If you want to know how to deploy MetaStudio, please visit the MetaSeeker Installation Guide;
  2. If you want to learn how to operate MetaStudio, please visit the MetaStudio User's Guide;
  3. If you are trying to extract product lists or yellow pages, please follow the steps shown on the page MetaSeeker Cook Book#Scenario 1 and Scenario 2;
  4. If you want to learn more about the internals of MetaSeeker, please visit Inside MetaSeeker.