org.apache.nutch.parse
Interface HtmlParseFilter
- All Superinterfaces:
- org.apache.hadoop.conf.Configurable, Pluggable
- All Known Implementing Classes:
- CCParseFilter, HTMLLanguageParser, JSParseFilter, RelTagParser
public interface HtmlParseFilter
- extends Pluggable, org.apache.hadoop.conf.Configurable
Extension point for DOM-based HTML parsers. It lets plugins add
metadata to an HTML parse. Every plugin found that implements this
extension point is run sequentially on the parse.
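The sequential behaviour described above can be pictured roughly as follows. This is only a sketch under stated assumptions: MetadataFilter and runAll are hypothetical stand-ins written for this example, not Nutch classes, and a plain Map takes the place of ParseResult. It illustrates only the idea that each filter receives whatever the previous filter returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SequentialFilterSketch {

    /** Hypothetical stand-in for HtmlParseFilter; NOT the Nutch interface. */
    public interface MetadataFilter {
        Map<String, String> filter(Map<String, String> parseMetadata);
    }

    /** Runs each filter in list order, handing the previous result to the next filter. */
    public static Map<String, String> runAll(List<MetadataFilter> filters,
                                             Map<String, String> initial) {
        Map<String, String> current = initial;
        for (MetadataFilter f : filters) {
            current = f.filter(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<MetadataFilter> filters = new ArrayList<>();
        // First filter adds a language entry.
        filters.add(meta -> { meta.put("lang", "en"); return meta; });
        // Second filter can read what the first one added.
        filters.add(meta -> { meta.put("summary", "lang=" + meta.get("lang")); return meta; });

        Map<String, String> result = runAll(filters, new LinkedHashMap<>());
        System.out.println(result);
    }
}
```

Because the filters share one parse, the order in which they run can matter whenever one filter reads metadata that another one writes.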
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
ParseResult filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Adds metadata or otherwise modifies a parse of HTML content, given
the DOM tree of a page.
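As an illustration of what a DOM-based filter might do, the sketch below walks a DocumentFragment and records the page title as metadata. It is a simplified analogue, not an implementation of this interface: the Nutch types Content, ParseResult and HTMLMetaTags are replaced by a plain Map so the example compiles with the JDK alone, and the names TitleFilterSketch, findTitle, sampleFragment and the key "dom.title" are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TitleFilterSketch {

    /** Depth-first search for the first title element; returns its trimmed text, or null. */
    public static String findTitle(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "title".equalsIgnoreCase(node.getNodeName())) {
            return node.getTextContent().trim();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String found = findTitle(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    /** Simplified analogue of filter(...): a Map stands in for the ParseResult metadata. */
    public static Map<String, String> filter(Map<String, String> parseMetadata,
                                             DocumentFragment doc) {
        String title = findTitle(doc);
        if (title != null) {
            parseMetadata.put("dom.title", title); // hypothetical metadata key
        }
        return parseMetadata;
    }

    /** Builds a tiny html/head/title fragment so the sketch can run without a real page. */
    public static DocumentFragment sampleFragment() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment frag = d.createDocumentFragment();
            Element html = d.createElement("html");
            Element head = d.createElement("head");
            Element title = d.createElement("title");
            title.setTextContent(" Hello Nutch ");
            head.appendChild(title);
            html.appendChild(head);
            frag.appendChild(html);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(filter(new LinkedHashMap<>(), sampleFragment()));
    }
}
```

A real plugin would implement filter with the four parameters listed above and return the ParseResult it was given, possibly modified.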
Copyright © 2006 The Apache Software Foundation