@DefaultKey(value="htmlTool")
public class HtmlTool
extends org.apache.velocity.tools.generic.SafeConfig
The methods utilise CSS selectors to refer to specific elements for manipulation.
Modifier and Type | Class and Description |
---|---|
static interface | HtmlTool.ExtractResult: A container to carry element extraction results. |
static interface | HtmlTool.IdElement: Representation of an HTML element with an ID and text content. |
static class | HtmlTool.JoinSeparator: Enum indicating separator handling strategy for document partitioning. |
Constructor and Description |
---|
HtmlTool() |
Modifier and Type | Method and Description |
---|---|
String | addClass(String content, String selector, List<String> classNames): Adds given class names to the elements in HTML. |
String | addClass(String content, String selector, List<String> classNames, int amount): Adds given class names to the elements in HTML. |
String | addClass(String content, String selector, String className): Adds given class to the elements in HTML. |
static List<String> | concat(List<String> elements, String text, boolean append): Utility method to concatenate a String to a list of Strings. |
protected void | configure(org.apache.velocity.tools.generic.ValueParser values) |
String | ensureHeadingIds(String content, String idSeparator): Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one. |
HtmlTool.ExtractResult | extract(String content, String selector, int amount): Extracts HTML elements from the main HTML content. |
String | fixTableHeads(String content): Fixes table heads: wraps rows with <th> (table heading) elements into <thead> element if they are currently in <tbody>. |
List<String> | getAttr(String content, String selector, String attributeKey): Retrieves attribute value on elements in HTML. |
String | headingAnchorToId(String content): Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements. |
List<? extends HtmlTool.IdElement> | headingTree(String content): Reads all headings in the given HTML content as a hierarchy. |
static org.jsoup.nodes.Element | parseBodyFragment(String content): A generic method to use jsoup parser on an arbitrary HTML body fragment. |
String | remove(String content, String selector): Removes elements from HTML. |
String | reorderToTop(String content, String selector, int amount): Reorders elements in HTML content so that selected elements are found at the top of the content. |
String | reorderToTop(String content, String selector, int amount, String wrapRemaining): Reorders elements in HTML content so that selected elements are found at the top of the content. |
String | replace(String content, String selector, String replacement): Replaces elements in HTML. |
String | replaceAll(String content, Map<String,String> replacements): Replaces elements in HTML. |
String | setAttr(String content, String selector, String attributeKey, String value): Sets attribute to the given value on elements in HTML. |
static String | slug(String input): Creates a slug (latin text with no whitespace or other symbols) for a longer text (i.e. |
static String | slug(String input, String separator): Creates a slug (latin text with no whitespace or other symbols) for a longer text (i.e. |
List<String> | split(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector. |
List<String> | split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy): Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with before/after depending on the indicated separator strategy. |
List<String> | split(String content, String separatorCssSelector, String separatorStrategy): Splits the given HTML content into partitions based on the given separator selector. |
List<String> | splitOnStarts(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector. |
List<String> | text(String content, String selector): Retrieves text content of the selected elements in HTML. |
String | wrap(String content, String selector, String wrapHtml, int amount): Wraps elements in HTML with the given HTML. |
protected void configure(org.apache.velocity.tools.generic.ValueParser values)
Overrides: configure in class org.apache.velocity.tools.generic.SafeConfig
See Also: SafeConfig.configure(ValueParser)
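public List<String> split(String content, String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector.
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
See Also: split(String, String, JoinSeparator)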
public List<String> splitOnStarts(String content, String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector. Note that the first part is removed if the split was successful, because the first part does not include the separator.
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
See Also: split(String, String, JoinSeparator)
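The removal of the first part can be illustrated on a flat list instead of HTML. The following is a minimal sketch, not the tool's implementation: the class SplitOnStartsSketch, its splitOnStarts method over a List<String> and the isStart predicate are hypothetical stand-ins for the HTML content and the CSS selector, and the sketch assumes that each separator begins the partition that follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; it does not call HtmlTool or parse HTML.
final class SplitOnStartsSketch {

    // Starts a new partition at every item matching isStart (assumed behaviour).
    static List<List<String>> splitOnStarts(List<String> items, Predicate<String> isStart) {
        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean found = false;
        for (String item : items) {
            if (isStart.test(item)) {
                if (found) {
                    parts.add(current);
                }
                // When found is still false, current is the leading part
                // without a separator, so it is dropped here.
                current = new ArrayList<>();
                found = true;
            }
            current.add(item);
        }
        // Without any separator this is the whole, unsplit content.
        parts.add(current);
        return parts;
    }

    public static void main(String[] args) {
        List<String> items = List.of("preface", "h2:Install", "step 1", "h2:Usage", "step 2");
        // prints [[h2:Install, step 1], [h2:Usage, step 2]]
        System.out.println(splitOnStarts(items, s -> s.startsWith("h2:")));
    }
}
```

In this model the "preface" item disappears from the result, which mirrors the documented note; content without any separator is returned as a single partition, which is a choice of the sketch.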
public List<String> split(String content, String separatorCssSelector, String separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector.
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators, one of "after", "before" or "no"
See Also: split(String, String, JoinSeparator)
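The three strategy strings can be pictured on a flat list of items. This is a rough sketch under stated assumptions rather than the tool's algorithm: the class SplitStrategySketch and its Join enum are hypothetical, the sketch does not handle nested HTML, and it assumes that "after" joins a separator with the content after it, "before" joins it with the content before it, and "no" drops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical illustration only; it does not call HtmlTool or parse HTML.
final class SplitStrategySketch {

    // Stand-in for the documented strategy strings "after", "before" and "no".
    enum Join { AFTER, BEFORE, NO }

    // Maps a strategy string to the stand-in enum (throws on unknown values).
    static Join parse(String strategy) {
        return Join.valueOf(strategy.trim().toUpperCase(Locale.ROOT));
    }

    static List<List<String>> split(List<String> items, Predicate<String> isSeparator, Join strategy) {
        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String item : items) {
            if (!isSeparator.test(item)) {
                current.add(item);
                continue;
            }
            if (strategy == Join.BEFORE) {
                current.add(item); // separator closes the partition before it
            }
            if (!current.isEmpty()) {
                parts.add(current); // empty partitions are skipped in this sketch
            }
            current = new ArrayList<>();
            if (strategy == Join.AFTER) {
                current.add(item); // separator opens the partition after it
            }
        }
        if (!current.isEmpty()) {
            parts.add(current);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> items = List.of("intro", "<hr>", "one", "two", "<hr>", "three");
        for (String name : List.of("after", "before", "no")) {
            System.out.println(name + " -> " + split(items, "<hr>"::equals, parse(name)));
        }
    }
}
```

With "no" the separators vanish from the result, while "after" and "before" only change which neighbouring partition keeps them.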
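public List<String> split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with before/after depending on the indicated separator strategy.
Note that the splitting algorithm tries to resolve nested elements so that returned partitions are self-contained HTML elements. The nesting is normally contained within the first applicable partition.
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators
public String reorderToTop(String content, String selector, int amount)
Reorders elements in HTML content so that selected elements are found at the top of the content.
Parameters: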
content - HTML content to reorder
selector - CSS selector for elements to bring to top of the content
amount - Maximum number of elements to reorder
public String reorderToTop(String content, String selector, int amount, String wrapRemaining)
Reorders elements in HTML content so that selected elements are found at the top of the content.
Parameters:
content - HTML content to reorder
selector - CSS selector for elements to bring to top of the content
amount - Maximum number of elements to reorder
wrapRemaining - HTML to wrap the remaining (non-reordered) part
public HtmlTool.ExtractResult extract(String content, String selector, int amount)
Extracts HTML elements from the main HTML content.
Parameters:
content - HTML content to extract elements from
selector - CSS selector for elements to extract
amount - Maximum number of elements to extract
public String setAttr(String content, String selector, String attributeKey, String value)
Sets attribute to the given value on elements in HTML.
Parameters:
content - HTML content to set attributes on
selector - CSS selector for elements to modify
attributeKey - Attribute name
value - Attribute value
public List<String> getAttr(String content, String selector, String attributeKey)
Retrieves attribute value on elements in HTML.
Parameters:
content - HTML content to read attributes from
selector - CSS selector for elements to find
attributeKey - Attribute name
public String addClass(String content, String selector, List<String> classNames, int amount)
Adds given class names to the elements in HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - Names of classes to add to the selected elements
amount - Maximum number of elements to modify
public String addClass(String content, String selector, List<String> classNames)
Adds given class names to the elements in HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - Names of classes to add to the selected elements
public String addClass(String content, String selector, String className)
Adds given class to the elements in HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add the class to
className - Name of class to add to the selected elements
public String wrap(String content, String selector, String wrapHtml, int amount)
Wraps elements in HTML with the given HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to wrap
wrapHtml - HTML to use for wrapping the selected elements
amount - Maximum number of elements to modify
public String remove(String content, String selector)
Removes elements from HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to remove
public String replace(String content, String selector, String replacement)
Replaces elements in HTML.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to replace
replacement - HTML replacement (must parse to a single element)
public String replaceAll(String content, Map<String,String> replacements)
Replaces elements in HTML.
Parameters:
content - HTML content to modify
replacements - Map of CSS selectors to their replacement HTML texts. CSS selectors find elements to be replaced with the HTML in the mapping. The HTML must parse to a single element.
public List<String> text(String content, String selector)
Retrieves text content of the selected elements in HTML.
Parameters:
content - HTML content with the elements
selector - CSS selector for elements to extract contents
public String headingAnchorToId(String content)
Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
The anchors are used to indicate positions within an HTML page. In HTML5, however, the name attribute is no longer supported on the <a> tag. The positions within pages are indicated using the id attribute instead, e.g. <h1 id="myheading">.
The method finds anchors inside, immediately before or after the heading tags and uses their name as the heading id instead. The anchors themselves are removed.
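As a before/after illustration of this transformation, the sketch below handles only the simplest case: an empty named anchor placed directly inside an attribute-less heading. It is regex-based purely to stay self-contained; the class AnchorToIdSketch and the method moveAnchorNames are hypothetical, and the tool itself works with CSS selectors on parsed HTML and also covers anchors before or after the heading, which this sketch does not.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; it does not call HtmlTool and is not a general HTML parser.
final class AnchorToIdSketch {

    // Matches a heading start tag without attributes, followed by an empty named anchor.
    private static final Pattern ANCHOR_IN_HEADING =
            Pattern.compile("<(h[1-6])>\\s*<a name=\"([^\"]+)\">\\s*</a>");

    // Moves the anchor name to the heading id and removes the anchor.
    static String moveAnchorNames(String html) {
        return ANCHOR_IN_HEADING.matcher(html).replaceAll("<$1 id=\"$2\">");
    }

    public static void main(String[] args) {
        // prints <h2 id="setup">Setup</h2>
        System.out.println(moveAnchorNames("<h2><a name=\"setup\"></a>Setup</h2>"));
    }
}
```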
Parameters:
content - HTML content to modify
public static List<String> concat(List<String> elements, String text, boolean append)
Utility method to concatenate a String to a list of Strings.
Parameters:
elements - list of elements to append/prepend the text to
text - the given text to append/prepend
append - if true, text will be appended to the elements. If false, it will be prepended
public String ensureHeadingIds(String content, String idSeparator)
Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one.
IDs on heading elements are used to indicate positions within an HTML page in HTML5. If a heading tag without an id is found, its "slug" is generated automatically based on the heading contents and used as the ID.
Note that the algorithm also modifies existing IDs that have symbols not allowed in CSS selectors, e.g. ":", ".", etc. The symbols are removed.
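The two rules above, generating a slug for headings without an ID and stripping symbols from existing IDs, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class HeadingIdSketch, its slug and headingId methods, the lower-casing, and the exact set of characters kept in existing IDs are not taken from the tool.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; it does not call HtmlTool.
final class HeadingIdSketch {

    // Builds a slug: strips accents, keeps ASCII letters and digits, joins words with the separator.
    static String slug(String text, String separator) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String noMarks = decomposed.replaceAll("\\p{M}+", ""); // drop combining accent marks
        List<String> words = new ArrayList<>();
        for (String word : noMarks.split("[^A-Za-z0-9]+")) {
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT)); // lower-casing is a choice of this sketch
            }
        }
        return String.join(separator, words);
    }

    // Chooses the ID for one heading: a generated slug, or the existing ID without symbols.
    static String headingId(String existingId, String headingText, String separator) {
        if (existingId == null || existingId.isEmpty()) {
            return slug(headingText, separator);
        }
        // Assumed rule: keep only letters, digits, underscore and hyphen.
        return existingId.replaceAll("[^A-Za-z0-9_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(headingId(null, "Caf\u00e9 Setup: Step 1", "-")); // prints cafe-setup-step-1
        System.out.println(headingId("sec:1.2", "ignored", "-"));             // prints sec12
    }
}
```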
Parameters:
content - HTML content to modify
Returns:
HTML content with id attributes on the heading elements. If all headings already had IDs, the original content is returned.
public String fixTableHeads(String content)
Fixes table heads: wraps rows with <th> (table heading) elements into a <thead> element if they are currently in <tbody>.
Parameters:
content - HTML content to modify
public static String slug(String input, String separator)
Parameters:
input - text to generate the slug from
separator - separator for whitespace replacement
public static String slug(String input)
Parameters:
input - text to generate the slug from
public List<? extends HtmlTool.IdElement> headingTree(String content)
Reads all headings in the given HTML content as a hierarchy. For example, <h2> is nested under the preceding <h1>.
Only headings with IDs are included in the hierarchy. The result elements contain ID and heading text for each heading. The hierarchy is useful to generate a Table of Contents for a page.
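The nesting rule can be sketched with a stack over a flat, document-ordered list of headings. This is an illustrative model, not the tool's code: the class HeadingTreeSketch and its Node type are hypothetical and only mimic the ID and text that the result elements are described as carrying.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; it does not call HtmlTool or parse HTML.
final class HeadingTreeSketch {

    // One heading: level 1-6, its id (may be null) and its text.
    static final class Node {
        final int level;
        final String id;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(int level, String id, String text) {
            this.level = level;
            this.id = id;
            this.text = text;
        }
    }

    // Nests each heading under the closest preceding heading of a higher rank.
    static List<Node> buildTree(List<Node> headingsInDocumentOrder) {
        List<Node> roots = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>(); // chain of currently open ancestors
        for (Node heading : headingsInDocumentOrder) {
            if (heading.id == null || heading.id.isEmpty()) {
                continue; // only headings with IDs are included
            }
            while (!open.isEmpty() && open.peek().level >= heading.level) {
                open.pop(); // close headings of the same or a lower rank
            }
            if (open.isEmpty()) {
                roots.add(heading);
            } else {
                open.peek().children.add(heading);
            }
            open.push(heading);
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Node> flat = List.of(
                new Node(1, "intro", "Introduction"),
                new Node(2, "install", "Installation"),
                new Node(2, null, "Heading without ID"),
                new Node(1, "usage", "Usage"));
        for (Node root : buildTree(flat)) {
            System.out.println(root.id + " has " + root.children.size() + " child heading(s)");
        }
    }
}
```

A table of contents can then be rendered by walking the roots and their children recursively.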
Parameters:
content - HTML content to extract heading hierarchy from
public static org.jsoup.nodes.Element parseBodyFragment(String content)
A generic method to use jsoup parser on an arbitrary HTML body fragment.
Parameters:
content - HTML content to parse
Copyright © 2012–2014 Andrius Velykis. All rights reserved.