The Knowledge Management Suite for SharePoint 2010 offers rule-based content classification with support for several different rule-based engines. The engine currently implemented and available out of the box is a classification rule engine based on logical expressions of terms, as described here. Please note: if another engine is used, a different rule syntax may apply (see the vendor documentation for this).
Classification Rules
Classification rules are managed in the Term Store Manager extension (see above). A rule describes a term and forms a logical expression that evaluates to true or false when applied to a SharePoint item or document. If it evaluates to true, the item or document is classified with the tag / term that is related to the rule. Please note that once a rule is defined, the term name and synonyms are no longer used. Use the auto-generate rule feature to create a rule from the existing term name and synonyms on the fly as a starting point. If no rule is specified (empty field), the auto-generated rule is used: the term name and the synonyms joined by the OR operator.
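The auto-generated rule described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name auto_generate_rule, the sample synonyms, and the choice to quote multi-word entries as phrases are assumptions made for illustration.

```python
def auto_generate_rule(term_name, synonyms):
    """Join the term name and its synonyms with the OR operator."""
    def as_term(text):
        text = text.strip()
        # Assumption: multi-word entries are written as quoted phrases.
        return f'"{text}"' if len(text.split()) > 1 else text

    return " OR ".join(as_term(t) for t in [term_name, *synonyms])


# Hypothetical term with two synonyms.
rule = auto_generate_rule("Email Marketing", ["Newsletter", "E-Mail Campaign"])
print(rule)  # "Email Marketing" OR Newsletter OR "E-Mail Campaign"
```

A rule produced this way can then be edited by hand, for example to add AND or NOT conditions.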
Generally a rule is broken up into terms and operators and can use grouping.
Basic Terms
There are two types of terms: Single Terms and Phrases. A Single Term is a single word such as "Marketing" or "Sales". Terms are not case sensitive. A Phrase is a group of words surrounded by double quotes such as "Email Marketing". Multiple terms can be combined with Boolean operators to form a more complex rule (see below).
Advanced Terms
Terms can also be simple, valid regular expressions (REGEX) without any parentheses. A REGEX has to be surrounded by double quotes. See here for more information about REGEX.
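How a single term, phrase, or REGEX could be tested against a document text is sketched below in Python. The helper term_matches is hypothetical, and the sketch assumes that a term matches anywhere in the text (substring matching) and that Python's regular expression dialect is close enough to the one used by the engine, which may not hold in every case.

```python
import re


def term_matches(term, text):
    """Return True if the term occurs in the text, ignoring case."""
    # Strip the surrounding double quotes used for phrases and REGEX terms.
    quoted = len(term) >= 2 and term[0] == term[-1] == '"'
    pattern = term[1:-1] if quoted else term
    # Assumption: every term is treated as a case-insensitive pattern.
    return re.search(pattern, text, re.IGNORECASE) is not None


text = "Quarterly report on email marketing and sales."
print(term_matches("Sales", text))               # True
print(term_matches('"Email Marketing"', text))   # True
print(term_matches('".+marketing"', text))       # True
print(term_matches('"Direct Marketing"', text))  # False
```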
Boolean Operators
Boolean operators allow terms to be combined through logic operators. We support AND, OR, and NOT as Boolean operators (Note: Boolean operators must be ALL CAPS).
OR
The OR operator links two terms and finds a matching document if either of the terms exists in the document. This is equivalent to a union using sets. To tag documents that contain either "Sales" or just "Email Marketing" use the rule: "Email Marketing" OR sales.
AND
The AND operator matches documents where both terms exist anywhere in the text. This is equivalent to an intersection using sets. To tag documents that contain "Email Marketing" and "Direct Marketing" use the rule: "Email Marketing" AND "Direct Marketing".
NOT
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. To tag documents that contain "Email Marketing" but not "Direct Marketing" use the rule: "email marketing" AND NOT "direct marketing". Please note that the NOT operator cannot be used without other operators.
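The set view of OR, AND and NOT can be made concrete with a small Python sketch. The three documents and the helper containing are invented for illustration only; they are not part of the product.

```python
# Hypothetical document library: file name -> extracted text.
docs = {
    "a.docx": "Email Marketing plan and Sales forecast",
    "b.docx": "Direct Marketing budget",
    "c.docx": "Email Marketing versus Direct Marketing",
}


def containing(phrase):
    """Set of document names whose text contains the phrase, ignoring case."""
    return {name for name, text in docs.items() if phrase.lower() in text.lower()}


email = containing("Email Marketing")
direct = containing("Direct Marketing")
sales = containing("Sales")

either = email | sales        # "Email Marketing" OR sales           -> union
both = email & direct         # "Email Marketing" AND "Direct Marketing" -> intersection
email_only = email - direct   # "email marketing" AND NOT "direct marketing" -> difference

print(sorted(either))      # ['a.docx', 'c.docx']
print(sorted(both))        # ['c.docx']
print(sorted(email_only))  # ['a.docx']
```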
Grouping
We support using parentheses to group rules to form sub rules. This can be very useful if you want to control the Boolean logic for a rule. To tag documents that contain "Sales" and either "Email Marketing" or "Direct Marketing", use the rule: ("Email Marketing" OR "Direct Marketing") AND "Sales". This eliminates any confusion and makes sure that "Sales" must exist and that at least one of the terms "Email Marketing" or "Direct Marketing" must exist as well.
Please note that nested parentheses are not allowed in the current version.
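To show how a grouped rule yields true or false for a given text, here is a minimal Python sketch. It is an illustration under assumptions, not the engine's algorithm: the function classify is hypothetical, terms are matched as case-insensitive patterns anywhere in the text, "A NOT B" is read as "A AND NOT B", and Python's own operator precedence (NOT before AND before OR) is assumed. Adjacent unquoted words without an operator are not handled.

```python
import re

# Quoted phrase or REGEX, a single parenthesis, or a bare word.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
OPERATORS = {"AND": "and", "OR": "or", "NOT": "not"}


def classify(rule, text):
    """Return True if the text satisfies the rule."""
    parts = []
    for token in TOKEN.findall(rule):
        if token in OPERATORS:
            # Assumption: "A NOT B" means "A AND NOT B".
            if token == "NOT" and parts and parts[-1] in ("True", "False", ")"):
                parts.append("and")
            parts.append(OPERATORS[token])
        elif token in ("(", ")"):
            parts.append(token)
        else:
            pattern = token.strip('"')
            found = re.search(pattern, text, re.IGNORECASE) is not None
            parts.append(str(found))
    # Only True/False, and/or/not and parentheses ever reach eval here.
    return eval(" ".join(parts))


rule = '("Email Marketing" OR "Direct Marketing") AND "Sales"'
print(classify(rule, "Sales figures for direct marketing"))  # True
print(classify(rule, "Email marketing newsletter"))          # False
```

With this rule, a text is tagged only if "Sales" is present together with at least one of the two marketing phrases.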
Samples of valid rules
("Email Marketing" OR "Direct Marketing") AND "Sales"
("Email Marketing" OR "Direct Marketing") AND Sales
("Email Marketing" OR "Direct Marketing") NOT Sales
("Email Marketing" OR "Direct Marketing") AND (NOT Sales)
("Email Marketing" OR "Direct Marketing") AND NOT Sales
(Email) AND (Marketing)
Email AND Marketing
"email marketing" NOT "direct marketing"
(".+marketing" OR Direct Marketing)
Samples of invalid rules
"email marketing" NOT direct marketing
- Nested parentheses:
("Email Marketing" OR ("Direct Marketing" AND Sales))
- Operators not all caps:
("Email Marketing" OR ("Direct Marketing" and Sales))
- NOT is used without any other operator:
NOT "Email Marketing"
- REGEX is used without double quotes:
(.+marketing OR Direct Marketing)
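The four labelled error classes above can be checked mechanically before a rule is saved. The following Python sketch is an assumption about how such a check could look, not the validation performed by the product. The function validate_rule is hypothetical, "NOT without any other operator" is interpreted as a rule that starts with NOT, and any bare word containing a REGEX metacharacter is treated as an unquoted REGEX.

```python
import re

# Quoted phrase or REGEX, a single parenthesis, or a bare word.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
OPERATORS = {"AND", "OR", "NOT"}
REGEX_CHARS = re.compile(r"[.+*?\[\]\\^$|{}]")


def validate_rule(rule):
    """Return a list of problems found in the rule (empty if none were found)."""
    problems = []
    tokens = TOKEN.findall(rule)
    depth = 0
    for token in tokens:
        if token == "(":
            depth += 1
            if depth == 2:
                problems.append("nested parentheses")
        elif token == ")":
            depth -= 1
        elif token.startswith('"'):
            continue  # quoted phrases and REGEX terms are not inspected
        elif token.upper() in OPERATORS and token not in OPERATORS:
            problems.append(f"operator not all caps: {token}")
        elif REGEX_CHARS.search(token):
            problems.append(f"REGEX without double quotes: {token}")
    # Assumption: a rule that begins with NOT has nothing to subtract from.
    content = [t for t in tokens if t not in ("(", ")")]
    if content and content[0] == "NOT":
        problems.append("NOT used without any other operator")
    return problems


print(validate_rule('("Email Marketing" OR ("Direct Marketing" AND Sales))'))
# ['nested parentheses']
print(validate_rule('NOT "Email Marketing"'))
# ['NOT used without any other operator']
print(validate_rule('("Email Marketing" OR "Direct Marketing") AND NOT Sales'))
# []
```

An empty list only means that none of these four checks failed; it does not guarantee that the engine accepts the rule.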