The simplicity and popularity of collaborative tagging as an approach to organizing information come with several limitations.
Manual content categorization / tagging
Traditionally, content has been categorized by subject experts, who manually reviewed documents and matched them to categories within a taxonomy. Despite the costs involved in manual categorization, it is perceived to have one key advantage: 100% accuracy. This is not necessarily true:
- Firstly, people choose tags based on their personal opinions, their background knowledge and their preferences. Subject experts may not have the bigger picture: an expert categorizing documents in their own field does not necessarily possess expertise in other subjects, that is, in other parts of the taxonomy. An article about a businessman purchasing a baseball team may be reviewed by a sports expert and placed in the "Baseball" category, but not in one of the more specific subcategories of the "Mergers and Acquisitions" category.
- Secondly, users may describe the same object at different levels of granularity. This creates a noisy tag space and thus makes it harder to find material tagged by other users.
- Thirdly, people may use polysemous words (words that have several related senses) to tag web resources. The lack of semantic distinction in tags can lead to inappropriate connections between items.
- Another problem is that different tags that are synonymous or closely related in meaning increase data redundancy and reduce recall: a search for one tag misses items labeled only with its synonyms.
- Last but not least, people tend to assign only a very small number of tags to an object, which leaves few terms by which it can be found.
In addition, manual categorization is completely impractical for very large repositories of data that grow at a fast pace, which is exactly the case for most modern organizations.
All these limitations have led researchers to develop methods that assist users in the tagging process by automatically suggesting an appropriate, rich set of tags that avoids the aforementioned obstacles.
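The effect of synonymous tags on recall, noted in the list above, can be made concrete with a small sketch. It is an illustration only, not one of the tag suggestion methods referred to here: the items, the tags, and the synonym table are all invented for the example, and `search` and `normalize` are hypothetical helper names.

```python
from typing import Dict, List, Set

# Hypothetical tagged items: three posts about the same city,
# each labeled with a different spelling of the tag.
ITEMS: Dict[str, Set[str]] = {
    "post-1": {"nyc", "travel"},
    "post-2": {"new-york", "food"},
    "post-3": {"newyork", "travel"},
}

# Hypothetical synonym table mapping tag variants to one canonical tag.
CANONICAL: Dict[str, str] = {"new-york": "nyc", "newyork": "nyc"}


def search(tag: str, items: Dict[str, Set[str]]) -> List[str]:
    """Return the ids of all items that carry exactly this tag."""
    return sorted(item for item, tags in items.items() if tag in tags)


def normalize(items: Dict[str, Set[str]],
              canonical: Dict[str, str]) -> Dict[str, Set[str]]:
    """Replace every tag variant with its canonical form, if one is known."""
    return {
        item: {canonical.get(tag, tag) for tag in tags}
        for item, tags in items.items()
    }


# Searching the raw tags finds one of the three relevant posts.
print(search("nyc", ITEMS))                        # ['post-1']
# After merging synonyms, the same query finds all three.
print(search("nyc", normalize(ITEMS, CANONICAL)))  # ['post-1', 'post-2', 'post-3']
```

In this toy data set, recall for the query "nyc" rises from one relevant post in three to all three once the variants are merged. A real system would have to build the synonym table somehow; here it is simply assumed.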
Automatic rule-based content categorizing / tagging
Using this approach, information experts attempt to define the discriminating properties of categories using a set of rules. These rules may be simple (e.g. "does the word 'snow' appear in the document?"), or use more complex operators (e.g. "does the word 'snow' appear together with the word 'skate'?"). In order to find precise rules that distinguish similar categories (for example, "Financial Planning" and "Investment Banking"), one needs substantial expertise in the subject being covered. This approach's reliance on human-comprehensible rules is an advantage, because it allows an organization to leverage existing knowledge and expertise. Creating the rules takes time, effort and expertise, but the results are predictable, since the same rules always assign the same categories to a given document, and the rule set can be improved step by step.
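The two kinds of rule mentioned above, a single-word test and a combination of words, might be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the category names "Weather" and "Winter Sports", the helpers `contains` and `all_of`, and the whole-word matching are all choices made for the example.

```python
import re
from typing import Callable, Dict, List, Set

# A rule receives the set of distinct words in a document
# and returns True if the document matches.
Rule = Callable[[Set[str]], bool]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def contains(word: str) -> Rule:
    """Simple rule: does the exact word appear in the document?"""
    return lambda words: word in words


def all_of(*rules: Rule) -> Rule:
    """Compound rule: do all of the given rules match together?"""
    return lambda words: all(rule(words) for rule in rules)


# Hypothetical rule set; the category names are invented for illustration.
RULES: Dict[str, Rule] = {
    "Weather": contains("snow"),
    "Winter Sports": all_of(contains("snow"), contains("skate")),
}


def categorize(text: str, rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return, in alphabetical order, every category whose rule matches."""
    words = tokenize(text)
    return sorted(name for name, rule in rules.items() if rule(words))


print(categorize("Snow fell overnight."))
# ['Weather']
print(categorize("You can skate on the pond once the snow is cleared."))
# ['Weather', 'Winter Sports']
```

Because each rule is a plain, readable function, an expert can inspect why a document landed in a category and refine a single rule without retraining anything. The matching here is exact, so "skating" would not satisfy the 'skate' rule; handling word variants would require further rules or stemming.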