Supervised Indonesian lexical database development and spelling correction for social media data mining

Widiputra, Harya Damar - Personal Name; Muliady, Wahyu - Personal Name;

Lexical databases are commonly used in Natural Language Processing (NLP), particularly in similarity-measuring algorithms. There are two approaches to similarity measurement: corpus-based measures, which use the Term Frequency (TF) algorithm, and knowledge-based measures, which use lexical databases. The most complete and widely used lexical database is WordNet, which is language dependent and currently not available for Indonesian. This research attempts to create an Indonesian lexical database by developing an automatic lexical database generator framework. The corpus will contain Indonesian lexicons, each with the following attributes: part-of-speech tag, synsets, word derivative sets, and antonyms to synsets. The framework combines web crawler technology, a spelling correction algorithm, and a word clustering algorithm to create the lexical database. The finalized corpus will also contain slang terms, which require human supervision for revision and for defining their attributes. Because it includes slang and takes less time to build than an Indonesian WordNet, this lexical database can serve as an alternative for social media data mining.
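As a rough illustration of the lexicon attributes and the spelling correction step mentioned in the abstract, the following is a minimal sketch only. The abstract does not say which spelling correction algorithm the framework uses, so plain Levenshtein edit distance is swapped in here as an assumption. The names `LexiconEntry`, `edit_distance`, `correct` and the `max_distance` threshold are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One lexicon with the attributes listed in the abstract (hypothetical layout)."""
    word: str
    pos_tag: str
    synset: set = field(default_factory=set)
    derivatives: set = field(default_factory=set)
    antonyms: set = field(default_factory=set)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def correct(token: str, lexicon: dict, max_distance: int = 2):
    """Return the closest known word, or None if nothing is close enough.

    Assumption: the closest lexicon key within max_distance edits wins;
    ties are broken alphabetically so the result is deterministic.
    """
    if token in lexicon:
        return token
    best = min(lexicon, key=lambda w: (edit_distance(token, w), w), default=None)
    if best is not None and edit_distance(token, best) <= max_distance:
        return best
    return None


# Tiny illustrative lexicon; the entries are examples, not data from the thesis.
lexicon = {
    "makan": LexiconEntry("makan", "verb", derivatives={"makanan", "memakan"}),
    "minum": LexiconEntry("minum", "verb", derivatives={"minuman"}),
}
```

With this toy lexicon, `correct("mkan", lexicon)` resolves to `"makan"`, while a token far from every entry returns `None` and would be a candidate for the human review of slang that the abstract describes.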

Availability

B01185 (wh) Available

Detail Information

Series Title: -
Call Number: 1185
Publisher: Swiss German University, 2012
Collation: -
Language: English
ISBN/ISSN: -
Classification: NONE
Content Type: -
Media Type: -
Carrier Type: -
Edition: -
Subject(s): IT
Specific Detail Info: -
Statement of Responsibility: -
