Stemming Indonesian - Introduction

1 Introduction
Stemming is a core natural language processing technique for efficient and effective Information Retrieval (Frakes 1992), and one that is widely accepted by users. It is used to transform word variants to their common root word by applying, in most cases, morphological rules. For example, in text search, it should permit a user searching using the query term "stemming" to find documents that contain the terms "stemmer" and "stems" because all share the common root word "stem". It also has applications in machine translation (Bakar & Rahman 2003), document summarisation (Orăsan, Pekar & Hasler 2004), and text classification (Gaustad & Bouma 2002).
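
As a rough illustration of the idea, the sketch below reduces the three example terms to a shared root by stripping a suffix. The rule set and the name `toy_stem` are assumptions made for this example only. It is not the Lovins or Porter algorithm, and it will mis-stem many ordinary English words.

```python
# Toy suffix stripper: illustrative only, NOT the Lovins or Porter algorithm.
# The suffix list below is an assumed, deliberately tiny rule set.
SUFFIXES = ("ing", "er", "s")


def toy_stem(word: str) -> str:
    """Strip at most one known suffix, then undo consonant doubling."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo a doubled final consonant, e.g. "stemm" -> "stem".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


if __name__ == "__main__":
    for term in ("stemming", "stemmer", "stems"):
        print(term, "->", toy_stem(term))
```

With these rules, a query term and the document terms map to the same string, which is what lets a search engine match them.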
For the English language, stemming is well-understood, with techniques such as those of Lovins (1968) and Porter (1980) in widespread use. However, stemming for other languages is less well-known: while there are several approaches available for languages such as French (Savoy 1993), Spanish (Xu &…

Stemming Indonesian - Abstract

Stemming Indonesian
Jelita Asian, Hugh E. Williams, S.M.M. Tahaghoghi
School of Computer Science and Information Technology
RMIT University, GPO Box 2476V, Melbourne 3001, Australia.
{jelita,hugh,saied}@cs.rmit.edu.au

Abstract
Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarisation, and text classification. For example, English stemming reduces the words "computer", "computing", "computation", and "computability" to their common morphological root, "comput-". In text search, this permits a search for "computers" to find documents containing all words with the stem "comput-". In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. In this paper, we investigate the performance of five Indonesian stemming algorithms through a user study. Our results show that, with the availability of a reasonable dic…

Indonesian Stemming Algorithm in Perl

Indonesian Stemmer Algorithm


A program for finding root words (a stemmer) in Indonesian, written in the Perl programming language. The program works from a dictionary of root words and follows the affixed-word patterns set out in the Ejaan Yang Disempurnakan (EYD) spelling guidelines. Hopefully it is useful.
1. Introduction

Word formation in Indonesian has the following structure:

[prefix-1] + [prefix-2] + root + [suffix] + [possessive] + [particle]

Each of these parts (those in brackets may be present or absent) is combined with the root word to form an affixed word. The affixes most commonly used in Indonesian are:

Particles: -lah, -kah, -pun, -tah.
Possessives: -ku, -mu, -nya.
Suffixes: -i, -an, -kan.
Prefixes: me-, ber-, pe-, di-, ke-, ter-, se-.

When prefixes are attached, the following rules apply:

Prefix     Changed form     Rule
me | pe    meng | peng      + V | k | g | h | q … Example: mengambil = meng + ambil. V = vowel (a, i, u, e, o)
           meny | peny      + s … Example: penyakit = …
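
The word structure and the two prefix rules visible above can be sketched as a dictionary lookup that tries progressively more stripping. This is a minimal Python sketch, not the author's Perl program. The root dictionary `ROOTS`, the helper names, and the search order are all hypothetical. Treating meny-/peny- as replacing an initial "s" in the root is an assumption based on standard Indonesian morphology, since the source example is cut off. Rules hidden by the elisions in the source (for example, other forms of me-/pe-) are not handled.

```python
from typing import Iterator

# Affix lists taken from the text above.
PARTICLES = ("lah", "kah", "pun", "tah")
POSSESSIVES = ("ku", "mu", "nya")
SUFFIXES = ("i", "an", "kan")
PREFIXES = ("me", "ber", "pe", "di", "ke", "ter", "se")

# Hypothetical, tiny root dictionary; a real stemmer needs a full word list.
ROOTS = {"ambil", "sakit", "baca", "buku", "main"}


def strip_one(word: str, endings: tuple) -> Iterator[str]:
    """Yield the word unchanged, then with each matching ending removed."""
    yield word
    for ending in endings:
        if word.endswith(ending) and len(word) > len(ending):
            yield word[: -len(ending)]


def prefix_candidates(word: str) -> Iterator[str]:
    """Yield the word unchanged, then with one prefix removed."""
    yield word
    # Rule from the text: meng-/peng- before a vowel, k, g, h, or q.
    for prefix in ("meng", "peng"):
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest and rest[0] in "aiueokghq":
            yield rest
    # Assumption: meny-/peny- replaces the initial "s" of the root.
    for prefix in ("meny", "peny"):
        if word.startswith(prefix) and len(word) > len(prefix):
            yield "s" + word[len(prefix):]
    # Plain prefixes, removed without any spelling change.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            yield word[len(prefix):]


def stem(word: str) -> str:
    """Return the dictionary root of an affixed word, or the word itself."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Peel from the outside in: particle, possessive, suffix, then prefixes.
    for w1 in strip_one(word, PARTICLES):
        for w2 in strip_one(w1, POSSESSIVES):
            for w3 in strip_one(w2, SUFFIXES):
                for w4 in prefix_candidates(w3):      # [prefix-1]
                    for w5 in prefix_candidates(w4):  # [prefix-2]
                        if w5 in ROOTS:
                            return w5
    return word  # no dictionary root found; leave the word as it is


if __name__ == "__main__":
    for w in ("mengambil", "penyakit", "bukunya", "dibacakannyalah"):
        print(w, "->", stem(w))
```

Checking every candidate against the dictionary is what keeps the sketch from over-stripping: an ending or prefix is only accepted when what remains is a known root.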