Effective Techniques for Indonesian Text Retrieval
A thesis submitted for the degree of
Doctor of Philosophy
Jelita Asian B.Comp. Sc.(Hons.),
School of Computer Science and Information Technology,
Science, Engineering, and Technology Portfolio,
RMIT University,
Melbourne, Victoria, Australia.
30th March, 2007
Declaration
I certify that except where due acknowledgment has been made, the work is that of the
author alone; the work has not been submitted previously, in whole or in part, to qualify
for any other academic award; the content of the thesis is the result of work which has been
carried out since the official commencement date of the approved research program; and, any
editorial work, paid or unpaid, carried out by a third party is acknowledged.
Jelita Asian
School of Computer Science and Information Technology
RMIT University
30th March, 2007
Acknowledgments
First and foremost, I thank Justin Zobel, Saied Tahaghoghi, and Falk Scholer for their
patience and general academic and moral support during my candidature. Without their
guidance, the thesis would not exist. I thank Hugh Williams for supervising me for two years
before leaving RMIT.
I thank Halil Ali for crawling most of the document collections; Bobby Nazief for
providing the S_NA source code and the dictionary used in this thesis; Vinsensius Berlian Vega
for his S_V source code; Riky Irawan for the Kompas newswire documents; and Gunarso for
the Kamus Besar Bahasa Indonesia (KBBI) dictionary. I also thank Wahyu Wibowo for
his help in answering queries, and Eric Dharmazi, Agnes Julianto, Iman Suyoto, Hendra
Yasuwito, Debby Andriani, Sinliana, Malian, Susanna Gunawan, and Hanyu for their help
in creating our human stemming ground truth. I extend my gratitude to Beti Dimitrievska,
Chin Scott, and Cecily Walker for their assistance to research students at RMIT.
I also thank my parents and friends for their moral support. I thank many students
of the RMIT Search Engine Group: Steven Garcia, Pauline Chou, Ranjan Sinha, Michael
Cameron, Bodo Billerbeck, William Webber, Nick Lester, Sarvnaz Karimi, Dayang Iskandar,
Ying Zhao, Milad Shokouhi, Nikolas Astikis, Iman Suyoto, Jonathan Yu, Yaniv Bernstein,
Abdusalam Nwesri, Jovan Pechevski, Vaughan Shanks, Abhijit Chattaraj, Yanghong Xiang,
Pengfei Han, Yohannes Tsegay, and Rosette Kidwani. They have provided valuable assistance
during my research and have made my candidature experience interesting.
This research was conducted with the support of an International Postgraduate Research
Scholarship (IPRS). Hardware used for experiments was provided with the support of the
Australian Research Council and an RMIT University VRII grant.