Paper 68

Generated: Tue Feb 28 17:21:17 2006

68 - Title A new aspect of sentence boundary detection method for Turkish
Authors Ozlem Aktas, Dokuz Eylul University, Computer Engineering Department
PC Member No
Contact person Ozlem Aktas, ozlem_at_cs.deu.edu.tr, 2323736040
Main Fields
Other Main Fields Natural Language Processing
Abstract + Keywords “Natural Language Processing” (NLP) is a research area that serves many different purposes and is becoming steadily more popular. Speech synthesis, speech recognition, machine translation, and spelling correction are some of the applications of NLP. To determine a language’s morphological characteristics, it is necessary to build a corpus that represents the language and to perform statistical and morphological analysis on it. The first step in building such a corpus is sentence boundary detection. This process is complicated and hard to solve, but it is the most important part of building the corpus. In this work, a new method is developed to solve the sentence boundary problem. The abbreviation list and the rules generated for sentence boundary detection are stored in an XML file, and this has produced successful results in sentence boundary detection. This new method will help researchers by separating sentences correctly and efficiently in terms of time and other costs.

Keywords: Natural language processing, Turkish corpus, morphological analysis, sentence boundary detection.
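
The abstract describes keeping the abbreviation list and the detection rules in an XML file. The following is only a minimal sketch of that idea, not the author's method: the XML layout (`sbd`, `abbr`, `terminators`), the function names, and the sample abbreviations are all assumptions made for illustration. A token ending in a terminator closes a sentence unless it appears in the abbreviation list.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: element names and contents are assumptions,
# not the schema used in the paper.
RULES_XML = """
<sbd>
  <abbreviations>
    <abbr>Dr.</abbr>
    <abbr>Prof.</abbr>
    <abbr>vb.</abbr>
  </abbreviations>
  <terminators>.!?</terminators>
</sbd>
"""


def load_rules(xml_text):
    """Return (set of lowercased abbreviations, string of terminator chars)."""
    root = ET.fromstring(xml_text.strip())
    abbreviations = {a.text.strip().lower() for a in root.iter("abbr")}
    terminators = root.findtext("terminators", default=".!?").strip()
    return abbreviations, terminators


def split_sentences(text, abbreviations, terminators):
    """Split on whitespace tokens; end a sentence at a token that ends with
    a terminator character and is not a listed abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_with_terminator = token[-1] in terminators
        if ends_with_terminator and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a terminator
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    abbrs, terms = load_rules(RULES_XML)
    for s in split_sentences("Dr. Ahmet geldi. Prof. Ayşe de geldi mi? Evet!", abbrs, terms):
        print(s)
```

Keeping the list in XML means new abbreviations can be added without touching the code. This sketch ignores cases a real rule set would need to cover, such as numbers with decimal points, quotations, and abbreviations that legitimately end a sentence.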
Remarks

