Natural Language Processing for the Working Programmer

Daniël de Kok

Harm Brouwer

License

Some rights reserved. This book is made available under the Creative Commons Attribution 3.0 License (CC-BY). This license is available from: http://creativecommons.org/licenses/by/3.0/


Table of Contents

Preface
1. Acknowledgements
1. Introduction
1.1. Welcome
1.2. What is natural language processing?
1.3. What is Haskell?
1.4. What you need
1.5. Ready, set, go!
2. Words
2.1. Introduction
2.2. Playing with words
2.3. From words to sentences
2.4. A note on tokenization
2.5. Word lists
2.6. Storing functions in a file
2.7. Word frequency lists
2.8. Monads
2.9. Reading a text corpus
3. N-grams
3.1. Introduction
3.2. Bigrams
3.3. A few words on pattern matching
3.4. Collocations
3.5. From bigrams to n-grams
3.6. Lazy and strict evaluation
3.7. Suffix arrays
3.8. Markov models
4. Distance and similarity (proposed)
5. Classification
5.1. Introduction
5.2. Naive Bayes classification
5.3. Maximum entropy classification
6. Information retrieval (proposed)
7. Part of speech tagging
7.1. Introduction
7.2. Frequency-based tagging
7.3. Evaluation
7.4. Transformation-based tagging
Bibliography
8. Regular languages (proposed)
9. Context-free grammars (proposed)
10. Performance and efficiency (proposed)
A. Contributors
A.1. Donations

List of Figures

3.1. Constructing a suffix array
3.2. Linear search step
3.3. Binary search step
5.1. Linear and non-linear classifiers
5.2. Two competing models

List of Tables

7.1. Performance of the frequency tagger

List of Equations

2.1. Type-token ratio
3.1. Difference between observed and expected chance
3.2. Pointwise mutual information
3.5. Estimating the probability of a sentence
3.6. The probability of a sentence as a Markov chain
3.8. Approximation using the Markov assumption
3.9. The conditional probability of a word using the Markov assumption
3.10. The probability of a sentence using a bigram model
5.1. Calculating the empirical value of a feature
5.2. Calculating the expected value of a feature
5.3. Constraining the expected value to the empirical value
7.1. Transformation rule selection criterion