Code and comments

Practical and theoretical aspects of software development

Learning Natural Language Processing from Stanford — early review

Stanford is offering a free course in Natural Language Processing online. You can find all of the details here. It’s not too late to sign up, so I thought I would give my early impression.

The course consists of recorded videos, corresponding sets of slides, small weekly problem sets, and substantial weekly programming assignments. Readings from two textbooks are suggested, but not required. Students are expected to have some programming experience, and some linear algebra and probability. Homework assignments must be completed in Java or Python.

This course is not for credit, but there will be a certificate for students who complete it successfully.

Video lectures

The lectures for the first week covered regular expressions, tokenization, stemming, and edit distance. They were divided into eleven videos, each three to fifteen minutes long, for about 100 minutes of lecture in total. The videos were an effective combination of slides and screencasts, and most contained a few multiple-choice questions that had to be answered in order to continue.


The combination of theory and practice seems appropriate to me. Professors Jurafsky and Manning seem very much in touch with the practical realities of real-world text processing. For example, they openly acknowledge that, for all the excitement about probabilistic methods in NLP, text processing often involves heavy use of regular expressions. Moreover, the first programming assignment immerses the student in the practical realities of processing large amounts of real-world text with regexes.
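To give a flavor of what regex-driven text processing looks like, here is a minimal Python sketch of a tokenizer. It is only an illustration: the pattern and the `tokenize` function are hypothetical and are not taken from the course materials or the assignment.

```python
import re

# Hypothetical token pattern (an assumption for illustration only):
#   - words, optionally with one internal apostrophe (e.g. "Don't")
#   - numbers, optionally with a decimal part (e.g. "3.50")
#   - any single character that is neither a word character nor whitespace
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")


def tokenize(text):
    """Return the word, number, and punctuation tokens found in text."""
    # All groups are non-capturing, so findall returns the full matches.
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Don't pay $3.50 for U.S. mail!"))
```

A pattern this simple splits an abbreviation like "U.S." into four tokens, which is exactly the sort of real-world wrinkle that makes regex work more time-consuming than it first appears.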

Programming assignments

The course info mentions that programming experience is expected, and they mean it. I spent about four hours on the first assignment, and I didn’t try for perfection. It could have taken much longer if I had never worked with regular expressions or had been unfamiliar with string formatting and the like. They have made every effort to eliminate unnecessary obstacles, such as by choosing common languages and providing starter code so that time is not wasted learning file I/O.


I can’t say much about the mathematical prerequisites at this point. Everything in the first week should be understandable to a general audience, and I can’t usefully predict how heavy the linear algebra and probability will be. It’s a pretty safe bet that conditional probabilities and matrix manipulations will be necessary, but I doubt that a deep understanding of the mathematics will be required … though it never hurts.


I’ve never been a big fan of the virtual classroom, but I’m excited about this opportunity. The material should be top notch, the convenience is unbeatable, and the course management software provided by Coursera is extremely usable, even including a Stack Overflow-like Q&A forum for students and staff to use.

If you have an interest in NLP, then you should really consider whether this makes sense for you. If not, you should consider whether any of the other courses provided by Coursera are useful for you.


Written by Eric Wilson

March 9, 2012 at 9:14 pm
