Transforming unstructured information in text into structured data lets us process that information automatically with conventional programming techniques. One example of such an information extraction technique is named entity recognition (NER). The objective of the NER task is to find mentions of named entities in text and tag each one with the correct entity type.
In this project, I extracted named entities such as a person's name, job, school, and expertise from interview articles. This kind of information is useful when we want to collect information about a person automatically.
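To illustrate how token-level tags become structured data, here is a minimal sketch. It assumes the common BIO tagging scheme ('B-X' begins an entity of type X, 'I-X' continues it, 'O' is outside any entity), and the tag names NAME, JOB, SCHOOL, and EXPERTISE are hypothetical labels. The write-up does not specify the actual tag set or encoding.

```python
from collections import defaultdict


def bio_to_entities(tokens, tags, sep=" "):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Assumes the BIO scheme: 'B-X' starts an entity of type X, 'I-X'
    continues it, and 'O' marks tokens outside any entity. An 'I-X' tag
    that does not continue an open entity of type X starts a new one.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity before starting the new one.
            if current_type is not None:
                entities.append((current_type, sep.join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the currently open entity.
            current_tokens.append(token)
        else:
            # 'O' closes any open entity.
            if current_type is not None:
                entities.append((current_type, sep.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, sep.join(current_tokens)))
    return entities


def to_record(entities):
    """Collect entities into a dict mapping entity type -> list of mentions."""
    record = defaultdict(list)
    for entity_type, text in entities:
        record[entity_type].append(text)
    return dict(record)
```

For example, with the hypothetical tokens `["Jane", "Doe", "studied", "design", "at", "Seoul", "National", "University"]` tagged `["B-NAME", "I-NAME", "O", "B-EXPERTISE", "O", "B-SCHOOL", "I-SCHOOL", "I-SCHOOL"]`, `to_record(bio_to_entities(tokens, tags))` yields `{"NAME": ["Jane Doe"], "EXPERTISE": ["design"], "SCHOOL": ["Seoul National University"]}`. For Korean morphemes, `sep` would likely need adjusting (for example, `""`), because morphological segmentation does not preserve the original spacing.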
These are some examples of NER results on interview articles.
Notice that while '아디다스 테렉스팀' (the Adidas Terrex team) is not a company (기업) per se, the label is reasonable in this context.
The program correctly classifies '예술' (literally "art") as a company name.
I used KoNLPy with the MeCab backend for Korean morphological analysis. After segmenting the raw text into morphemes and part-of-speech (POS) tagging it, I fed the data to Vowpal Wabbit (VW) for structured prediction. VW's implementation of search-based structured prediction (learning to search) worked well for this task.
I thank RocketPunch for providing the annotated interview data.