Sungjoo Ha

Named Entity Recognition on Interview Articles 인터뷰 기사로부터 개체명 인식하기

Transforming unstructured information in text into structured data makes it possible to process that information automatically with conventional programming techniques. One such information extraction technique is named entity recognition (NER). The objective of the NER task is to find each mention of a named entity in the text and tag it with the correct entity type.

In this project, I extracted named entities such as name, job, school, and expertise from interview articles. This is useful when we want to collect information about a person automatically.
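To make the idea of structured output concrete, here is a minimal sketch of how per-token tags could be collapsed into a list of entities. The tag names (`name`, `job`, `company`, `O` for "not an entity") and the hand-made tagging of a phrase from one of the article titles are assumptions for illustration only; they are not the project's actual label set or output.

```python
from itertools import groupby

# Hypothetical marker for tokens that are not part of any entity.
OUTSIDE = "O"


def collect_entities(tagged_tokens):
    """Merge runs of consecutive tokens that share a tag into (tag, mention) pairs."""
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == OUTSIDE:
            continue  # skip tokens outside any entity
        mention = " ".join(token for token, _ in run)
        entities.append((tag, mention))
    return entities


# Hand-made illustration, not real program output.
tagged = [
    ("KBO", "company"),
    ("사무총장", "job"),
    ("된", "O"),
    ("최고의", "O"),
    ("해설가", "job"),
    ("하일성", "name"),
]

entities = collect_entities(tagged)
for tag, mention in entities:
    print(tag, mention)
```

Merging by identical adjacent tags is the simplest possible scheme; it cannot separate two distinct entities of the same type that sit next to each other, which is why many NER systems use begin/inside markers instead.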

Examples

Below are example NER results produced from interview articles.

극한의 도전이 짜릿한 쾌감을 줍니다 ("Extreme challenges give a thrilling rush")

Note that although '아디다스 테렉스팀' (the Adidas Terrex team) is not strictly a company (기업), the label is reasonable in this context.

하재봉이 만난 사람 | KBO 사무총장 된 최고의 해설가 하일성 ("People Ha Jae-bong Met | Ha Il-sung, the top commentator who became KBO secretary general")

변호사, 법학교수 지내다 우리 술에 빠지다 ("After working as a lawyer and law professor, falling for Korean liquor")

The program correctly classifies '예술', a word that ordinarily means "art", as a company name.

Methods

I used KoNLPy with the MeCab backend for Korean morphological analysis. After the raw text is segmented and part-of-speech (POS) tagged, the data is fed to Vowpal Wabbit (VW), which performs structured prediction. VW's implementation of search-based structured prediction worked nicely.
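The pipeline above can be sketched roughly as follows. This is not the project's actual code: the label inventory, the feature names (`m=`, `p=`, `prev_p=`), and the example sentence are assumptions made for illustration. The sketch takes already tagged (morpheme, POS, label) triples, such as could be built from the (morpheme, POS) pairs returned by KoNLPy's `Mecab().pos(text)`, and writes them in the one-token-per-line, blank-line-between-sentences layout that VW's sequence search task reads.

```python
import re

# Hypothetical label inventory; VW expects class labels as integers starting at 1.
LABELS = ["O", "name", "job", "school", "company"]
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS, start=1)}

# Whitespace, ':' and '|' have special meaning in VW's text input format.
_VW_SPECIAL = re.compile(r"[\s:|]")


def sanitize(text):
    """Replace characters that would break VW's input syntax."""
    return _VW_SPECIAL.sub("_", text)


def to_vw_lines(sentence):
    """Turn one sentence of (morpheme, pos, label) triples into VW example lines."""
    lines = []
    for i, (morph, pos, label) in enumerate(sentence):
        feats = [f"m={sanitize(morph)}", f"p={sanitize(pos)}"]
        if i > 0:
            # Add the previous token's POS tag as a simple context feature.
            feats.append(f"prev_p={sanitize(sentence[i - 1][1])}")
        lines.append(f"{LABEL_TO_ID[label]} | " + " ".join(feats))
    return lines


def to_vw_text(sentences):
    """Join sentences with a blank line, which marks a sequence boundary for VW."""
    return "\n\n".join("\n".join(to_vw_lines(s)) for s in sentences) + "\n"


# Hand-made example; in practice the (morpheme, POS) pairs would come from the
# morphological analyzer and the labels from the annotated data.
example = [("하일성", "NNP", "name"), ("해설가", "NNG", "job")]
print(to_vw_text([example]))
```

A file produced this way could then be passed to VW with something like `vw --search 5 --search_task sequence -d train.vw -f ner.model`, where 5 is the number of labels assumed here; the exact options should be checked against the documentation of the installed VW version.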

Acknowledgement

I thank RocketPunch for providing the annotated interview data.