Named Entity Recognition on Interview Articles 인터뷰 기사로부터 개체명 인식하기

Transforming the unstructured information in text into structured data makes it possible to process that information automatically with conventional programming techniques. One such information extraction technique is named entity recognition (NER). The objective of the NER task is to find each mention of a named entity in the text and tag it with the correct entity type.

In this project, I extracted named entities such as name, job, school, and expertise from interview articles. This is useful for automatically collecting information about a person.
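To make the tagging objective concrete, here is a minimal sketch of how token-level tags can be collapsed into entity records. It assumes a BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity); the project does not state which scheme it uses, and the function name `bio_to_entities`, the tokens, and the tags are all hypothetical illustrations.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (label, text) pairs.

    Assumption: tags look like "B-<label>", "I-<label>", or "O".
    """
    entities: List[Tuple[str, str]] = []
    label, parts = None, []
    for token, tag in zip(tokens, tags):
        starts_entity = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label  # stray I- is treated as B-
        )
        if starts_entity:
            if label is not None:
                entities.append((label, " ".join(parts)))
            label, parts = tag[2:], [token]
        elif tag.startswith("I-"):
            parts.append(token)  # continue the current entity
        else:  # "O": close any open entity
            if label is not None:
                entities.append((label, " ".join(parts)))
            label, parts = None, []
    if label is not None:
        entities.append((label, " ".join(parts)))
    return entities


# Hypothetical tokens and tags, for illustration only.
tokens = ["하일성", "은", "한국", "프로야구", "위원회", "사무총장"]
tags = ["B-이름", "O", "B-기업", "I-기업", "I-기업", "B-직책"]
print(bio_to_entities(tokens, tags))
```

Tokens inside one entity are joined with spaces here for simplicity; recovering the original surface string would require keeping character offsets from the segmenter.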

Examples

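Below are some NER results produced from interview articles. Each example gives the article title, followed by the extracted entities as label: value pairs. The labels are 이름 (name), 직책 (position), 기업 (company), 전문분야 (expertise), 전공 (major), and 대회 (competition).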

극한의 도전이 짜릿한 쾌감을 줍니다

전문분야: 익스트림 스포츠선수
이름: 방창석
전문분야: 베이스점프 스페셜리스트
전문분야: 익스트림 스포츠
대회: 특전사령관배 패러글라이딩대회
기업: 아디다스 테렉스팀

Note that although '아디다스 테렉스팀' (the Adidas Terrex team) is not strictly a company (기업), the label is a reasonable fit in this context.

하재봉이 만난 사람 | KBO 사무총장 된 최고의 해설가 하일성

직책: 사무총장
이름: 하일성
기업: 한국프로야구위원회
기업: KBS
직책: 해설위원

변호사, 법학교수 지내다 우리 술에 빠지다

이름: 정회철
기업: 예술
직책: 대표
전문분야: 전통주
전공: 법학전문대학원
직책: 교수
기업: 우리술협동조합
직책: 이사장

The program correctly classifies '예술' as a company name here, even though the word is also the common noun for "art".

Methods

I used KoNLPy with the MeCab backend for Korean morphological analysis. The raw text is first segmented into morphemes and part-of-speech (POS) tagged, and the result is then fed to Vowpal Wabbit (VW) for structured prediction. VW's implementation of search-based structured prediction worked well for this task.

Acknowledgement

I thank RocketPunch for providing the annotated interview data.