Binary Patent Classification Methods for Few Annotated Samples

Benjamin Meindl, Ingrid Ott, U.T. Zierahn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic

Abstract

In this paper, we develop binary patent classification algorithms for ambiguous concepts and small sample sizes. Such algorithms are particularly useful for economic questions, which often require binary classification of ambiguous and subjective concepts. Because human classification is time-consuming, sample sizes are small. Examples include whether workers are susceptible to automation, or whether a device is an automat. We compare the performance of naive Bayes, support vector machine, random forest, and k-nearest neighbor classifiers with the spaCy convolutional neural network (CNN) model, as well as with a spaCy CNN model pre-trained on patent data. The results show the highest overall accuracy for the CNN models, with significantly improved performance through pre-training. Our analysis suggests that the pre-trained spaCy CNN model is a highly accurate NLP model that can be implemented without extensive computational capacity. Pre-training was particularly beneficial for small sample sizes: as few as 100 labeled patents lead to an accuracy of 77.2%. The small sample size required may encourage researchers in various fields to use manually labeled patent data to evaluate their specific questions.
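To make the baseline side of this comparison concrete, the following is a minimal sketch of one of the classifier families named in the abstract: a multinomial naive Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' implementation. The function names, the whitespace tokenizer, and the four toy "patent" texts with their labels are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and word counts from labeled texts."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 1 = describes an automat, 0 = does not.
train_docs = [
    "robot arm performs welding automatically",
    "automatic machine dispenses goods without operator",
    "hand tool with ergonomic grip",
    "manual wrench for tightening bolts",
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_docs, train_labels)
prediction_auto = predict(model, "machine performs sorting automatically")
prediction_manual = predict(model, "manual hand tool")
```

With so few training texts, a model like this can only rely on word overlap with the labeled examples, which is the small-sample setting the abstract addresses.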
Original language: English
Title of host publication: Proceedings of The 1st Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2019)
Editors: L. Andersson, H. Aras, F. Piroi, A. Hanbury
Place of Publication: Karlsruhe, Germany
Pages: 13-17
Publication status: Published - 2019
Externally published: Yes
Event: 1st Workshop on Patent Text Mining and Semantic Technologies - Karlsruhe, Germany
Duration: 12 Sept 2019 → …

Publication series

Name: Proceedings of The 1st Workshop on Patent Text Mining and Semantic Technologies

Conference

Conference: 1st Workshop on Patent Text Mining and Semantic Technologies
Abbreviated title: PatentSemTech 2019
Country/Territory: Germany
City: Karlsruhe
Period: 12/09/19 → …
