Journal of Biomedical Informatics

The most recent articles from: J Biomed Inform June 2016

J Biomed Inform · Jun 2016

PDF text classification to leverage information extraction from publication reports.

Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. ⋯ The rule-based multi-pass sieve framework can be used effectively to categorize texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents.
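The abstract does not list the actual passes or text categories, so the following is only a minimal Python sketch of the general multi-pass sieve idea, not the authors' rule set. The categories (`metadata`, `caption`, `semi-structured`, `narrative`), the regex rules, and every function name are hypothetical. Passes run from highest to lowest precision, a line labeled by an earlier pass is never revisited, and anything left unclaimed falls through to a default category.

```python
import re
from typing import Callable, List, Optional

# A sieve pass inspects one text line and returns a category label,
# or None to leave the line for later (lower-precision) passes.
SievePass = Callable[[str], Optional[str]]


# Hypothetical pass 1: publication metadata (DOIs, copyright notices, emails).
def metadata_pass(line: str) -> Optional[str]:
    if re.search(r"\bdoi:|©|copyright|@\w+\.\w+", line, re.IGNORECASE):
        return "metadata"
    return None


# Hypothetical pass 2: figure or table captions such as "Table 2 ...".
def caption_pass(line: str) -> Optional[str]:
    if re.match(r"^(fig(ure)?\.?|table)\s*\d+", line, re.IGNORECASE):
        return "caption"
    return None


# Hypothetical pass 3: semi-structured text, here lines whose tokens are mostly numbers.
def semi_structured_pass(line: str) -> Optional[str]:
    tokens = line.split()
    if not tokens:
        return None
    numeric = sum(t.replace(".", "", 1).isdigit() for t in tokens)
    return "semi-structured" if numeric / len(tokens) > 0.5 else None


# Ordered from highest to lowest precision (an assumption of this sketch).
PASSES: List[SievePass] = [metadata_pass, caption_pass, semi_structured_pass]


def classify(lines: List[str], passes: List[SievePass], default: str = "narrative") -> List[str]:
    labels: List[Optional[str]] = [None] * len(lines)
    for sieve in passes:
        for i, line in enumerate(lines):
            if labels[i] is None:  # earlier, more precise passes win
                labels[i] = sieve(line)
    # Lines that no pass claimed are treated as narrative text.
    return [label if label is not None else default for label in labels]


if __name__ == "__main__":
    sample = [
        "Corresponding author: jane.doe@example.org",
        "Table 2 Baseline characteristics",
        "45 12.5 3.1 0.04",
        "Patients were randomized to two arms.",
    ]
    print(classify(sample, PASSES))
```

Running the sample labels the four lines `metadata`, `caption`, `semi-structured`, and `narrative`. Ordering the passes by precision is the design choice that makes the sieve useful: a line such as "Table 3 © 2016" is claimed by the metadata pass before the caption rule ever sees it, so conflicts between rules are resolved by pass order rather than by scoring.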
