What it takes to execute a successful data conversion process: Best practices

How does one make a high volume of legacy data, held in hard copy or PDF, accessible to modern digital media, quickly and without errors? It comes down to doing data conversion right. A good data conversion process can convert Word documents, PDFs, and hard copy into XML and HTML5, formats that are widely supported across eBook readers and other digital and mobile devices.

Yet a fully automated conversion process is seldom the right approach, because content needs human judgment to be interpreted in its proper context. A semi-automated conversion project is the best bet: it weeds out inconsistencies, accelerates turnaround, mitigates risks and errors, and still retains the human context.

Here are a few best practices related to semi-automated data conversion for high-volume legacy materials.

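  • Inconsistencies in the content

Source and legacy material typically spans years or decades, and in the case of research or conference papers, even centuries, all locked in unstructured formats. Over such long periods, layouts and editorial conventions change, so the same kind of content rarely looks the same from one document to the next.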

  • Undefined tagging

There are often several valid ways to mark up the same content, and you'll need to decide which is best. Document Type Definitions (DTDs) define document structure only in general terms: many elements are optional, and some can appear in several different structures, which leaves room for interpretation. Settling on one tagging convention up front keeps the output consistent across the whole collection.
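
To make the tagging decision concrete, here is a minimal Python sketch. It assumes a hypothetical DTD that permits emphasis to be marked up either as `<i>`/`<b>` or as `<emph type="...">`, and it treats `<emph>` as the chosen house convention; these element names are illustrative, not taken from any particular DTD. Once a convention is agreed, alternative markups can be rewritten mechanically:

```python
import xml.etree.ElementTree as ET

# Hypothetical house convention: emphasis is always <emph type="...">.
# The alternative tags below are assumed to be permitted by the DTD,
# but they are not the form we want in the delivered XML.
ALTERNATIVE_TAGS = {
    "i": ("emph", {"type": "italic"}),
    "b": ("emph", {"type": "bold"}),
}


def normalize_tagging(root: ET.Element) -> ET.Element:
    """Rewrite alternative emphasis tags in place to the house convention."""
    for elem in root.iter():
        mapping = ALTERNATIVE_TAGS.get(elem.tag)
        if mapping is not None:
            new_tag, attrs = mapping
            elem.tag = new_tag
            elem.attrib.update(attrs)  # copies values; the table is not mutated
    return root


source = '<para>Both <i>this</i> and <emph type="italic">this</emph> are valid.</para>'
root = normalize_tagging(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
```

Keeping the mapping in one table makes the tagging decision explicit and easy to review alongside the DTD.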

  • Text extraction process

First, the text must be extracted from the varied inputs, which usually takes more than one tool. Tools in common use include Adobe, Jade, and Gemini.
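
Whatever tool performs the extraction, the raw text usually carries layout artifacts, such as words hyphenated across line breaks and hard line breaks inside paragraphs. The sketch below is a minimal, hypothetical cleanup pass in Python, assuming paragraphs in the extracted text are separated by blank lines; it is not tied to the output format of any of the tools named above.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Tidy raw extracted text before it goes to XML conversion.

    Assumes paragraphs are separated by blank lines, and treats a hyphen at
    the end of a line followed by a lowercase letter as a word broken across
    lines. This is a heuristic: it will also join a genuine hyphenated
    compound that happens to fall at a line end.
    """
    # Rejoin words split across a line break, e.g. "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(?=[a-z])", r"\1", raw)
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Collapse hard line breaks and repeated spaces inside a paragraph.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)


raw = "Legacy conver-\nsion projects\nspan decades.\n\nSecond   paragraph\nhere."
print(clean_extracted_text(raw))
```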

  • Conversion process

Conversion tools that support common generic DTDs can handle the bulk of the XML conversion; subject matter experts then refine the XML to improve its structure. Combining automated conversion with expert review is what delivers consistent, high-quality results.
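
As a rough illustration of the automated part of this step, the Python sketch below turns cleaned text into a simple XML tree. The input convention (a paragraph beginning with `# ` is a heading) and the element names `article`, `section`, `title`, and `para` are hypothetical assumptions, not the schema of any specific DTD or conversion tool. Subject matter experts would then refine output like this, for example by tagging elements that a simple rule cannot recognize.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text: str) -> ET.Element:
    """Convert cleaned plain text into a simple XML tree.

    Hypothetical input convention: paragraphs are separated by blank lines,
    and a paragraph starting with "# " is a section heading.
    """
    article = ET.Element("article")
    section = None
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            # A heading opens a new <section> and becomes its <title>.
            section = ET.SubElement(article, "section")
            ET.SubElement(section, "title").text = block[2:]
        else:
            # Body text goes into the current section, or directly under
            # <article> if no heading has been seen yet.
            parent = section if section is not None else article
            ET.SubElement(parent, "para").text = block
    return article


doc = text_to_xml("Front matter.\n\n# Introduction\n\nFirst paragraph.\n\nSecond paragraph.")
print(ET.tostring(doc, encoding="unicode"))
```

The `is not None` check matters here: an ElementTree element with no children is falsy, so testing `section` directly would misroute paragraphs.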

  • XML parsing

The converted XML is parsed and validated against the DTD to confirm that its structure is correct, and then checked for proper display of the text using the relevant XSLT stylesheet or the target platform, if one is available.
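
Python's standard library does not validate XML against a DTD (a validating parser such as lxml, through its `etree.DTD` class, can). As a simplified stand-in, the sketch below checks each element's children against a hypothetical content-model table. It captures only part of what a DTD expresses, since it ignores child order and repeat counts, and the table and element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, loosely mirroring DTD declarations such as
#   <!ELEMENT section (title, para*)>
# Each entry: element -> (allowed child tags, required child tags).
# Unlike a real DTD, this ignores child order and repeat counts.
CONTENT_MODEL = {
    "article": ({"para", "section"}, set()),
    "section": ({"title", "para"}, {"title"}),
    "title": (set(), set()),
    "para": ({"emph"}, set()),
    "emph": (set(), set()),
}


def check_structure(root: ET.Element) -> list[str]:
    """Return a list of structural problems; an empty list means the tree passed."""
    problems = []
    for elem in root.iter():
        if elem.tag not in CONTENT_MODEL:
            problems.append(f"undeclared element <{elem.tag}>")
            continue
        allowed, required = CONTENT_MODEL[elem.tag]
        child_tags = [child.tag for child in elem]
        for tag in child_tags:
            if tag not in allowed:
                problems.append(f"<{tag}> not allowed inside <{elem.tag}>")
        for tag in sorted(required - set(child_tags)):
            problems.append(f"<{elem.tag}> is missing required <{tag}>")
    return problems


good = ET.fromstring("<article><section><title>Intro</title>"
                     "<para>Text with <emph>stress</emph>.</para></section></article>")
bad = ET.fromstring("<article><section><para>No title</para><table/></section></article>")
print(check_structure(good))  # []
print(check_structure(bad))
```

In practice a validating parser run against the real DTD would replace this table; the display check with XSLT remains a separate step.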

Published by Amnet