How do you make a high volume of legacy data, in hard copy or PDF, accessible on modern digital media, quickly and without errors? It’s all about data conversion done right. A good data conversion process can convert Word documents, PDFs, and hard copy to XML and HTML5 formats that are widely accessible across eBook readers and other digital and mobile devices.
Yet a fully automated data conversion process is seldom the right approach, because content needs human context to be structured correctly. A semi-automated conversion project is the best bet: it weeds out inconsistencies, accelerates turnaround, mitigates risks and errors, and still retains the human context.
Here are a few best practices related to semi-automated data conversion for high-volume legacy materials.
In general, source and legacy material spans years or decades, and in the case of research or conference papers it can span centuries, all stuck in an unstructured format.
There are multiple ways to mark up the same content, and you’ll need to decide which way is best. Document Type Definitions (DTDs), which define the structure of documents in a general way, contain many optional tags, and some tags can be used in multiple structures, which leaves room for interpretation.
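To make that ambiguity concrete, here is a minimal Python sketch. The element names (`sec`, `title`, `p`, `bold`) and the sample text are hypothetical and not taken from any particular DTD. Both fragments are well-formed and carry the same words, yet they encode the heading differently, which is exactly the kind of choice a project needs to settle up front.

```python
import xml.etree.ElementTree as ET

# Two hypothetical markups of the same heading-plus-paragraph content.
# Element names are illustrative and not taken from any specific DTD.
MARKUP_A = "<sec><title>Results</title><p>Yield rose by 4%.</p></sec>"
MARKUP_B = "<sec><p><bold>Results</bold></p><p>Yield rose by 4%.</p></sec>"


def text_of(xml_string):
    """Return all text content of a fragment, with whitespace normalized."""
    root = ET.fromstring(xml_string)
    return " ".join(" ".join(root.itertext()).split())


def structure_of(xml_string):
    """Return the element names in document order."""
    return [element.tag for element in ET.fromstring(xml_string).iter()]


if __name__ == "__main__":
    print(text_of(MARKUP_A) == text_of(MARKUP_B))  # same words in both
    print(structure_of(MARKUP_A))                  # heading as <title>
    print(structure_of(MARKUP_B))                  # heading as bold <p>
```

Only the first form tells downstream tools that "Results" is a heading, so a written markup rule per content type keeps a large batch consistent.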
First, the text from the varied inputs needs to be extracted using various tools. Tools in common use include Adobe, Jade, and Gemini.
Conversion tools that support various generic DTDs can be used to handle the XML conversion, and the resulting XML is then refined by subject matter experts, who can improve the structuring. The goal is always to deliver consistent, high-quality results.
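The split between automated conversion and expert review might look something like the sketch below. It is an illustration only: the `convert` function, the `article`, `title`, and `p` element names, and the classification rules are all assumptions made for this example, not the behavior of any specific conversion tool. The idea is that the script tags what it can classify with simple rules and queues everything else for a person.

```python
import xml.etree.ElementTree as ET


def convert(lines):
    """Convert extracted text lines to XML; return (xml_string, review_queue).

    The rules below are deliberately simple, hypothetical heuristics.
    """
    root = ET.Element("article")
    review_queue = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines left over from extraction
        if text.endswith((".", "?", "!")):
            tag = "p"      # ends like a sentence: treat as a paragraph
        elif len(text.split()) <= 8 and text[0].isupper():
            tag = "title"  # short and capitalized: probably a heading
        else:
            tag = "p"      # fallback guess, flagged for human review
            review_queue.append((number, text))
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode"), review_queue


if __name__ == "__main__":
    sample = [
        "Introduction",
        "This paper studies yield.",
        "table 3 continued from previous page",
    ]
    xml_string, queue = convert(sample)
    print(xml_string)
    for number, text in queue:
        print("Needs review, line {}: {}".format(number, text))
```

The review queue is the human part of the process: the script makes a guess so the output stays well-formed, but a subject matter expert decides what the flagged lines really are.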
The converted XML is then parsed against the DTD to validate its structure, and the text is checked for proper display using the relevant XSLT or the target platform, if one is available.
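As a rough illustration of this quality gate, the Python sketch below checks that a file is well-formed and then applies a small rule table. Note the assumptions: Python's standard library does not validate against a DTD, so `REQUIRED_CHILDREN` is a hypothetical, much simplified stand-in for a DTD content model, and the element names are invented for the example. A production workflow would more likely use a validating parser, for instance the `DTD` class in the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rules standing in for a DTD content model:
# each parent element maps to the child elements it must contain.
REQUIRED_CHILDREN = {"article": ["title", "body"], "body": ["p"]}


def check(xml_string):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError as err:
        # Malformed XML cannot be inspected further, so report and stop.
        return ["not well-formed: {}".format(err)]
    problems = []
    for element in root.iter():
        for child in REQUIRED_CHILDREN.get(element.tag, []):
            if element.find(child) is None:
                problems.append(
                    "<{}> is missing <{}>".format(element.tag, child)
                )
    return problems


if __name__ == "__main__":
    good = "<article><title>T</title><body><p>Text</p></body></article>"
    bad = "<article><title>T</title><body/></article>"
    print(check(good))
    print(check(bad))
```

Files that return an empty list can move on to the display check, while anything else goes back to the conversion team with the problem list attached.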