
What's covered?

This topic covers how to import a Word document into RoboHelp HTML using the import wizard in RoboHelp 2002, although much will apply to other versions. There is a separate topic covering the X5 Wizard.

The topic was originally produced by M J Plaster and was available from her site www.wwwinnovations.com (no longer online). It was titled "RoboHelp HTML's Smart Microsoft Word Import Exposed". I have been unable to trace M J Plaster and hope that she will not mind my reproducing her article here. I have not amended the content in any way except for some minor formatting changes. Any reference to "me", "my", "we", etc. refers to M J Plaster and/or her website.

If anyone is in contact with M J Plaster, would they please either advise me how I can contact her or let her know of this topic so that she can confirm it is OK to continue posting it.
Since Version 2002, RoboHelp HTML has allowed you to import Microsoft Word documents directly, without first processing them through RoboHelp Classic. The Word import function in Classic is far more forgiving, and the full power of Word is available to you in Classic; but if you follow the rules outlined here when preparing Word documents for import, you'll be richly rewarded with a slight reduction of adjustments in the post-production phase. This paper provides a number of strategies and workarounds for using this import function.
It is important to note that because print and online media are inherently different, with their components often serving different purposes in each medium, no tool should be expected to import and convert perfectly. We created source documents in Word 2000. Your mileage may vary with other Word versions.
How you prepare the structure of a Word document based on your need to repurpose the content is beyond the scope of this paper. It goes without saying that you need to keep the final output and structure in mind when structuring the Word document.
When RoboHelp HTML imports a Word document, it splits the document into individual topics based on the user's selection of Word heading styles, converts cross-references into links, and converts Word hyperlinks into URLs. It also converts inserted graphics into HTML-appropriate graphics. Depending on the method of insertion in the Word document, RoboHelp HTML either displays the image in the WYSIWYG editor or inserts a placeholder.
The Word table of contents and index entries convert into their online equivalents; but RoboHelp HTML leaves a few remnants behind, which require clean up in the post-production phase. During the conversion, RoboHelp HTML creates a style sheet based on style attributes in the Word document. We suggest that you attach one or more existing or new style sheets to the topics after conversion.
RoboHelp HTML supports two types of Word import: importing into a new project and importing into an existing project.
For either method above, importing Word documents into RoboHelp HTML is a three-step process: preparing the Word document, importing it, and post-producing the imported topics.
The majority of the time required to import a Word document into RoboHelp HTML is spent preparing and post-producing the material.
This test was performed using two sets of Word documents. We tested the import feature on new and existing projects. We performed the import and conversion on each set of Word documents multiple times to confirm the results.
It is unlikely that you'll ever have to perform all of the tasks outlined in this section. We assume that some of the Word documents you convert will originate with content providers outside of your group, perhaps from individuals not as well versed in Word as the typical technical writer. In fact, we are assuming the worst-case scenario in terms of a source document as we write these procedures, covering as many contingencies as possible.
Before preparing the Word document, make a copy of it, especially if you plan to use the source document for another purpose. The object of preparing the Word document is to facilitate a predictable conversion and minimize post-production. By using Word's powerful search and replace features, you can automate tasks in Word that you would have to perform manually in RoboHelp HTML. Techniques presented here are based on "best practices," as defined by the author's extensive trial and error. Your mileage may vary. The only rule is this: What you don't clean up in Word, you will have to clean up in RoboHelp HTML. Let your proficiency in Word and RoboHelp HTML guide you to your shortest path between point A and point B.
RoboHelp HTML creates a style sheet based on the Word styles in the source document, so it is best to ensure the style names and definitions in the Word document match existing RoboHelp HTML style names and definitions as closely as possible. For best results, do not skip any heading levels. For example, do not follow a Heading 1 section with a Heading 3 section.
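The "do not skip heading levels" rule can be checked mechanically before you import. The following is a minimal Python sketch, not part of RoboHelp or Word: it assumes you have already listed the heading levels of the source document in order (for example, by reading them off Word's outline view), and the function name `find_skipped_levels` is purely illustrative.

```python
def find_skipped_levels(levels):
    """Return (position, previous_level, level) for every heading that
    jumps more than one level deeper than the heading before it."""
    skipped = []
    previous = None  # no heading seen yet
    for position, level in enumerate(levels):
        # A jump such as Heading 1 -> Heading 3 skips Heading 2.
        if previous is not None and level > previous + 1:
            skipped.append((position, previous, level))
        previous = level
    return skipped


if __name__ == "__main__":
    # Hypothetical outline: Heading 1, Heading 3, Heading 2
    for position, previous, level in find_skipped_levels([1, 3, 2]):
        print(f"Heading at position {position} jumps from level {previous} to level {level}")
```

Moving back up the hierarchy (for example, from Heading 3 to Heading 1) is not reported, because only skipping downward leaves a gap in the structure.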
If the Word document contains direct formatting (the document looks beautiful, but Normal style is applied to everything), you must apply styles throughout the document. If you regularly receive documents from content providers containing direct formatting, consider creating a template for them. Spending a few minutes training content providers in correct style application will save you hours of work when you receive the documents. It will also save them hours of useless direct formatting.
The RoboHelp HTML import feature does not include mapping functionality that lets you map a style in the Word document to an existing or default style in RoboHelp HTML. By changing the Word styles to match the RoboHelp HTML styles, you are manually mapping styles. If you replace existing styles with RoboHelp HTML styles, the majority of your imported document's styles will conform to standard HTML elements and to the default RoboHelp HTML styles, and you will have less to clean up in RoboHelp HTML.
RoboHelp HTML seemingly uses styles Heading 1 - Heading 6. In reality, those "styles" are written as HTML elements or tags <H1> - <H6>. Replace existing heading styles with Heading 1 through Heading 6 to conform to the RoboHelp HTML styles and standard HTML elements.
Consolidate all identical body styles in the Word document into one body style. It doesn't matter what it's called, with one exception. Don't call it "Normal." In RoboHelp HTML, the Normal style converts to the <P> HTML element. Unfortunately, if you use Normal style in the Word document, RoboHelp HTML converts it to a class, and RoboHelp HTML actually displays two Normal styles in the Styles list: one Normal style that is the <P> element, and another that is the class.
There is no way to force a style to convert to <P>.
If you are an HTML purist, you can change all paragraphs in Word formatted in Normal to another style, for example, Body Text. Then you can apply Normal style to all Body Text paragraphs in RoboHelp HTML after converting the document.
If you're not an HTML purist, don't worry about this. If you are an HTML purist, you'll want to be sure to visit the Comparison of HTML Produced by HATS site. If you're a real HTML purist, you're probably not reading this paper, and you certainly aren't using RoboHelp HTML to create HTML-based online information systems.
If any of your heading styles include a border, remove the border from the heading in Word. If a heading includes a border, RoboHelp HTML places the heading in a positioned text box, and the heading displays in green in the WYSIWYG editor, while displaying in black with the border in preview mode. The only way to deal with this after the import is to move the text out of the positioned text box and remove the text box, and then redefine the style to include a border (way too much trouble). If you want a heading style to include a border, define the style after the conversion in RoboHelp HTML. This creates the least amount of work and code bloat.
[Figure: Heading style with border in WYSIWYG editor]
[Figure: Heading style with border code bloat]
RoboHelp HTML automatically creates a style sheet based on the Word styles in the imported document and attaches the style sheet to all the imported topics. You can handle this in one of two ways. If you already have a suitable style sheet, import the style sheet into the new project and attach all imported topics to the existing style sheet, or attach a style sheet in an existing project. If you do not have a style sheet, it may be easier to create one from scratch rather than using the one extracted from the Word styles.
A margin note resides in the left cell of a table in Word, and RoboHelp HTML treats the margin note as part of a table during the import. That might work for a graphic, but it might not work for an annotation. If you do not want margin notes to appear as tables, you can either create separate sections in the Word document, or you can move the content around after importing into RoboHelp HTML.
[Figure: Margin note imports as the left cell in a table]
The easiest way to handle footnotes and endnotes is to create unique styles for each, and move them from their location in the document to their own self-contained section. If you leave them as "print" components, RoboHelp HTML doesn't know what to do with them during the conversion (remember that these are all print concepts), and results are not suitable for online presentation. Be sure that each footnote and endnote has a title so that you can easily identify them in RoboHelp HTML. You'll probably want to create links to them for online presentation. If you do not create topics or sections for each endnote and footnote, RoboHelp HTML adds them to the last topic it creates during conversion.
You can position tables in Word, or you can wait to precisely position them in RoboHelp HTML. It's the same amount of work either way.
Assuming that you want all your tables aligned left in RoboHelp HTML, you can write a macro to search Word for tables and then position them with a left margin of 0 inches.
There is simply nothing you can do to prepare lists in Word. No matter what you do, you will need to recreate all lists in RoboHelp HTML.
You can place graphics into a Word document in one of two ways. You can either insert them, or you can link them. Inserted graphics reside in the Word document. Linked graphics reside outside of Word and are referenced in the Word location as a field code. Linking graphics reduces the size of the Word document, which is the primary argument for linking them in large documents.
RoboHelp HTML converts inserted graphics to JPGs. Graphics scaled in Word come in scaled, and they look almost as good as they do in Word. As a best practice (unless you are a graphics fanatic or prefer PNGs), you'll probably want to scale the graphics in Word, and let RoboHelp HTML offload them into proper HTML graphics.
Now for the bad news: RoboHelp HTML does not handle linked graphics well. When you convert a linked graphic, RoboHelp HTML displays a rhetorical message during the conversion.
[Figure: Linked graphic message]
Instead of the image, RoboHelp HTML inserts a placeholder in the WYSIWYG editor.
[Figure: Image placeholder as it appears in the WYSIWYG editor]
You can only view the graphic when you preview the topic or when you generate a target. To view the graphic in the WYSIWYG editor, remove the placeholder, and reinsert the graphic. To avoid this issue, break the link in the Word document by selecting the linked graphic, and pressing CTRL+SHIFT+F9.
Embedded objects and callouts added to graphics using Word's Draw feature are equally problematic. Graphics with callouts convert to multiple graphics, and positioning is inconsistent, at best. If you've embellished graphics with any of the options available on Word's Draw toolbar, you might consider taking a screen capture of the embellished graphic, saving it as a graphic, and inserting the new, simplified graphic into the Word document. In short, simplify these objects in Word before importing.
RoboHelp HTML converts all Word URL hyperlinks without any further author intervention. RoboHelp HTML does not write any additional font formatting to converted hyperlinks. The link styles in the cascading style sheet apply.
Remove the title page and any front matter that you do not want to include in the online system. Do not remove the TOC section.
Remove all section breaks and manual page breaks from the original document, except the break between the table of contents and the main body section of the document. RoboHelp HTML converts manual page breaks and section breaks into line breaks and adds a visual indicator at the location of each break. You can automate this in Word by searching for each break and replacing it with nothing (just leave the Replace field empty).
[Figure: Imported (line) break in the WYSIWYG editor]
RoboHelp HTML writes some additional code for the section break. This code serves no purpose except to add bloat to the HTML file. We'll call it "section break code bloat" to differentiate it from other code bloat.
[Figure: Section break code bloat]
Update the Word table of contents in order to include all topics in the RoboHelp HTML TOC. To update the table of contents, place your cursor anywhere in the TOC field, and press F9.
If you do not update the table of contents in Word, RoboHelp HTML will not build a TOC or will build a TOC containing only topics that appear in the Word table of contents.
RoboHelp HTML converts single-level and multi-level index entries, and it converts "See Also" index entries into index cross-references. Remove the Word Index section and the section break immediately before the index before importing the source document. Do not remove Word XE fields. They contain the information RoboHelp HTML needs to create keywords and assign them to topics. Assuming that you want to create links from the online index to the topic, move all Word XE fields to the beginning of the topic. RoboHelp HTML creates a bookmark at the location of the XE field and links to the bookmark, not to the topic.
[Figure: Word XE fields]
If you leave the index in the document, RoboHelp HTML creates a redundant "Index" topic as well as converting the XE fields to keywords.
[Figure: Redundant Index topic]
RoboHelp HTML only partially converts Word page range index entries. If the imported Word document contains page range entries spanning several topics, RoboHelp HTML creates the keyword and associates it with the first topic, not with the subsequent topics comprising the page range. Adjust the index in RoboHelp HTML as necessary.
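To make the different kinds of index entry concrete, here is a minimal Python sketch that pulls apart the text of a Word XE field code. It relies on Word's documented XE syntax (a colon separates index levels, the `\t` switch carries cross-reference text, and the `\r` switch names the bookmark for a page range), but everything else is an assumption for illustration: the function name `parse_xe` is hypothetical, the field code is assumed to use straight quotes, and escaped colons are not handled. It says nothing about how RoboHelp HTML itself reads these fields.

```python
import re

# Matches: XE "entry text" followed by optional switches.
XE_PATTERN = re.compile(r'XE\s+"(?P<entry>[^"]*)"(?P<switches>.*)', re.DOTALL)


def parse_xe(field_code):
    """Split an XE field code into index levels, cross-reference text,
    and page-range bookmark (each None when the switch is absent)."""
    text = field_code.strip().strip("{}")  # tolerate surrounding field braces
    match = XE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"not an XE field: {field_code!r}")
    # A colon separates main entry, subentry, and so on.
    levels = [part.strip() for part in match.group("entry").split(":")]
    switches = match.group("switches")
    see = re.search(r'\\t\s+"([^"]*)"', switches)   # \t "See ..." text
    page_range = re.search(r'\\r\s+(\S+)', switches)  # \r BookmarkName
    return {
        "levels": levels,
        "cross_reference": see.group(1) if see else None,
        "range_bookmark": page_range.group(1) if page_range else None,
    }


if __name__ == "__main__":
    print(parse_xe(r'{ XE "Importing:Word documents" }'))
    print(parse_xe(r'{ XE "Graphics" \t "See Images" }'))
    print(parse_xe(r'{ XE "Styles" \r StylesRange }'))
```

An entry that comes back with a `range_bookmark` is one to revisit in the index after conversion, since page-range entries are only partially converted.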
To import one or more Word documents into a new project, select the Import tab in the RoboHelp Starter. Select Word Document (*.doc) in the starter to open the Import Project - Select Project Format dialog box. In that dialog box, select a target. Browse for the file(s) to open, and click Open.
RoboHelp HTML opens the Import Word Document - Conversion Options dialog box. Set options as described in Setting the Conversion Options.
To import one or more Word documents into an existing project, select File | Import | Word document from the RoboHelp HTML menu. In the Import File dialog box, browse for the file(s) to open, and click Open. Set options as described in Setting the Conversion Options.
As a best practice, create a new folder to house all your imported topics. Right-click the new folder, and select Import | Word document from there instead of from the File menu. By segregating all imported topics, you can quickly locate all imported topics for review after the conversion. Move topics in the Project Manager to their desired location after review and cleanup.
When you import a Word document into an existing or a new project, set options in the Import Word Document - Conversion Options dialog box. Select which headings to convert to new topics, and set other conversion options in this dialog box.
[Figure: Conversion Options dialog box]
RoboHelp HTML attempts to guess which headings you want to convert to new topics. Usually it selects only Heading 1 and Heading 2. Custom headings (headings with names other than Heading #) do not appear in the Create new topics based on list. To create topics from custom headings, click Customize to open the Customize Import dialog box.
[Figure: Customize Import dialog box]
Select the styles to convert to new topics, and click OK to return to the Conversion Options dialog box.
Select other conversion options as outlined in the next section, and click OK to begin the conversion.
This adds a significant amount of time to the conversion and requires that you baby-sit the conversion.
If you make a mistake, it's easier to tweak after the fact or to run a new conversion if you follow the rule about importing topics into a new folder. The worst-case scenario is that you might set the wrong conversion options. If you do, simply delete all the topics in the new folder, and start over.
This option label is deceiving. Always select this option to maintain your style hierarchy. If you do not select this option, RoboHelp HTML converts all selected headings to Heading 1. This option has no effect on the actual style definition (font, size, color, etc.), only the actual style applied to the topic heading.
Select to convert Word cross-references to hyperlinks.
Select to create a TOC based on the source document's current TOC field.
Select to convert Word index XE fields to keywords.
The amount of time you spend cleaning up the conversion is directly proportional to the condition of the Word source document. Even if the Word document was squeaky clean, you must visually inspect each imported topic. Isolating converted topics is easy if you import them into a new folder.
Reviewing converted topics is easier if you do it in "passes," and it is best to perform these passes in the order listed below.
| Pass | Action |
| --- | --- |
| TOC pass | Find books linked to placeholder topics, and link the books to other topics or modify the books' properties to not link to a topic. Make any other necessary adjustments to the TOC. Heading levels often do not convert to the proper level in the TOC. This has been logged by eHelp technical support as bug number 4552. For more information, see Correcting the TOC. To the best of our knowledge, this bug has not been fixed. |
| First pass through topics | Delete any topics that serve merely as placeholders (topics with a title and no content) unless you plan to add content. Do not perform this pass until you have completed the TOC pass. |
| Second pass | Attach topics to templates or style sheets and apply styles as necessary. |
| Third pass | Recreate every list in the project. |
| Fourth pass | Clean up and reformat tables. |
| Link pass | Create links as necessary, including links to footnotes, endnotes, and margin notes. |
| Index pass | Recreate links from index ranges in the Word document to all but the first topic in the page range. Add additional keywords as necessary. Converted index entries point to their respective bookmarks in the topics. In the case of long topics, you need to remove the bookmark and link the index entry to the topic instead if you did not relocate the Word XE fields to the beginning of the topic. |
| Glossary pass | Optionally copy and paste glossary terms into the Glossary Designer, and then delete glossary topics. |
| Final pass | Quality assurance pass. |
If you want to remove a placeholder topic that created a book, the very first thing you need to do is redefine the book so that it no longer links to that topic.
[Figure: Overview book links to empty Overview placeholder topic]
In the TOC Designer, right-click any book or topic, and select Properties from the shortcut menu to view its properties.
[Figure: Overview book properties]
When the Book with Link check box is selected, the book links to a topic. If the placeholder topic doesn't contain any information, clear the Book with Link check box, and then you can safely delete the placeholder topic from the project. If you delete all placeholder topics without changing this option, you will delete your entire TOC and have to build it from scratch.
RoboHelp HTML doesn't always convert the Word table of contents correctly. This is a hit-or-miss proposition, and there is no way to guarantee correct results. If you have numerous headings of any level, RoboHelp HTML often interprets them incorrectly when converting.
[Figure: Incorrect heading levels]
In the previous example, "Chapter 2 Footnotes" is a Heading 2 in the source document. Every TOC entry beneath Chapter 2 Footnotes is a Heading 3 in the source document. The trend seems to indicate that the more Heading 3s in the source document, the more indented the books and topics in the TOC. When experimenting, we imported the same document over fifty times.
Sometimes, the above example converted perfectly. Other times, it converted as illustrated above, and still other times, it converted entirely differently. One colleague reported that the books went nearly ten levels deep in some instances. Move books and topics as necessary after conversion. The same colleague reported that in the case of complex Word documents, it is easier to build the TOC from scratch.
If your project contains existing style sheets, attach imported topics to the appropriate one. If this is a new project, it is far better to create a new style sheet from scratch than to use the one that RoboHelp HTML converted during the import. The imported style sheet contains an endless array of default Word styles.
If your Word document contains a table of contents, RoboHelp HTML creates a redundant Contents topic during the conversion.
[Figure: Redundant Contents topic]
When you use a heading as a placeholder, with no intervening text before the next heading, RoboHelp HTML creates a placeholder topic during the conversion.
[Figure: Redundant placeholder topic]
You might be tempted to remove these empty headings in the source document. Don't delete them in Word. RoboHelp HTML uses the resulting topics to create the TOC structure. These placeholder topics appear as books with topics under them, and the book is linked to the empty topic. When you delete a book from the TOC, RoboHelp HTML deletes everything under the book. If you delete the topic in the Project Manager, RoboHelp HTML deletes the book and everything below it in the TOC.
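When a conversion produces many topics, it can help to list the likely placeholder topics before reviewing them by hand. The following is a minimal Python sketch using only the standard library. It treats a topic as a placeholder when its HTML contains no text or images outside the title and heading elements. That test is our assumption about what "a title and no content" looks like in the generated files, and the names `BodyTextCollector` and `is_placeholder_topic` are hypothetical.

```python
from html.parser import HTMLParser

# Text inside these elements does not count as topic content.
SKIPPED_TAGS = {"title", "script", "style", "h1", "h2", "h3", "h4", "h5", "h6"}


class BodyTextCollector(HTMLParser):
    """Records whether any text or image appears outside titles and headings."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "img":
            self.has_content = True  # an image counts as content

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # strip() also removes non-breaking spaces decoded from &nbsp;
        if self.skip_depth == 0 and data.strip():
            self.has_content = True


def is_placeholder_topic(html_text):
    """Return True when the topic has a heading but no other content."""
    collector = BodyTextCollector()
    collector.feed(html_text)
    collector.close()
    return not collector.has_content


if __name__ == "__main__":
    sample = "<html><body><h1>Overview</h1><p>&nbsp;</p></body></html>"
    print(is_placeholder_topic(sample))
```

A topic flagged this way is only a candidate for review. Check the book properties in the TOC before deleting anything.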
Word uses tabs to create the hanging indent for lists. Unfortunately, HTML doesn't use tabs. HTML uses <UL> and <OL> tags to display lists. RoboHelp HTML does not convert Word lists into proper HTML lists. Instead, it converts the tab to a series of non-breaking spaces.
[Figure: Converted list in WYSIWYG editor]
[Figure: Non-breaking spaces in the TrueCode editor]
To convert lists to proper HTML lists, remove each number or bullet and its trailing spaces. Select the entire list, apply Normal style, and then format your list from scratch. If you touch-type, it's sometimes faster to keyboard a whole new list than it is to painfully remove numbers and spaces.
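If you are comfortable editing the HTML outside the WYSIWYG editor, the same cleanup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each converted list item is a piece of text that starts with a bullet character or a number, followed by a run of non-breaking spaces, and the function names `strip_marker` and `to_html_list` are hypothetical. The exact markup RoboHelp HTML writes may differ, so inspect your own topics in the TrueCode editor first.

```python
import re

# A leading bullet character or number ("1." or "1)"), followed by one or
# more non-breaking spaces (as an entity or a literal U+00A0) or whitespace.
MARKER = re.compile(r"^\s*(?:[\u2022\u00b7*-]|\d+[.)])(?:&nbsp;|\u00a0|\s)+")


def strip_marker(item):
    """Remove the typed-in bullet or number and its trailing spaces."""
    return MARKER.sub("", item, count=1).strip()


def to_html_list(items, ordered=False):
    """Wrap cleaned items in <ol> or <ul>. Items are treated as HTML
    fragments and are not escaped."""
    tag = "ol" if ordered else "ul"
    body = "\n".join(f"  <li>{strip_marker(item)}</li>" for item in items)
    return f"<{tag}>\n{body}\n</{tag}>"


if __name__ == "__main__":
    converted = [
        "1.&nbsp;&nbsp;&nbsp;Select the file",
        "2.&nbsp;&nbsp;&nbsp;Click Open",
    ]
    print(to_html_list(converted, ordered=True))
```

Paste the result back into the topic in place of the converted paragraphs, then apply your list styles as usual.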
To position a table, right-click in the table, and select Table Properties from the shortcut menu to open the Table Properties dialog box. On the Table tab, change the value in the Indent from left text box. Make any other necessary adjustments in the Table Properties dialog box, and click OK.
[Figure: Change Indent from left value to adjust positioning]
Carefully review tables, especially those formatted using Word's Table AutoFormat feature. RoboHelp HTML honors most table attributes, but you will usually have to adjust at least one table or cell border in RoboHelp HTML.
RoboHelp HTML creates one bookmark for each Word XE field in the Word document. Since RoboHelp HTML converts XE fields to keywords, do not remove the XE fields from Word.
RoboHelp HTML displays a green flag as a visual indicator for each bookmark.
[Figure: Index bookmarks in the WYSIWYG editor]
[Figure: Code for three bookmarks in the TrueCode editor]
You can easily determine which topics contain XE bookmarks in the Project Manager. The prefix is IX_.
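The same check can be made on the topic files themselves. The following minimal Python sketch lists the bookmarks in a topic whose names start with IX_. It assumes the bookmarks are written as anchor elements with a `name` (or `id`) attribute, which is our guess at the markup rather than something confirmed here, and the names `BookmarkFinder` and `find_index_bookmarks` are hypothetical.

```python
from html.parser import HTMLParser


class BookmarkFinder(HTMLParser):
    """Collects anchor names that carry the IX_ prefix."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for key, value in attrs:
            if key in ("name", "id") and value and value.startswith("IX_"):
                self.bookmarks.append(value)
                break  # count each anchor once


def find_index_bookmarks(html_text):
    """Return the IX_ bookmark names found in a topic, in document order."""
    finder = BookmarkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.bookmarks


if __name__ == "__main__":
    topic = '<p><A NAME="IX_graphics"></A>Graphics convert to JPGs.</p>'
    print(find_index_bookmarks(topic))
```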
[Figure: Bookmarks in Project Manager]
Any topic with an expand icon beside it contains bookmarks. To view the bookmarks, click the icon to expand the topic and display the bookmarks. Double-click a bookmark to open the topic to the location of the bookmark.
You might be tempted to remove the bookmarks. Don't delete them. RoboHelp HTML actually links the keywords to bookmarks in the topic, not to the topic. If you delete the bookmark, you will break the link. If you remove the bookmark, be sure to re-create an association between the keyword and the topic in the Index Designer.
In our opinion, the term "smart" describes RoboHelp HTML's new Word import function as accurately as the phrase "one-click single source" described the single-source function some years ago when that phrase was coined. Perhaps our expectations were a bit high, and perhaps we were biased by the reasonably good results importing into RoboHelp Classic and then single sourcing to an HTML-based output. After working with RoboHelp HTML's Smart Word Import feature, common sense led us to believe that there has to be an easier way. It is our opinion, based on our experimentation, that this legacy tool needs some serious work in this area.
We invite you to report your results with RoboHelp HTML Version 2002's Smart Microsoft Word Import feature. Since your mileage may vary significantly, and we only experimented on two different sets of Word documents, we will report any contradictory results that we can reproduce.
Check back shortly for white papers on importing Word documents with the new-generation tools: Doc-To-Help 6.0, WordHelp, and Author-IT. We'll also update this site to include RoboHelp HTML X3 results after posting the papers for the other tools.
Note: Many thanks to Char James-Tanny who shared her test results, and corroborated many of my results on RoboHelp HTML's Smart Word Import feature.
END OF M J PLASTER'S TOPIC
| Date | Changes to this page |
| --- | --- |
| 09 Aug 2004 | New topic. |