Importing Using the X5 Wizard

What's covered?

This topic sets out the results of some import tests that I conducted bringing Word documents into RH HTML and then explains the key points so that you can work out the best strategy for you.

Introduction

Prior to writing this, I had not been able to import a Word document into RoboHelp with clean HTML. The extent of the problems varied. In simple documents it was easy to clean things up, but usually it was more involved and I stuck with my old method, which remains valid.

Then someone sought my help with a RoboHelp problem, which required me to import a Word document. I noticed straight away that the HTML was clean, which proved that a clean import was possible. The question was "how?" I needed to identify what was different about the document I had been sent.

I could see that the imported document had been quickly knocked up and was very simple. A hunch told me to create a brand new Word template with just the default styles of Headings 1 - 3 and Normal. I then created three Word documents in different ways and imported them into RoboHelp with the results described below.

The Tests

During the import process, RH creates its own style sheet for each Word document you import and applies it to all the topics created. You can of course apply your own style sheet afterwards but it is important that your style sheet has the necessary styles. We'll look at that later. For now just remember each import creates its own style sheet which is applied to all topics created from that import.

Import 1

The first document was created using a brand new template and I restricted myself to just using the styles that were available, that is Headings 1 - 3 and Normal. I used the toolbar icons to create some bullet and numbered points and I included a table.

In the RH WYSIWYG editor it looked like this.

What's wrong?

Not much.

You can see from the highlighting of the bullet point that it has imported as a proper bulleted paragraph rather than having spaces between the bullet and the text. In fact the only problem is that the table does not look quite right as all borders should be the same. However, it looks OK in preview and output so you could choose to leave it as is.

The HTML is completely clean and the document can be edited or used as the source for printed documentation without giving problems.

Import 2

In reality though the documents you need to import are going to come from more complex templates than that one. For a start, pretty much all Word text books and websites advise that bulleted and numbered paragraphs should be created using Word styles.

So the next thing I did was to create the same content in a document based on my own Normal template. I then added some paragraphs using my bullet and numbering styles and an indented style. This is what it looked like.

What's wrong?

Not much when you look at this, but the display is masking the truth. The table has the same problem as Import 1, but note the highlighted bullet point. Clearly it is not a true bulleted paragraph, as it has spaces between the bullet and the text. You might well react to that with "so what, it will look OK to the user" and you would be quite right. The problem is in the underlying HTML. It is no longer so clean, and that is what could give you grief further down the line.

If you are only going to produce online help, do not need to edit the content after import and have no need to generate printed documentation, then you can ignore the shortcomings and use the file that has been created. Otherwise read on.

Import 3

Now let's go through one more import that I tried. In Word I created a copy of Import 2 and then changed the template attached from my Normal template to the one created for Import 1. It seemed to me to be a way that you could allow colleagues to create material for you using any template after which you could attach something more basic. Hopefully it would work as with Import 1. This is how that idea came out.

What's wrong?

Again not much and it is better than Import 2. The table has the same problem as Import 1. Note however the bulleted points. The one created from the toolbar icon in Word has imported as a true bullet point while the one created from a Word style has not. Previously though, neither of the bullet points imported correctly.

The considerations are the same as with Import 2. You can probably live with this method if you only need online help, don't need to edit the import and don't need to create printed documentation. Otherwise read on.

During my tests with the RoboHelp 6 Import Wizard, I imported the documents prepared for the RoboHelp 6 tests. I was expecting the same results as above but did not get the proper bullet points. In other words, the tests were not consistent. Bullet points and numbering have been an issue in Word ever since Word 2 so I suspect there is some small difference between the documents used for the X5 tests and the RoboHelp 6 tests but I have not been able to pinpoint it. You will have to conduct your own tests with your documents as they are the ones that matter.

Getting the Best Import

Those of you still reading are probably those who need to edit the content after import or generate printed documentation.

In the previous section we established that a clean import is an unlikely scenario so how do we get the best results we can? By the way, don't think these problems are limited to importing into RoboHelp. Other tools will have their own problems. For ease of explanation I will look at the various aspects of the import and describe each of them. I will look at the issues and suggest how you can minimise the problems. It may be that you cannot follow the advice because of something unique to your source document, in which case you will have to import following my findings as closely as you can.

One thing I strongly recommend is do not import Word documents into your carefully crafted main project. Import them into a project set up just for the purposes of the import.

  • If it goes well, then import the htm files created into your main project.
  • If you have problems, nothing is lost. Just trash the temporary project and start again.

Preparation

You have to regard the import as a trial and error experiment. If you start with that in mind, it helps the attitude later on! Make sure you keep clean backups to take further copies from if you need to try again. Don't use the backup itself.

  1. Look at what styles are in use in your Word document.

If you are going to use your own pre-existing style sheet (css file), make sure the names of the styles match in the Word document and the CSS. You can change the names to make them match either in Word or during the import. I cover this aspect in more detail in the RoboHelp 6 topic and suggest you take a look at that as well.

  2. Change all paragraphs that use either bullet point or numbered styles so that the bullets or numbers are applied using Word's toolbar icons.

If there are many such paragraphs and you have a hard copy of the document as a reference, you could use Word's Find and Replace tool to change those styles back to the Normal style. There's a macro you can download from my Word Macros that does the same job a bit more easily. After converting the paragraphs back to Normal, you then have to painstakingly work through applying the bullet and number icons. If you don't do this in Word, the change will need to be made in RoboHelp. There will be no macro to help you and the Word document will have been split into numerous topics, making the task more time consuming. I realise that this could be a very time consuming task and you may only be able to justify it if you have a strong case for creating topics that are easier to edit in RoboHelp. Also do a sample test first to make sure the method does work for you. If you have lots of bulleted or numbered lists, then you might want to consider upgrading to RoboHelp 6 as things are much better there.

  3. Consider reducing the number of styles used before you start the import.

If you are starting with more styles than you want in the RoboHelp project, it will probably be easier for you to streamline things in Word first. After the import, your Word document will most likely be split into a number of topics. You could change things in RH but it's probably easier to do it in Word. Also, for some of you HTML will be a bit of a mystery and a place where you do not want to be making too many changes.

  4. Convert any Body Text to Normal.

Normal converts to a <p> tag in HTML; Body Text does not. Use Word's Find and Replace tool to convert all Body Text to Normal for better results.

  5. Look at manual formatting that has been applied at paragraph level as opposed to character level.

What you should be aiming to have is a Word document that uses styles for each paragraph format that you want, rather than styles with variations being created manually. For example you should use Normal for most paragraphs and Normal Red for paragraphs that are to be formatted with red text, not selecting a paragraph and applying Red to the text. Unfortunately the latter is what most Word users will do but it is not the correct way.

During the import, RoboHelp will create its own style sheet for the document. For some authors that style sheet will be sufficient but if you are planning to change the topics to your own pre-existing style sheet after the import, then you should aim to match the styles. Normal in Word will map to <p> and heading styles will also map correctly. Other styles you need to test, and it may be easier to modify a class in your style sheet to match the class name that RoboHelp has applied during the import.

Why?
Any manual formatting will convert to an inline style in your online help. You should be aiming to have it convert to an html class as that is controlled from your style sheet and makes mass changes much easier.
How?
To find manually formatted paragraphs using Word 2003 make sure you have this check box ticked in Tools | Options | Edit.

That's the one that gives you the much longer list of styles that you see in Word 2003. At other times you might want to clear that one! I place the cursor in each paragraph in turn and make sure the style does not have a + symbol after it. For example, it should show as just the style name rather than the style name with a + symbol after it.

The + sign indicates you have taken the style and modified it for that paragraph. "Normal Red" would be OK as that indicates your document contains a style with that name. The + symbol is what you are looking for. You may find that you have to repeat the process several times over in the Word document as Word will ignore what we may regard as similar formatting because technically there is some difference.
The other way of doing this is to press Ctrl + A and then Ctrl + Spacebar as that removes all manual formatting. You then get the document back to the required appearance using styles rather than manual formatting. Be careful with that one though as it may make more changes than you really want. Check immediately and use the wonderful Undo feature if necessary.
It's OK to have individual words manually formatted as the import routine creates classes for those. That's useful as you can quickly change the class so that something bold in Word can be blue and bold in RoboHelp.

  6. Create a template that you will use only for this purpose. You might want to skip Steps 4 and 5 and see what results you get. Provided you are importing into a new project, you can always carry out these steps later.

How you do that varies a little according to your version of Word. In Word XP (2002) you go File | New to display a panel on the right and then select General Templates in the New from Template section. That displays the same dialogue box as is displayed when you go File | New in Word 2000. Change the radio button selection from Document to Template and just save the template with your chosen name. Do not make any changes to it. Your document may not look right but what we are interested in is what it looks like when it hits RoboHelp, not what it looked like in Word.

  7. Apply the template to the document to be imported.

Open the document in Word and then go Tools | Templates and Add-ins. Click the Attach button and navigate to where you saved the clean template. In my initial test of the method, I did not select Automatically Update Document Styles and the method worked fine. With some subsequent tests I have found that it is necessary to select that option! On balance and until I have further evidence, select the Automatically Update Document Styles option. It will probably make your source document look odd but see my comments in the item above.

  8. One of the most important things I found was changing the language setting in Word.

There's one vitally important step that I found. This "only" applies to those of you outside the US, although it has to be pointed out that there are more of us outside the US than in it! You need to select all the text in the Word document (Ctrl + A) and set the Language to English (US). Always leave this step until last before importing.

  9. Open the new project and start the import.

Two dialog boxes will be displayed during the import. In the first I got the best results with:

  • "Preserve Heading Styles" not ticked so that all topics are created with Heading Style 1
  • "Auto-create styles from inline formatting" not ticked (the defaults).

You may find otherwise, it's a bit of trial and error. For the other options you will need to make your own choices.

In the second dialog, make sure you tick the heading levels that you want RoboHelp to use when creating new topics.

Reviewing the Import

At this point it is not possible to cover everything you might find. All I can do is point out the main things you are looking for.

General appearance

From a quick visual check in the WYSIWYG editor, your topics should look pretty much like they did in Word, tables excepted.

Language

In Step 8 above I recommended setting your language to English US.

  • If you do that, during the import the body tag will simply be amended to read <body lang=EN-US> which you can amend to show as <body> or whatever language you want.
  • If you do not do that you will get the same body tag <body lang=EN-US> but every paragraph will include <span lang=EN-GB> or whatever language you have set. You do not want that. If you need to set the language, do so by setting the correct language in the body tag.
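
If the language step was missed and running the import again is not practical, the unwanted spans can in principle be stripped afterwards. The following is only a minimal Python sketch, not something RoboHelp provides. The function names strip_lang_spans and set_body_lang are made up for illustration, and it assumes the spans carry nothing but a lang attribute, as in the examples above. Run it on a copy of your topics.

```python
import re

# A <span> whose only attribute is lang and whose content holds no nested <span>.
LANG_SPAN = re.compile(
    r"<span\s+lang=[\"']?[\w-]+[\"']?\s*>((?:(?!<span\b).)*?)</span>",
    re.IGNORECASE | re.DOTALL,
)
# The lang attribute inside the opening <body> tag.
BODY_LANG = re.compile(r"(<body\b[^>]*?)\s+lang=[\"']?[\w-]+[\"']?", re.IGNORECASE)


def strip_lang_spans(html):
    """Unwraps lang-only spans, repeating until nothing more changes."""
    previous = None
    while previous != html:
        previous = html
        html = LANG_SPAN.sub(r"\1", html)
    return html


def set_body_lang(html, lang=None):
    """Removes the lang attribute from <body>, then sets a new one if given."""
    html = BODY_LANG.sub(r"\1", html, count=1)
    if lang:
        html = re.sub(
            r"<body\b", f'<body lang="{lang}"', html, count=1, flags=re.IGNORECASE
        )
    return html


sample = "<body lang=EN-US><p><span lang=EN-GB>Colour</span> text</p></body>"
cleaned = set_body_lang(strip_lang_spans(sample))
print(cleaned)
```

Spans that carry other attributes as well as lang are deliberately left alone, since removing them could also remove formatting.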

Headings

These import with no unwanted changes to your HTML.

Paragraphs

Paragraphs should show in the style dropdown as either Normal or with a class that represents the style applied in Word.

You should find any paragraphs with the Normal style in Word come through as standard <p> tags in HTML. If you find instances of the <p> tag being written as <p class=InLineNormal>, either try attaching a basic template (as described above) or clearing the "Auto-create styles from inline formatting" check box that you will see during the import process.

Other paragraphs will have a class applied and will look something like <p class=TheNameApplied> such as Indent, Bold, whatever name you used in Word for the style or changed it to during the import.
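
Checking every topic by eye gets tedious after a large import. As a rough sketch, the paragraph check could be automated along the following lines. This is plain Python using the standard html.parser module. The names ParagraphAudit and audit_paragraphs are hypothetical, and the sample HTML is invented to mirror the cases described above.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphAudit(HTMLParser):
    """Tallies the class on each <p> tag and notes any inline style attributes."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.inline_styles = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        attributes = dict(attrs)
        # Paragraphs without a class are plain <p> tags (Word's Normal style).
        self.classes[attributes.get("class") or "(plain)"] += 1
        # A style attribute on a paragraph points to manual formatting in Word.
        if attributes.get("style"):
            self.inline_styles.append(attributes["style"])


def audit_paragraphs(html):
    """Parses one topic's HTML and returns the filled-in audit object."""
    audit = ParagraphAudit()
    audit.feed(html)
    audit.close()
    return audit


sample = (
    "<p>Plain paragraph</p>"
    "<p class=Indent>Indented paragraph</p>"
    "<p class=InLineNormal>Unwanted class</p>"
    '<p style="color: #ff0000;">Manually formatted paragraph</p>'
)
result = audit_paragraphs(sample)
print(dict(result.classes))
print(result.inline_styles)
```

Any InLineNormal entries or inline styles in the result are the paragraphs worth a closer look.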

Bullet Points

  • Bullet points created with Word's bullet icon import correctly using the methods described in Import 1 and Import 3.
  • Bullet points created using Word styles (which annoyingly is the method all Word gurus recommend) do not import properly.

They should have the gap shown in the Import 1 screenshot. If they do not, then look in the true code and you will probably find lots of spaces have been created, represented by &nbsp; in which case you need to look at how many went wrong and decide whether to correct things in RoboHelp or start again.

You could limit your corrections to just those areas where you will need to edit the bulleted and numbered paragraphs. For the others, as long as it looks OK you can take a view on whether it needs a clean up.

Your HTML for a bullet point should look something like this.

<ul style="list-style: disc; margin-top: 0cm;"
type=disc>
<li class=kadov-p><p>A bullet point created using the toolbar icon</p></li>
</ul>

This is what it will look like if it did not import correctly.

<p class=BulletRound>&#9679;<span style="font: 7.0pt 'Times New Roman';">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span>A bullet point created using a Word style.</p>

The problem in RH with bullet points that do not import properly comes when you try to add another bullet point. Try it and you'll see. You could cheat and copy and paste a paragraph and then change the text, but that's only OK if you are tweaking the text. With any serious volume of changes you will soon get fed up with that idea.

Changing the bullet points in Word is probably quicker than cleaning up after the import.
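
To help decide how many bullet points went wrong, a count per topic is useful. The sketch below assumes a failed bullet looks like the example above: a paragraph that starts with a bullet character or numeric entity followed by a span holding only non-breaking spaces. The names FAKE_BULLET and count_fake_bullets are my own, and the pattern does not attempt to catch numbered paragraphs.

```python
import re

# Matches a <p> that starts with a bullet character or numeric entity,
# followed by a <span> containing nothing but non-breaking spaces.
FAKE_BULLET = re.compile(
    r"<p\b[^>]*>\s*(?:&#\d+;|[\u2022\u25cf\u00b7])\s*"
    r"<span\b[^>]*>(?:&nbsp;|\s)+</span>",
    re.IGNORECASE,
)


def count_fake_bullets(html):
    """Returns how many paragraphs look like bullets built from spaces."""
    return len(FAKE_BULLET.findall(html))


good = """<ul style="list-style: disc; margin-top: 0cm;"
type=disc>
<li class=kadov-p><p>A bullet point created using the toolbar icon</p></li>
</ul>"""

bad = """<p class=BulletRound>&#9679;<span style="font: 7.0pt 'Times New Roman';">&nbsp;&nbsp;&nbsp;
</span>A bullet point created using a Word style.</p>"""

print(count_fake_bullets(good), count_fake_bullets(bad))
```

A topic that reports zero can be left alone. One that reports dozens is probably a case for going back to Word.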

Numbered Paragraphs

The same issues apply here as for bullet points. Note I have not tested with complex numbering methods.

Custom Styles

During the import RH will create classes and include this in the HTML. For example you may have a style in Word called Indent. Normally this will end up in the HTML as <p class=Indent> and in the style sheet RH creates, it will define this style so that it has the same appearance as in Word.

It may be that after importing you are going to apply your own style sheet where the style is perhaps called Indented. Note that during the import RH tells you the style names it is going to apply and you can change them. In this case you would change it to Indented so that RH creates the HTML as <p class=Indented> and that will then work with your style sheet.
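
If a name was not changed during the import, the mismatch can still be found and fixed afterwards. The following Python sketch is an illustration under stated assumptions, not a RoboHelp feature: the helper names (classes_in_css, classes_in_html, rename_class) are invented, the CSS handling is deliberately simple, and each tag is assumed to carry a single class name.

```python
import re

# The start of a class attribute, with or without quotes around the value.
CLASS_ATTR_PREFIX = r"""(\bclass\s*=\s*["']?)"""


def classes_in_css(css):
    """Returns class names found in selectors, ignoring declaration blocks."""
    selectors_only = re.sub(r"\{[^}]*\}", " ", css)
    return set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors_only))


def classes_in_html(html):
    """Returns the class names used in class attributes (one class per tag)."""
    return set(re.findall(CLASS_ATTR_PREFIX + r"([\w-]+)", html))


def rename_class(html, old, new):
    """Renames one class wherever it appears as a whole class name."""
    pattern = re.compile(CLASS_ATTR_PREFIX + re.escape(old) + r"(?![\w-])")
    return pattern.sub(lambda match: match.group(1) + new, html)


css = "p.Indented { margin-left: 1cm; }\nh1 { color: #000080; }"
html = "<p class=Indent>Text</p><p class=Indented>More</p>"

used = {name for _, name in classes_in_html(html)}
missing = used - classes_in_css(css)
print(sorted(missing))
print(rename_class(html, "Indent", "Indented"))
```

The first line printed lists classes used in the topic that your style sheet does not define. The second shows the topic after renaming Indent to Indented.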

Tables

As shown above, tables look wrong in the WYSIWYG editor but they do display correctly in the output, at least they did in my tests. You can add rows and the result will be the same. Whether or not you can live with it is another matter.

If necessary roll up your sleeves and set about changing the borders. See Useful Tools if you have lots of tables to edit.

Images

You may have noticed that so far I have not covered the subject of images. Generally speaking they should import OK but it's an area with many variables that I will leave for now!

Bookmarks

If after the import you see text with a grey background, the problem is the way the bookmarks were defined in Word. Instead of placing the cursor in front of some text and creating the bookmark, this problem occurs where the text has been selected and then the bookmark has been created.

You will need to run the import again doing one of two things. Either clean up the bookmarks individually (try an interim import after you have changed some, just to make sure it is working) or you need to strip out the bookmarks and recreate them later.

My macros include one for this purpose but do run it on a copy of the document.

You can of course live with the grey background, it does no harm and simply means the topic does not look quite right in the RoboHelp editor.

 

Is the import OK?

Printing

Try creating a printed output immediately after the import. Sometimes the HTML created can cause issues not related to styles. If you have problems see either Printed Output or Print Issues.

Useful Tools

There are two main tools that you may find useful in this process: Macro Express and FAR. See Useful Tools and Links to find their websites.

Macro Express

At its simplest, you perform an action once recording it as a macro. Then you repeat it as many times as you want. It's much the same as recording a macro in Word except this works with any program you use. Last time I looked it was free for one month so you can easily find out if it is for you. There are other programs around and some of them free but in my opinion, this one is far and away the best. It can do all sorts of other useful things, like remember all your standard paragraph wordings.

FAR

FAR stands for Find and Replace which is in fact just one of the things it does. The beauty of this Find and Replace tool is that it finds strings across multiple lines (most do not and miss the string if it is not all on one line) and it works across multiple files. With careful use of this tool, you can quickly change all sorts of things that are not quite right. Bear in mind though that you can just as quickly wreck the whole project if you mess up, so take a copy of the project before you start. Again this tool is free to start with and it is my preferred tool.
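
For anyone curious what a multi-line, multi-file find and replace involves, here is a minimal Python sketch of the idea. It is not how FAR itself works. The function name replace_in_files, the .bak backup naming and the default encoding are all assumptions for illustration, and you may need a different encoding for your own topics.

```python
import re
import shutil
from pathlib import Path


def replace_in_files(root, pattern, replacement, glob="*.htm", encoding="utf-8"):
    """Applies a regex replacement to every matching file under root.

    The pattern may span several lines. Each changed file is first copied
    to a .bak file alongside it. Returns a list of (path, count) pairs.
    """
    regex = re.compile(pattern, re.DOTALL)
    changed = []
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding=encoding)
        new_text, count = regex.subn(replacement, text)
        if count:
            # Keep a copy of the original before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(new_text, encoding=encoding)
            changed.append((path, count))
    return changed


# Example call (the folder name is hypothetical):
# replace_in_files("ImportProject", r"<p class=InLineNormal>", "<p>")
```

Even with the backups this makes, work on a copy of the project as advised above.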

 

Date

Changes to this page

13 Apr 2007 Major review of topic

11 Jul 2005 Step 5 of "What's the best way of tackling the import?" amended to state DO select the automatically update option.

10 Jul 2005 New topic