
What's covered? This topic sets out the results of some import tests that I conducted bringing Word documents into RH HTML and then explains the key points so that you can work out the best strategy for you.
Prior to writing this, I had not been able to import a Word document into RoboHelp with clean HTML. The extent of the problems varied. In simple documents it was easy to clean things up, but usually it was more involved and I stuck with my old method, which remains valid.
Then someone sought my help with a RoboHelp problem and this required me to import a Word document. I noticed straight away that the HTML was clean, which proved that a clean import was possible. The question was "how?" I needed to identify what was different about the document I had been sent.
I could see that the imported document had been quickly knocked up and was very simple. A hunch told me to create a brand new Word template with just the default styles of Headings 1 - 3 and Normal. I then created three Word documents in different ways and imported them into RoboHelp with the results described below.
During the import process, RH creates its own style sheet for each Word document you import and applies it to all the topics created. You can of course apply your own style sheet afterwards but it is important that your style sheet has the necessary styles. We'll look at that later. For now just remember each import creates its own style sheet which is applied to all topics created from that import.
The first document was created using a brand new template and I restricted myself to just using the styles that were available, that is Headings 1 - 3 and Normal. I used the toolbar icons to create some bullet and numbered points and I included a table.
In the RH WYSIWYG editor it looked like this.
Not much.
You can see from the highlighting of the bullet point that it has imported as a proper bulleted paragraph rather than having spaces between the bullet and the text. In fact the only problem is that the table does not look quite right as all borders should be the same. However, it looks OK in preview and output so you could choose to leave it as is.
The HTML is completely clean and the document can be edited or used as the source for printed documentation without giving problems.
In reality though the documents you need to import are going to come from more complex templates than that one. For a start, pretty much all Word text books and websites advise that bulleted and numbered paragraphs should be created using Word styles.
So the next thing I did was to create the same content in a document based on my own Normal template. I then added some paragraphs using my bullet and numbering styles and an indented style. This is what it looked like.
Not much when you look at this but the display is masking the truth. The table has the same problem as Import 1, but note the highlighted bullet point: clearly it is not a true bulleted paragraph and has spaces between the bullet and the text. You might well react to that with "so what, it will look OK to the user" and you would be quite right. The problem is in the underlying HTML. It is no longer so clean and that is what will potentially give you grief further down the line.
If you are only going to produce online help, do not need to edit the content after import and have no need to generate printed documentation, then you can ignore the shortcomings and use the file that has been created. Otherwise read on.
Now let's go through one more import that I tried. In Word I created a copy of Import 2 and then changed the template attached from my Normal template to the one created for Import 1. It seemed to me to be a way that you could allow colleagues to create material for you using any template after which you could attach something more basic. Hopefully it would work as with Import 1. This is how that idea came out.
Again not much and it is better than Import 2. The table has the same problem as Import 1. Note however the bulleted points. The one created from the toolbar icon in Word has imported as a true bullet point while the one created from a Word style has not. Previously though, neither of the bullet points imported correctly.
The considerations are the same as with Import 2. You can probably live with this method if you only need online help, don't need to edit the import and don't need to create printed documentation. Otherwise read on.
During my tests with the RoboHelp 6 Import Wizard, I imported the documents prepared for the RoboHelp 6 tests. I was expecting the same results as above but did not get the proper bullet points. In other words, the tests were not consistent. Bullet points and numbering have been an issue in Word ever since Word 2 so I suspect there is some small difference between the documents used for the X5 tests and the RoboHelp 6 tests but I have not been able to pinpoint it. You will have to conduct your own tests with your documents as they are the ones that matter.
Those of you still reading are probably those who need to
In the previous section we established that a clean import is an unlikely scenario so how do we get the best results we can? By the way, don't think these problems are limited to importing into RoboHelp. Other tools will have their own problems. For ease of explanation I will look at the various aspects of the import and describe each of them. I will look at the issues and suggest how you can minimise the problems. It may be that you cannot follow the advice because of something unique to your source document, in which case you will have to import following my findings as closely as you can.
One thing I strongly recommend is do not import Word documents into your carefully crafted main project. Import them into a project set up just for the purposes of the import.
- If it goes well, then import the htm files created into your main project.
- If you have problems then nothing is lost, just trash the temporary project and start again.
You have to regard the import as a trial and error experiment. If you start with that in mind, it helps the attitude later on! Make sure you keep clean backups to take further copies from if you need to try again. Don't use the backup itself.
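If you would rather script the "copy of the backup" step than do it by hand, the idea can be sketched as below. This is only a minimal sketch: the function name and folder layout are hypothetical, and it assumes the whole temporary project sits in one folder.

```python
import shutil
import time
from pathlib import Path


def make_working_copy(backup_dir, work_root):
    """Copy the clean backup into a new timestamped folder and return its path.

    The backup itself is only read, never modified, so each trial
    import starts from the same clean state.
    """
    source = Path(backup_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(work_root) / f"{source.name}-trial-{stamp}"
    # copytree refuses to overwrite an existing folder, which protects
    # an earlier trial from being silently replaced.
    shutil.copytree(source, target)
    return target
```

Each call gives you a fresh folder to experiment in, so a failed import can simply be deleted.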
If you are going to use your own pre-existing style sheet (css file), make sure the names of the styles match in the Word document and the CSS. You can change the names to make them match either in Word or during the import. I cover this aspect in more detail in the RoboHelp 6 topic and suggest you take a look at that as well.
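One way to check that the names match after a trial import is to compare the classes used in a topic with the classes defined in your style sheet. The following is a rough sketch using regular expressions rather than a full HTML or CSS parser, so treat it as an assumption-laden helper, not a definitive check. The function names are hypothetical.

```python
import re

# Matches class="a b", class='a' and the unquoted class=Indent form.
CLASS_ATTR = re.compile(
    r'class\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([\w-]+))', re.IGNORECASE
)
CSS_CLASS = re.compile(r'\.([A-Za-z_][\w-]*)')


def classes_used(html):
    """Return the set of class names applied in the HTML text."""
    found = set()
    for match in CLASS_ATTR.finditer(html):
        value = match.group(1) or match.group(2) or match.group(3) or ""
        found.update(value.split())
    return found


def classes_defined(css):
    """Return the set of class names that appear in CSS selectors."""
    # Drop the declaration blocks first so values such as 1.5em
    # are not mistaken for class selectors.
    selectors = re.sub(r'\{[^}]*\}', ' ', css)
    return set(CSS_CLASS.findall(selectors))


def missing_classes(html, css):
    """Classes used in the topic that the style sheet does not define."""
    return classes_used(html) - classes_defined(css)
```

For example, a topic containing `<p class=Indent>` checked against a style sheet that only defines `p.Indented` would report `Indent` as missing.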
If there are many such paragraphs and you have a hard copy of the document as a reference, you could use Word's Find and Replace tool to change those styles back to the Normal style. There's a similar macro you can download from my Word Macros that does the same job but a bit more easily. After converting the paragraphs back to Normal, you then have to painstakingly work through applying the bullet and number icons. If you don't do this in Word, the change will need to be made in RoboHelp. There will be no macro to help you and the Word document will have been split into numerous topics, making the task more time consuming. I realise that this could be a very time consuming task and you may only be able to justify it if you have a strong case for creating topics that are easier to edit in RoboHelp. Also do a sample test first to make sure the method does work for you. If you have lots of bulleted or numbered lists, then you might want to consider upgrading to RoboHelp 6 as things are much better there.
If you are starting with more styles than you want in the RoboHelp project, it will probably be easier for you to streamline things in Word first. After the import, your Word document will most likely be split into a number of topics. You could change things in RH but it's probably easier to do it in Word. Also for some of you, HTML will be a bit of a mystery and a place you do not want to be making too many changes.
Normal converts to a <p> tag in HTML, Body Text does not. Use Word's Find and Replace tool to convert all Body Text to Normal for better results.
What you should be aiming to have is a Word document that uses styles for each paragraph format that you want, rather than styles with variations being created manually. For example you should use Normal for most paragraphs and Normal Red for paragraphs that are to be formatted with red text, not selecting a paragraph and applying Red to the text. Unfortunately the latter is what most Word users will do but it is not the correct way.
During the import, RoboHelp 6 will create its own style sheet for the document. For some authors that style sheet will be sufficient but if you are planning to change the topics to your own pre-existing style sheet after the import, then you should aim to match the styles. Normal in Word will map to <p> and heading styles will also map correctly. Other styles you need to test and it may be easier to modify a class in your style sheet to match the class name that RoboHelp has applied during the import.
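Renaming a class in your own style sheet to match the imported name can be done by hand, but as an illustration the change could be scripted along these lines. This is a sketch only: the function name is hypothetical, and it works on plain text rather than parsing the CSS, so it would also change a matching `.Name` that happened to appear inside a property value. Run it on a copy.

```python
import re


def rename_css_class(css, old, new):
    """Rename the class selector .old to .new throughout the CSS text."""
    # The lookahead stops "Indented" from also matching "Indented2".
    pattern = re.compile(r'\.' + re.escape(old) + r'(?![\w-])')
    return pattern.sub(lambda match: '.' + new, css)
```

For instance, `rename_css_class(css, "Indented", "Indent")` would make a `p.Indented` rule apply to paragraphs imported as `<p class=Indent>`.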
Why?
Any manual formatting will convert to an inline style in your online
help. You should be aiming to have it convert to an html class as
that is controlled from your style sheet and makes mass changes much
easier.
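To get a feel for how much manual formatting came through, you could count the tags that carry an inline style against those that carry a class. The sketch below uses Python's standard html.parser module. The class and function names are hypothetical and the counts are only a rough indicator.

```python
from html.parser import HTMLParser


class StyleAudit(HTMLParser):
    """Count tags with an inline style attribute and tags with a class."""

    def __init__(self):
        super().__init__()
        self.inline = 0
        self.classed = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names for us.
        names = {name for name, _ in attrs}
        if "style" in names:
            self.inline += 1
        if "class" in names:
            self.classed += 1


def audit(html):
    """Return (tags with inline style, tags with a class) for the HTML text."""
    parser = StyleAudit()
    parser.feed(html)
    parser.close()
    return parser.inline, parser.classed
```

A topic with a high inline count relative to its class count is a sign that the source document still has manual formatting worth cleaning up in Word.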
How?
To find manually formatted paragraphs using Word 2003 make sure you
have this check box ticked in Tools | Options | Edit.
![]()
That's the one that gives you the much longer list of styles that you
see in Word 2003. At other times you might want to clear that one!
I place the cursor in each paragraph in turn and make sure the style
does not have a + symbol after it. For example it shows as
rather than the style with a + symbol after it.
![]()
The + sign indicates you have taken the style and modified it for that
paragraph. "Normal Red" would be OK as that indicates your
document contains a style with that name. The + symbol is what you
are looking for. You may find that you have to repeat the process several
times over in the Word document as Word will ignore what we may regard
as similar formatting because technically there is some difference.
The other way of doing this is to press Ctrl + A and then Ctrl + Spacebar
as that removes all manual formatting. You then get the document back
to the required appearance using styles rather than manual formatting.
Be careful with that one though as it may make more changes than you
really want. Check immediately and use the wonderful Undo feature if
necessary.
It's OK to have individual words manually formatted as the import routine
creates classes for those. That's useful as you can quickly change
the class so that something bold in Word can be blue and bold in RoboHelp.
How you do that varies a little according to your version of Word. In Word XP (2002) you go File | New to display a panel on the right and then select General Templates in the New from Template section. That displays the same dialogue box as is displayed when you go File | New in Word 2000. Change the radio button selection from Document to Template and just save the template with your chosen name. Do not make any changes to it. Your document may not look right but what we are interested in is what it looks like when it hits RoboHelp, not what it looked like in Word.
Open the document in Word and then go Tools | Templates and Add-ins. Click the Attach button and navigate to where you saved the clean template. In my initial test of the method, I did not select Automatically Update Document Styles and the method worked fine. With some subsequent tests I have found that it is necessary to select that option! On balance and until I have further evidence, select the Automatically Update Document Styles option. It will probably make your source document look odd but see my comments in the item above.
There's one vitally important step that I found. This "only" applies to those of you outside the US, although it has to be pointed out that there are more of us outside the US than in it! You need to select all the text in the Word document (Ctrl + A) and set the Language to English (US). Always leave this step until last before importing.
Two dialog boxes will be displayed during the import. In the first I got the best results with:
- "Preserve Heading Styles" not ticked so that all topics are created with Heading Style 1
- "Auto-create styles from inline formatting" not ticked (the defaults).
You may find otherwise; it's a bit of trial and error. For the other options you will need to make your own choices.
In the second dialog, make sure you tick the heading levels that you want RoboHelp to use when creating new topics.
At this point it is not possible to cover everything you might find. All I can do is point out the main things you are looking for.
| Item | Comment |
| --- | --- |
| General appearance | From a quick visual check in the WYSIWYG editor, your topics should look pretty much like they did in Word, tables excepted. |
| Language | In Step 8 above I recommended setting your language to English US. |
| Headings | These import with no unwanted changes to your HTML. |
| Paragraphs | Paragraphs should show in the style dropdown as either Normal or with a class that represents the style applied in Word. You should find any paragraphs with the Normal style in Word come through as standard <p> tags in HTML. If you find instances of the <p> tag being written as <p class=InLineNormal>, either try attaching a basic template (as described above) or clearing the "Auto-create styles from inline formatting" check box that you will see during the import process. Other paragraphs will have a class applied and will look something like <p class=TheNameApplied> such as Indent, Bold, whatever name you used in Word for the style or changed it to during the import. |
| Bullet Points | They should have the gap shown in the Import 1 screenshot. If they do not, then look in the true code and you will probably find lots of spaces have been created, represented by &nbsp;, in which case you need to look at how many went wrong and decide whether to correct things in RoboHelp or start again. You could limit your corrections to just those areas where you will need to edit the bulleted and numbered paragraphs. For the others, as long as it looks OK you can take a view on whether it needs a clean up. Your HTML for a bullet point should look something like this. This is what it will look like if it did not import correctly. The problem in RH with bullet points that do not import properly is when you try to add another bullet point. Try it and you'll see. You could cheat and copy and paste a paragraph and then change the text but that's only OK if you are tweaking the text. Any serious volume of changes and you will soon get fed up with that idea. Changing the bullet points in Word is probably quicker than cleaning up after the import. |
| Numbered Paragraphs | The same issues apply here as for bullet points. Note I have not tested with complex numbering methods. |
| Custom Styles | During the import RH will create classes and include this in the HTML. For example you may have a style in Word called Indent. Normally this will end up in the HTML as <p class=Indent> and in the style sheet RH creates, it will define this style so that it has the same appearance as in Word. It may be that after importing you are going to apply your own style sheet where the style is perhaps called Indented. Note that during the import RH tells you the style names it is going to apply and you can change them. In this case you would change it to Indented so that RH creates the HTML as <p class=Indented> and that will then work with your style sheet. |
| Tables | As shown above, tables look wrong in the WYSIWYG editor but they do display correctly in the output, at least they did in my tests. You can add rows and the result will be the same. Whether or not you can live with it is another matter. If necessary roll up your sleeves and set about changing the borders. See Useful Tools if you have lots of tables to edit. |
| Images | You may have noticed that so far I have not covered the subject of images. Generally speaking they should import OK but it's an area with many variables that I will leave for now! |
| Bookmarks | If after the import you see text with a grey background, the problem is the way the bookmarks were defined in Word. Instead of placing the cursor in front of some text and creating the bookmark, this problem occurs where the text has been selected and then the bookmark has been created. You will need to run the import again doing one of two things. Either clean up the bookmarks individually (try an interim import after you have changed some, just to make sure it is working) or you need to strip out the bookmarks and recreate them later. My macros include one for this purpose but do run it on a copy of the document. You can of course live with the grey background, it does no harm and simply means the topic does not look quite right in the RoboHelp editor. |
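To judge how many bullet points went wrong before deciding whether to fix them in RoboHelp or start again, you could scan the imported topics for paragraphs padded with runs of non-breaking spaces. The sketch below assumes that a badly imported bullet is an ordinary <p> containing three or more consecutive non-breaking spaces. That threshold and the function name are my assumptions, so check them against your own imported HTML first.

```python
import re

PARAGRAPH = re.compile(r'<p\b[^>]*>(.*?)</p>', re.IGNORECASE | re.DOTALL)
# A run of three or more non-breaking spaces, written either as the
# &nbsp; entity or as the literal character U+00A0.
NBSP_RUN = re.compile(r'(?:&nbsp;|\u00a0){3,}')


def suspect_paragraphs(html):
    """Return the <p> elements that contain a long run of non-breaking spaces."""
    return [
        match.group(0)
        for match in PARAGRAPH.finditer(html)
        if NBSP_RUN.search(match.group(1))
    ]
```

Running this over each topic gives a count of paragraphs to inspect. Anything it reports still needs a human look, as the same padding can occur in text that was never a bullet point.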
Try creating a printed output immediately after the import. Sometimes the HTML created can cause issues not related to styles. If you have problems see either Printed Output or Print Issues.
There are two main tools that you may find useful in this process. Macro Express and FAR. See Useful Tools and Links to find their websites.
At its simplest, you perform an action once, recording it as a macro. Then you repeat it as many times as you want. It's much the same as recording a macro in Word except this works with any program you use. Last time I looked it was free for one month so you can easily find out if it is for you. There are other programs around and some of them are free but in my opinion, this one is far and away the best. It can do all sorts of other useful things, like remember all your standard paragraph wordings.
FAR stands for Find and Replace which is in fact just one of the things it does. The beauty of this Find and Replace tool is that it finds strings across multiple lines (most do not and miss the string if it is not all on one line) and it works across multiple files. With careful use of this tool, you can quickly change all sorts of things that are not quite right. Bear in mind though that you can just as quickly wreck the whole project if you mess up, so take a copy of the project before you start. Again this tool is free to start with and it is my preferred tool.
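To illustrate what a multi-line, multi-file find and replace involves, here is a minimal Python sketch of the same idea. It is not how FAR itself works: the function name, the `*.htm` pattern and the default encoding are all assumptions that you would need to adjust for your project. As with any such tool, run it only on a copy of the project.

```python
import re
from pathlib import Path


def replace_in_files(folder, pattern, replacement, glob="*.htm", encoding="utf-8"):
    """Apply a regular expression replacement to every matching file.

    Returns a list of (path, number of replacements) for the files changed.
    """
    # DOTALL lets "." cross line breaks, so a string split over
    # several lines can still be found.
    regex = re.compile(pattern, re.DOTALL)
    changed = []
    for path in sorted(Path(folder).rglob(glob)):
        # newline="" keeps the original line endings untouched.
        with open(path, encoding=encoding, newline="") as handle:
            text = handle.read()
        new_text, count = regex.subn(replacement, text)
        if count:
            with open(path, "w", encoding=encoding, newline="") as handle:
                handle.write(new_text)
            changed.append((path, count))
    return changed
```

Using `\s+` rather than a single space in the pattern is what lets a tag that wraps onto a second line still match.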
| Date | Changes to this page |
| --- | --- |
| 13 Apr 2007 | Major review of topic |
| 11 Jul 2005 | Step 5 of "What's the best way of tackling the import?" amended to state DO select the automatically update option. |
| 10 Jul 2005 | New topic |