Index Sorting: Fixing Term Order in Japanese
Introduction
A bug in MadCap Flare causes Japanese index terms to be sorted incorrectly in printed outputs.
A workaround is to create a supporting file in Flare known as an Index Link file, or to edit an existing one. This guideline describes a convenient process for creating an Index Link file for Japanese terms, although the concepts apply to any language whose resulting index needs to be tweaked.
Jump start
Readers already familiar with Index Link files in Flare who want to get started straightaway should read the Disclaimer section and then skip to the Japanese index terms workaround section.
Overview
The general idea behind this workaround is to add entries to the Index Link file for the missorted terms. Each entry uses the last correctly sorted term as a seed, or anchor, followed by one or more incremental characters that alter the term’s resulting position in the index.
In a regular alphabetized list, the string ZZZA would be placed after ZZZ, ZZZB would follow ZZZA, and so on.
That is, suffixing a word or term with an incremental character makes it possible to control its position in an alphabetized list. This behavior can be exploited to sort index entries correctly by appending known control characters (A, B, C…) to the seed or anchor term.
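The effect can be checked with a minimal Python sketch. It relies on Python's default string comparison (by code point), which is only a stand-in for whatever collation Flare applies; that equivalence is an assumption.

```python
# Strings are compared character by character, so a string that is a prefix
# of another comes first, and the suffix letter decides the rest.
entries = ["ZZZB", "ZZZ", "ZZZA"]

ordered = sorted(entries)
print(ordered)  # ['ZZZ', 'ZZZA', 'ZZZB']
```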
However, modifying the actual terms is unacceptable. Flare’s solution is a behind-the-scenes, ancillary list of alternative terms, which can be modified at our discretion to fit our needs.
These alternative entries are known as “Sort as” entries. Flare uses them in lieu of the actual terms for sorting purposes only. They are not rendered in the resulting output, which lets us be creative with these alternate strings.
Initial considerations
As in any software development process, “better safe than sorry” is the recommended principle to go by.
The following guidelines have been tested and work in most cases, but it is essential to safeguard the production Flare project by making backups or, if a source control system is in use, by following its recommended practices.
The techniques described in this document involve editing files outside Flare’s safe environment. This approach is potentially hazardous, so it is highly recommended to implement tweaks in a controlled environment first, as described in the section Testing the solution below.
Example
Let’s say we need to sort the words (or terms) One, Two, and Three, as if they were numbers: 1, 2, 3. In a normal alphabetization, the word Three would be placed before Two, resulting in One, Three, Two, which is not the desired result.
To tackle this situation in Flare, we use an alternate “Sort as” string associated with the term we need to reposition in the index. Since we want Two to be placed after One, we tell Flare “The term Two has to be sorted as OneA”, where One is the seed term and A the incremental control character.
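The idea can be sketched in Python as a lookup that substitutes the alternate string during sorting only. This is an illustration, not Flare's implementation: the `SORT_AS` dictionary and `sort_key` function are hypothetical names, and the entry for Three (OneB) is an assumed continuation of the pattern, since the text only states Two as OneA.

```python
# Hypothetical "Sort as" strings. Two -> OneA comes from the text;
# Three -> OneB is an assumption that follows the same pattern.
SORT_AS = {"Two": "OneA", "Three": "OneB"}

def sort_key(term: str) -> str:
    """Return the alternate string if one exists, otherwise the term itself."""
    return SORT_AS.get(term, term)

terms = ["Three", "One", "Two"]
default_order = sorted(terms)              # plain alphabetization
fixed_order = sorted(terms, key=sort_key)  # sorted by the alternate strings

print(default_order)  # ['One', 'Three', 'Two']
print(fixed_order)    # ['One', 'Two', 'Three']
```

The displayed terms are never changed; only the key used for comparison is.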
All the terms and strings involved are summarized in the table below:
Such information is passed on to Flare by means of an Index Link file. For the above example, the underlying entries in the Index Link file would look like this:
An Index Link file has a .FLIXL extension and can be created within Flare (see reference 1). The section Implementing a large number of terms below describes an optional process to bulk add a large number of entries to the Index Link file. The FLIXL file is generally stored in the \Project\Advanced folder of the Flare project directory.
Japanese index terms workaround
As in the above example, modifying Flare’s default sorting for Japanese index entries requires creating a FLIXL Index Link file or editing an existing one.
For example, the table below shows a list of missorted terms in the first column, the correctly sorted terms in the second and, finally, what the “Sort as” field should look like to fix the sorting issue:
The terms in the top two rows are correctly sorted, so they do not need to be addressed in the Index Link file. The term in the second row is also the last correctly sorted entry and is therefore used as the seed or anchor term. The third column of the third row shows the “Sort as” entry, made up of the seed entry plus an incremental control character.
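As a rough sketch of the seed-plus-control-character rule, the “Sort as” strings could be generated as follows. The function name `build_sort_as` and the sample Japanese strings are made up for illustration and are not taken from the guide's table; the limit of 26 entries per seed is a simplification of this sketch.

```python
import string

def build_sort_as(seed: str, missorted_in_desired_order: list[str]) -> dict[str, str]:
    """Map each missorted term to seed + A, B, C... following the desired order."""
    letters = string.ascii_uppercase
    if len(missorted_in_desired_order) > len(letters):
        raise ValueError("more terms than single control characters available")
    return {
        term: seed + letters[i]
        for i, term in enumerate(missorted_in_desired_order)
    }

# Made-up sample data: the seed is the last correctly sorted term, and the
# list holds the missorted terms in the order they should appear.
mapping = build_sort_as("アプリ", ["印刷", "設定"])
print(mapping)  # {'印刷': 'アプリA', '設定': 'アプリB'}
```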
Fixing sub-term indexing
In the case of sub-terms (terms grouped under a top-level term), the approach to rearranging them is slightly different: the incremental character is prefixed to each sub-term entry within the term it belongs to. Terms are also known as first-level keywords and sub-terms as second-level keywords.
The table below is an example of a top-level term with four missorted sub-terms. The Flare syntax for this subordinate relationship is <term>:<sub-term> as shown in the third column. Sorting these sub-terms correctly requires prefixing them with an incremental control character, which is placed between the separating colon and the sub-term.
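The prefixing rule can be sketched in Python as shown here. The function name `subterm_sort_as` and the sample term and sub-terms are hypothetical; only the `<term>:<sub-term>` layout with the control character placed right after the colon comes from the text.

```python
import string

def subterm_sort_as(term: str, subterms_in_desired_order: list[str]) -> list[str]:
    """Build term:<control char><sub-term> strings following the desired order."""
    letters = string.ascii_uppercase
    if len(subterms_in_desired_order) > len(letters):
        raise ValueError("more sub-terms than single control characters available")
    return [
        f"{term}:{letters[i]}{sub}"
        for i, sub in enumerate(subterms_in_desired_order)
    ]

# Made-up sample data: sub-terms listed in the order they should appear.
result = subterm_sort_as("Settings", ["Network", "Display", "Audio"])
print(result)  # ['Settings:ANetwork', 'Settings:BDisplay', 'Settings:CAudio']
```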
The table below shows an example of missorted entries, their correct sorting, and what the “Sort as” entries should look like:
The above sub-terms would look like this in the FLIXL file:
Implementing a large number of terms
While a few “Sort as” entries can be conveniently added via Flare’s Index pane, handling a large number of missorted entries can be cumbersome and error-prone, especially if carried out by a non-native speaker. In such cases, it is more effective to edit the Index Link file using Flare’s built-in text editor or any third-party text editor, like Notepad++ (reference 2) or Microsoft Visual Studio Code (reference 3).
To further expedite the process, it is advisable to use a spreadsheet, such as Microsoft Excel or Open Office (reference 4), to quickly generate the required Index Link file entries.
The general plan is to arrange the necessary data in columns: XML code, Terms, and Sort as strings, as in the following table:
Then, in another column, concatenate the value of the cells in each row. Supposing the first column is column A and the first row is row 2 in an MS Excel spreadsheet, the concatenation formula would look like this:
=A2&B2&C2&D2&E2&F2
and so on for the remaining rows, as shown in the screenshot below.
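For readers who prefer a script to a spreadsheet, the same concatenation can be sketched in Python. The fragments `PREFIX`, `MIDDLE`, and `SUFFIX` are placeholders standing in for the fixed “XML code” columns; they are not Flare's actual Index Link schema and should be replaced with the fragments found in an Index Link file saved by Flare. The sample rows are also hypothetical.

```python
from xml.sax.saxutils import escape

# Placeholder fragments for the fixed "XML code" columns. These are NOT
# Flare's real element or attribute names; copy the real ones from a FLIXL file.
PREFIX = '<entry term="'
MIDDLE = '" sortAs="'
SUFFIX = '" />'

def build_line(term: str, sort_as: str) -> str:
    """Concatenate the fixed fragments with the escaped term and sort string."""
    quote = {'"': "&quot;"}  # also escape double quotes inside attribute values
    return PREFIX + escape(term, quote) + MIDDLE + escape(sort_as, quote) + SUFFIX

# Hypothetical rows: (term, "Sort as" string).
rows = [("Two", "OneA"), ("Three", "OneB")]
lines = [build_line(term, sort_as) for term, sort_as in rows]
print("\n".join(lines))
```

Unlike a plain spreadsheet concatenation, this sketch escapes characters such as & and < so that the generated lines remain well-formed XML.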
The results in the concatenated column (G in the above screenshot) would look like this:
Select all the resulting cells in the concatenated column, copy them to the Clipboard, and paste them into the applicable FLIXL file which, for the above example, would look like this:
Testing the solution
As in any software engineering task, caution is paramount in localization engineering. It is therefore extremely important to test our tweaks to make sure that nothing is broken and, above all, that they work as intended.
However, building the whole output each time the FLIXL file is updated, only to find out it needs to be edited again, is counterproductive. A more practical approach is to create a sort of “Mini-Me” of the actual production index. This reduced version can be implemented within the actual Flare project or as a new one.
Either way, this is the required bill of materials:
- A topic HTML file
- A FLTOC Table of Contents file including the above topic
- A FLTAR Target file based on the above TOC
- The FLIXL Index Link file to be tested
The topic file
This test topic file should include all the terms found in the production project. As with the technique described in Implementing a large number of terms for updating a FLIXL file, we can use a spreadsheet to quickly generate a topic file containing all the terms found in the full project.
Once we have listed all the terms in the index output, we copy them into one column, with the required HTML code in the adjacent columns.
Here is what our spreadsheet would look like:
The cell values in each row are concatenated in another column. In this case, the formula to concatenate all four columns would look like this:
=A2&B2&C2&B2&D2
where the column B cell is used twice, once as the term and once as the inner text of the <p> tag, which should give us the following results:
As in the screenshot below:
In Flare, we create a new topic file, copy the concatenated cells in column E, and paste them into it. Each cell will become a paragraph in our newly created topic:
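The concatenation formula above can also be sketched in Python. The fragments assigned to `COL_A`, `COL_C`, and `COL_D` are assumptions about what the HTML code columns contain (a paragraph holding a Flare index keyword tag); verify them against the markup of a topic saved by Flare before relying on them. The sample terms are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumed contents of the fixed HTML columns (A, C and D in the formula).
# Check them against a topic saved by Flare; they are not taken from the guide.
COL_A = '<p><MadCap:keyword term="'
COL_C = '" />'
COL_D = "</p>"

def build_paragraph(term: str) -> str:
    """Mirror =A2&B2&C2&B2&D2: the term is used as attribute and as inner text."""
    as_attribute = escape(term, {'"': "&quot;"})
    as_text = escape(term)
    return COL_A + as_attribute + COL_C + as_text + COL_D

# Hypothetical terms; in practice this is the full list from the production index.
paragraphs = [build_paragraph(term) for term in ["One", "Two", "Three"]]
print("\n".join(paragraphs))
```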
The Table of Contents and Target files
Once saved, the topic file is added to a new Table of Contents FLTOC file. In turn, the newly created FLTOC is used as the Primary TOC in a new PDF FLTAR Target file, ready to build our testing output.
The image below shows the dependency between these three files:
Building a PDF output from this downsized content takes significantly less time than building the full-sized parent project. Fine-tuning our edited FLIXL file this way is much faster and more convenient.
The section Index fixing within the localization workflow further discusses scenarios where the Index Link file might require updates during the localization process. Having a testing procedure in place will prove invaluable for these scenarios.
Once the desired index sorting is achieved in our mockup test output, the changes made in the Index Link file should also work for the parent project index.
Index fixing within the localization workflow
One potential caveat for this indexing workaround is that terms may be modified during the localization of the Flare project. There are different reasons for a term to be modified, but four of the most common ones are:
- Mistranslation
- Inconsistency
- Incorrect term tag syntax
- Updates or versioning
Although the details of each of the above categories fall beyond the scope of this guide, it is relevant to note their nature and the localization process stages where each potential issue might be flagged and fixed.
On the other hand, incorrect index sorting cannot be assessed until the resulting output is reviewed or submitted for Quality Assurance (QA). At this point, it is advisable to ask the linguist to confirm whether the index sorting in the submitted output is correct.
Once the linguistic review and QA are completed, we can address any issues with the sorting order. Because terms are subject to change, it is crucial to monitor the terms in our Flare project throughout the rest of the localization process. If terms change when the output is submitted for review (In-Country Review, Subject Matter Expert review, further Quality Assurance stages), we must revisit our index fixing process, most importantly before the final delivery.
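One way to monitor term changes between review rounds is to compare term lists, as in this minimal Python sketch. The function name `review_terms` and the sample data are hypothetical, and how the term lists are collected from the Flare project is left out of the sketch.

```python
def review_terms(
    sort_as: dict[str, str],
    previous_terms: set[str],
    current_terms: set[str],
) -> tuple[list[str], list[str]]:
    """Return ("Sort as" entries whose term no longer exists, newly added terms)."""
    stale = sorted(term for term in sort_as if term not in current_terms)
    added = sorted(current_terms - previous_terms)
    return stale, added

# Hypothetical data: the term "Three" was changed to "Third" during review.
sort_as = {"Two": "OneA", "Three": "OneB"}
previous = {"One", "Two", "Three"}
current = {"One", "Two", "Third"}

stale, added = review_terms(sort_as, previous, current)
print(stale)  # ['Three']  -> entry to update or remove in the Index Link file
print(added)  # ['Third']  -> term whose position in the index should be checked
```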
Downloads
A zip file including the files used for the examples in this guideline is available for download
The above zip file includes:
- The scaled-down Flare project IndexJA
- The Index Link and term topic builder IndexJA.xlsx MS Excel spreadsheet
Disclaimer
The information and guidelines included in this document are offered for general information purposes. Although every effort has been made to ensure they serve their purpose, given the variety of potential code designs, the results cannot be guaranteed under all conditions, and the author assumes no liability or responsibility of any kind for any erroneous, bogus, or unexpected results. It remains the user's responsibility to properly apply and adapt these guidelines to each situation.
References
1 – MadCap Flare Creating Index Links: https://help.madcapsoftware.com/flare2024/Content/Flare/Indexes/Main-Activities/Creating-Index-Links.htm
2 – Notepad++: https://notepad-plus-plus.org/
3 – Microsoft Visual Studio Code: https://code.visualstudio.com/
4 – Open Office: https://www.openoffice.org/