Index Sorting: Fixing Term Order in Japanese

Introduction

A bug in MadCap Flare causes Japanese index terms to be sorted incorrectly in printed outputs.

A workaround is to create a supporting file in Flare known as an Index Link file, or to edit an existing one. This guideline describes a convenient process for creating an Index Link file for Japanese terms, although the concepts apply to any language whose resulting indexes need to be tweaked.

Jump start

Readers already familiar with Index Link files in Flare who want to get started straightaway should read the Disclaimer section and then skip to the Japanese index terms workaround section.

Overview

The general idea behind this workaround is to add entries to the Index Link file for the missorted terms. Each entry uses the last correctly sorted term as a seed (or anchor), followed by one or more incremental characters that alter the term’s resulting position in the index.

In a regular alphabetized list, the string ZZZA would be placed after ZZZ, ZZZB would follow ZZZA, and so on.

That is, by suffixing a word or term with an incremental character, it is possible to control its position in a list when alphabetized. This behavior can be exploited to correctly sort index entries by adding known control characters (A, B, C…) to the seed or anchor term.
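The suffix behavior can be illustrated with a short Python sketch. It uses Python's plain string comparison, which is only a stand-in here: Flare's own collation rules are not reproduced, and the sample strings are the ones from the paragraph above.

```python
# A string sorts before any longer string that starts with it, so suffixing
# a seed with A, B, C... gives each entry a predictable position.
entries = ["ZZZB", "ZZZ", "ZZZA"]

ordered = sorted(entries)
print(ordered)  # ['ZZZ', 'ZZZA', 'ZZZB']
```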

However, modifying the actual terms is unacceptable. Flare’s solution to this situation is a behind-the-scenes, ancillary list of alternative terms, which can be modified freely to fit our needs.

These alternative entries are known as “Sort as” entries. Flare uses them in lieu of the actual terms for sorting purposes only. They are not rendered in the resulting output, which allows us to be creative with these alternate strings.

Initial considerations

As in any software development process, the “better safe than sorry” axiom is the recommended principle to go by.

The following guidelines have been tested and work in most cases, but it is essential to safeguard the production Flare project by making backups or, if using a source control system, by following its recommended practices.

The techniques described in this document involve editing files outside Flare’s safe environment. This approach is potentially hazardous, so it is highly recommended to implement tweaks in a controlled environment first, as described in the Testing the solution section below.

Example

Let’s say we need to sort the words (or terms) One, Two, and Three, as if they were numbers: 1, 2, 3. In a normal alphabetization, the word Three would be placed before Two, resulting in One, Three, Two, which is not the desired result.

To tackle this situation in Flare, we use an alternate “Sort as” string associated with the term we need to reposition in the index. Since we want Two to be placed after One, we tell Flare “The term Two has to be sorted as OneA”, where One is the seed term and A is the incremental control character.

The terms and strings involved are summarized in the table below:

| Regular sorting | Desired sorting | Sort as |
|---|---|---|
| One | One | [Seed term] |
| Three | Two | OneA |
| Two | Three | OneB |
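The effect of the “Sort as” strings in this example can be sketched in Python. The `sort_as` dictionary and `sort_key` function are illustrative names only, and Python's default string ordering stands in for Flare's sorting, which may differ in detail.

```python
terms = ["One", "Two", "Three"]

# "Sort as" strings from the table: seed term "One" plus a control character.
sort_as = {"Two": "OneA", "Three": "OneB"}


def sort_key(term):
    """Return the "Sort as" string if one exists, otherwise the term itself."""
    return sort_as.get(term, term)


regular = sorted(terms)                # plain alphabetical order
desired = sorted(terms, key=sort_key)  # order driven by the "Sort as" strings

print(regular)  # ['One', 'Three', 'Two']
print(desired)  # ['One', 'Two', 'Three']
```

The displayed terms never change; only the key used for comparison does, which mirrors how the “Sort as” entries stay out of the rendered index.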

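Such information is passed on to Flare by means of an Index Link file. For the above example, the underlying entries in the Index Link file would look like this:

<IndexLink Term="Two" LinkType="sortas" LinkedTerm="OneA"></IndexLink>
<IndexLink Term="Three" LinkType="sortas" LinkedTerm="OneB"></IndexLink>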

An Index Link file has a .FLIXL extension and can be implemented within Flare [1]. The section Implementing a large number of terms below describes an optional process for bulk-adding entries to the Index Link file. The FLIXL file is generally stored in the \Project\Advanced folder of the Flare project directory.

Japanese index terms workaround

As in the above example, modifying the default Flare sorting for Japanese index entries requires implementing a FLIXL Index Link file or editing an existing one.

For example, the table below lists the missorted terms in the first column, the correctly sorted terms in the second and, in the third, what the “Sort as” field should contain to fix the sorting issue:

| Wrong sorting | Correct sorting | Sort as |
|---|---|---|
| ロッ | ロッ | [Not applicable] |
| 安に全 | 安に全 | [Seed term: last correct entry] |
| 監視 | 保視 | 安に全A |
| 記号 | 記号 | [Next seed term] |
| 治療 | 監視 | 記号A |
| 耐水 | 耐水 | [Next seed term] |
| 粘着 | 無音 | 耐水A |
| 保視 | 用語 | 耐水B |
| 無音 | 粘着 | 耐水C |
| 用語 | 治療 | 耐水D |

The terms in the top two rows are correctly sorted, so they don’t need to be addressed in the Index Link file. The term in the second row is also the last correctly sorted entry and is therefore used as the seed or anchor term. The third column of the third row shows the “Sort as” entry, made up of the seed term plus an incremental control character. Each later term that already sits in its correct position becomes the seed for the entries that follow it.
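The seed-plus-suffix rule can be sketched as a small Python function. This is an illustration under stated assumptions, not part of Flare: `build_sort_as` and `anchors` are hypothetical names, the correct order and the seed terms are supplied by hand (typically by the linguist), and the control characters are limited to A through Z.

```python
import string


def build_sort_as(desired_order, anchors):
    """Map each missorted term to '<seed><letter>'.

    desired_order: every term in the order it should appear in the index.
    anchors: terms Flare already places correctly; each one becomes the seed
             for the missorted terms that follow it.
    """
    sort_as = {}
    seed = None
    letters = iter(())
    for term in desired_order:
        if term in anchors:
            # A correctly sorted term starts a new run of suffixes.
            seed = term
            letters = iter(string.ascii_uppercase)
            continue
        if seed is None:
            raise ValueError(f"no seed term precedes {term!r}")
        try:
            letter = next(letters)
        except StopIteration:
            raise ValueError(f"more than 26 terms follow seed {seed!r}") from None
        sort_as[term] = seed + letter
    return sort_as


desired_order = ["ロッ", "安に全", "保視", "記号", "監視",
                 "耐水", "無音", "用語", "粘着", "治療"]
anchors = {"ロッ", "安に全", "記号", "耐水"}

sort_as = build_sort_as(desired_order, anchors)
for term, key in sort_as.items():
    print(term, "->", key)
```

Run against the table above, the sketch yields the same six “Sort as” strings as the third column, and no entry at all for the seed terms.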

Fixing sub-term indexing

In the case of sub-terms (terms grouped under a top-level term), the approach is slightly different: the incremental character is prefixed to each sub-term within the term it belongs to. Terms are also known as first-level keywords and sub-terms as second-level keywords.

The table below is an example of a top-level term with four missorted sub-terms. The Flare syntax for this subordinate relationship is <term>:<sub-term>, as shown in the third column. Sorting these sub-terms correctly requires prefixing each one with an incremental control character, placed between the separating colon and the sub-term. The columns show the missorted entries, their correct sorting, and what the “Sort as” string should look like:

| Wrong sorting | Correct sorting | Sort as |
|---|---|---|
| アラート | アラート | [Top level term] |
| 低値 | 緊急 | アラート:A緊急 |
| 変更 | 高値 | アラート:B高値 |
| 緊急 | 低値 | アラート:C低値 |
| 高値 | 変更 | アラート:D変更 |

The above sub-terms would look like this in the FLIXL file:
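As a minimal sketch, the entries can be generated with a few lines of Python. It reuses the IndexLink entry syntax shown elsewhere in this guide and assumes that the Term attribute of a sub-term also follows the <term>:<sub-term> syntax; that assumption, and the helper name `subterm_entries`, should be checked against an entry created through Flare's Index pane.

```python
from html import escape
import string


def subterm_entries(top_term, subterms_in_order):
    """Build one "Sort as" entry per sub-term, in the desired order.

    Assumption: Term uses the '<term>:<sub-term>' syntax, like LinkedTerm.
    zip() stops after 26 sub-terms because only A-Z are used as prefixes.
    """
    lines = []
    for letter, sub in zip(string.ascii_uppercase, subterms_in_order):
        term = f"{top_term}:{sub}"
        linked = f"{top_term}:{letter}{sub}"  # control character after the colon
        lines.append(
            f'<IndexLink Term="{escape(term)}" LinkType="sortas" '
            f'LinkedTerm="{escape(linked)}"></IndexLink>'
        )
    return lines


entries = subterm_entries("アラート", ["緊急", "高値", "低値", "変更"])
print("\n".join(entries))
```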

Implementing a large number of terms

While a few “Sort as” entries can be conveniently added via Flare’s Index pane, handling a large number of missorted entries can be cumbersome and error-prone, especially if carried out by a non-native speaker. In such cases, it is more effective to edit the Index Link file using Flare’s built-in text editor or any third-party text editor, such as Notepad++ [2] or Microsoft Visual Studio Code [3].

To further expedite the process, it is advisable to use a spreadsheet, such as Microsoft Excel or Open Office [4], to quickly generate the required Index Link file entries.

The general plan is to arrange the necessary data in columns: XML code, Terms, and Sort as strings, as in the following table:

| Lead code | Term | Mid code | Seed | Suffix | Trailing code |
|---|---|---|---|---|---|
| <IndexLink Term=" | 保視 | " LinkType="sortas" LinkedTerm=" | 安に全 | A | "></IndexLink> |
| <IndexLink Term=" | 監視 | " LinkType="sortas" LinkedTerm=" | 記号 | A | "></IndexLink> |
| <IndexLink Term=" | 無音 | " LinkType="sortas" LinkedTerm=" | 耐水 | A | "></IndexLink> |
| <IndexLink Term=" | 用語 | " LinkType="sortas" LinkedTerm=" | 耐水 | B | "></IndexLink> |
| <IndexLink Term=" | 粘着 | " LinkType="sortas" LinkedTerm=" | 耐水 | C | "></IndexLink> |
| <IndexLink Term=" | 治療 | " LinkType="sortas" LinkedTerm=" | 耐水 | D | "></IndexLink> |

Then, in another column, concatenate the value of the cells in each row. Supposing the first column is column A and the first row is row 2 in an MS Excel spreadsheet, the concatenation formula would look like this:

=A2&B2&C2&D2&E2&F2

and so on for the remaining rows, as shown in the screenshot below.

The results in the concatenated column (G in the above screenshot) would look like this:

<IndexLink Term="保視" LinkType="sortas" LinkedTerm="安に全A"></IndexLink>
<IndexLink Term="監視" LinkType="sortas" LinkedTerm="記号A"></IndexLink>
<IndexLink Term="無音" LinkType="sortas" LinkedTerm="耐水A"></IndexLink>
<IndexLink Term="用語" LinkType="sortas" LinkedTerm="耐水B"></IndexLink>
<IndexLink Term="粘着" LinkType="sortas" LinkedTerm="耐水C"></IndexLink>
<IndexLink Term="治療" LinkType="sortas" LinkedTerm="耐水D"></IndexLink>
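For those who prefer a script to a spreadsheet, the same concatenation can be sketched in Python. The constant and function names are illustrative; the XML fragments and the rows are copied from the table above. The call to `html.escape` is an added precaution for terms containing characters such as `&` or `"`, which the spreadsheet formula does not handle.

```python
from html import escape

# The three fixed code fragments from the spreadsheet columns.
LEAD = '<IndexLink Term="'
MID = '" LinkType="sortas" LinkedTerm="'
TRAIL = '"></IndexLink>'

# (term, seed, suffix) rows, as in the Term, Seed, and Suffix columns.
rows = [
    ("保視", "安に全", "A"),
    ("監視", "記号", "A"),
    ("無音", "耐水", "A"),
    ("用語", "耐水", "B"),
    ("粘着", "耐水", "C"),
    ("治療", "耐水", "D"),
]


def index_link(term, seed, suffix):
    """Equivalent of the =A2&B2&C2&D2&E2&F2 formula for one row."""
    return LEAD + escape(term) + MID + escape(seed + suffix) + TRAIL


entries = [index_link(*row) for row in rows]
print("\n".join(entries))
```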

Select all the resulting cells in the concatenated column, copy them to the Clipboard, and paste them into the applicable FLIXL file which, for the above example, would look like this:

Testing the solution

As in any software engineering task, caution is paramount, and the same holds true in localization engineering. It is therefore extremely important to test our tweaks to make sure that nothing is broken and, above all, that they work as intended.

However, building the whole output just for testing purposes each time the FLIXL file is updated, only to find out it needs to be edited again, is counter-productive. A more practical approach to test our changes is to create a sort of “Mini-Me” of the actual production index. This reduced version can be implemented within the actual Flare project or as a new one.

Either way, this is the required bill of materials:

  • A topic HTML file
  • A FLTOC Table of Contents file including the above topic
  • A FLTAR Target file based on the above TOC
  • The FLIXL Index Link file to be tested

The topic file

This test topic file should include all the terms found in the production project. As with the technique described in Implementing a large number of terms for updating a FLIXL file, we can use a spreadsheet to quickly generate a topic file containing all the terms found in the full project.

Once we have listed all the terms in the index output, we copy them into one column and place the required HTML code in the adjacent columns.

Here is what our spreadsheet would look like:

| Lead code | Term | Close term | Close paragraph |
|---|---|---|---|
| <p><MadCap:keyword term=" | 保視 | " /> | </p> |
| <p><MadCap:keyword term=" | 監視 | " /> | </p> |
| <p><MadCap:keyword term=" | 無音 | " /> | </p> |
| <p><MadCap:keyword term=" | 用語 | " /> | </p> |
| <p><MadCap:keyword term=" | 粘着 | " /> | </p> |
| <p><MadCap:keyword term=" | 治療 | " /> | </p> |

The cell values in each row are concatenated in another column. In this case, the formula to concatenate all four columns would be like this:

=A2&B2&C2&B2&D2

where the column B cell is used twice, once as the term and once as the inner text of the <p> tag, which should give us the following results:

<p><MadCap:keyword term="保視" />保視</p>
<p><MadCap:keyword term="監視" />監視</p>
<p><MadCap:keyword term="無音" />無音</p>
<p><MadCap:keyword term="用語" />用語</p>
<p><MadCap:keyword term="粘着" />粘着</p>
<p><MadCap:keyword term="治療" />治療</p>

The screenshot below shows the result:

In Flare, we create a new topic file, copy the concatenated cells in column E, and paste them into it. Each cell becomes a paragraph in our newly created topic:
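The topic paragraphs can also be produced with a short Python sketch instead of the spreadsheet. The function name `keyword_paragraph` is illustrative, the markup is the one shown in the results above, and the escaping step is an added precaution for terms containing XML-special characters.

```python
from html import escape


def keyword_paragraph(term):
    """Equivalent of =A2&B2&C2&B2&D2: the term is used twice, once in the
    keyword tag and once as the visible paragraph text."""
    safe = escape(term)
    return f'<p><MadCap:keyword term="{safe}" />{safe}</p>'


terms = ["保視", "監視", "無音", "用語", "粘着", "治療"]
paragraphs = [keyword_paragraph(term) for term in terms]
print("\n".join(paragraphs))
```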

The Table of Content and Target files

Once saved, the topic file is added to a new Table of Contents FLTOC file. In turn, the newly created FLTOC is used as the Primary TOC in a new PDF FLTAR Target file, ready to build our testing output.

The image below shows the dependency between these three files:

Building a PDF output from this downsized content takes significantly less time than building the full-sized parent project, so fine-tuning our edited FLIXL file this way is far more convenient.

The section Index fixing within the localization workflow further discusses scenarios where the Index Link file might require updates during the localization process. Having a testing procedure in place will prove invaluable in these scenarios.

Once the desired index sorting is achieved in our mockup test output, the changes made in the Index Link file should also work for the parent project index.

Index fixing within the localization workflow

One caveat of this workaround is that terms may be modified during the localization of the Flare project. There are different reasons for a term to be modified, but four of the most common ones are:

  • Mistranslation
  • Inconsistency
  • Incorrect term tag syntax
  • Updates or versioning

Although the details of each of the above categories fall beyond the scope of this guide, it is relevant to note their nature and the localization process stages where each potential issue might be flagged and fixed.

Incorrect index sorting, on the other hand, cannot be assessed until the resulting output is reviewed or submitted for Quality Assurance (QA). At this point, it is advisable to ask the linguist to confirm whether the index sorting in the submitted output is correct.

Once the linguistic review and QA are completed, we can address any issues with the sorting order. Given that terms are subject to change, it is crucial to monitor the terms in our Flare project throughout the rest of the localization process. If terms change after the output is submitted for a review (In-Country Review, Subject Matter Expert review, or further Quality Assurance stages), we must revisit our index fixing process, and we must always do so before the final delivery.
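One way to monitor such changes is to compare the terms referenced in the Index Link file with the keywords currently present in the topics. The sketch below is a hypothetical helper, not a Flare feature. It assumes each keyword tag holds a single term, as in the test topic of this guide, and it matches attribute values as plain text without decoding XML entities.

```python
import re

# Patterns for the two kinds of markup used in this guide.
TERM_IN_LINK = re.compile(r'<IndexLink\s+Term="([^"]*)"')
TERM_IN_TOPIC = re.compile(r'<MadCap:keyword\s+term="([^"]*)"')


def stale_link_terms(flixl_text, topic_text):
    """Return the Index Link terms that no longer appear as topic keywords."""
    linked = TERM_IN_LINK.findall(flixl_text)
    current = set(TERM_IN_TOPIC.findall(topic_text))
    return [term for term in linked if term not in current]


flixl_text = (
    '<IndexLink Term="保視" LinkType="sortas" LinkedTerm="安に全A"></IndexLink>\n'
    '<IndexLink Term="監視" LinkType="sortas" LinkedTerm="記号A"></IndexLink>\n'
)
# Sample topic in which the keyword 監視 has been changed or removed.
topic_text = '<p><MadCap:keyword term="保視" />保視</p>\n'

stale = stale_link_terms(flixl_text, topic_text)
print(stale)  # ['監視']
```

Any term reported this way points to a “Sort as” entry that needs to be updated or removed before the next build.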

Downloads

A zip file including the files used for the examples in this guideline is available for download.

The above zip file includes:

  • The scaled down Flare project IndexJA
  • The Index Link and term topic builder IndexJA.xlsx MS Excel spreadsheet

Disclaimer

The information and guidelines included in this document are offered for general information purposes. Although every effort has been made to ensure they serve their purpose, given the variety of potential code designs the results cannot be guaranteed under all conditions, and the author assumes no liability or responsibility of any kind for any erroneous, bogus, or unexpected results. It remains the user’s responsibility to properly apply and adapt these guidelines to each situation.

References

1 – MadCap Flare Creating Index Links: https://help.madcapsoftware.com/flare2024/Content/Flare/Indexes/Main-Activities/Creating-Index-Links.htm

2 – Notepad++: https://notepad-plus-plus.org/

3 – Microsoft Visual Studio Code: https://code.visualstudio.com/

4 – Open Office: https://www.openoffice.org/

Happy indexing!