Build the Alphabetical Index

This page is a Fat Page. It includes the BuildTitleAlphaIndex script, encoded by and for Frontier. To retrieve the script(s), save the page as source text and open it using the File->Open command.

An alphabetical index and a topical index are two very different beasts. You might as well compare a unicorn to the 'orrible black beast of aaaarrrrrrgggghghghggghhhhhhhhh![1]

To create an alphabetical index, we'll use the same Indexer Suite script we used before: indexer.BuildPageIndex. But instead of indexing by keyword, we'll use the page titles. And to avoid polluting the topic index, we'll create a new index: titleAlpha.

The BuildTitleAlphaIndex Script

The key function call we need to make to build this new index is

Indexer.BuildPageIndex( @websites.mysite, @websites.mysite.["#indices"].["titleAlpha"], true, "title", "title" )

(Replace "websites.mysite" with your own site table, of course.)

Type this command into the Quick Script window and execute it, then examine the ["#indices"].titleAlpha subtable. It should contain an entry for each title.

But it's a pain to have to type in something like this whenever you want to rebuild the index, so we'll put it in a script, and call it BuildTitleAlphaIndex:

on BuildTitleAlphaIndex_TUT1( sourceAdr=@tutorials.indexsite, destTbl=@tutorials.indexsite.["#indices"].["titleAlpha"], inReplaceIndices=true )
    Indexer.BuildPageIndex( sourceAdr, destTbl, inReplaceIndices, "title", "title" )

Another trivial script. In this case, though, I'm looking ahead; as we'll see shortly, this is not going to be quite adequate.

Add this to the menu item script for the Update Indices command on your website menu. The menu item script should now look like this:

websites.mysite.["#tools"].BuildTopicsIndex()
websites.mysite.["#tools"].BuildTitleAlphaIndex()

Not Quite Right

Select the Update Indices command from your website menu.

Open the ["#indices"].titleAlpha subtable. Look at the names of the entries. If any of them start with "a", "the", "an", or other articles, they probably aren't sorted correctly. When the title of a paper or book begins with one of these words, the standard way of sorting is to ignore the meaningless leading word and sort alphabetically on the remainder. Other, related rules sort names beginning with "St." as "Saint", or "McAnything" as "MacAnything". And with the indexer.BuildPageIndex script, there's not much we can do about it.
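To make these sorting rules concrete, here is a small sketch in Python (not UserTalk, and not part of the Indexer Suite). The function name sort_key and the sample titles are made up for illustration, and the "St." and "Mc" handling is only one plausible reading of the rules described above.

```python
import re

# Leading words that are ignored when sorting a title.
ARTICLES = {"a", "an", "the"}


def sort_key(title):
    """Return a lowercase key that sorts `title` by the rules described above."""
    words = title.split()
    # Ignore a leading article, unless it is the whole title.
    if len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]
    normalized = []
    for word in words:
        if word == "St.":
            # Sort "St." as if it were spelled out.
            word = "Saint"
        elif re.match(r"Mc[A-Z]", word):
            # Sort "McAnything" as "MacAnything".
            word = "Mac" + word[2:]
        normalized.append(word.lower())
    return " ".join(normalized)


titles = ["The Tempest", "St. Joan", "McBeth Revisited", "A Doll's House", "Macbeth"]
print(sorted(titles, key=sort_key))
```

With a key like this, "A Doll's House" sorts under D and "St. Joan" sorts under S as "saint joan", which is the behavior a plain alphabetical sort on the raw titles cannot give us.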

What we need to do is to intercept the indexer and massage the article titles and/or keywords so they are entered the way we want them to be.

And guess what? The Indexer Suite lets us do that, with the indexer.BuildPageIndexGeneric script. It's a little more difficult to use than indexer.BuildPageIndex, but much more powerful.

Indexer.BuildPageIndexGeneric

Indexer.BuildPageIndexGeneric constructs a keyword index of all pages in a specified website table or subtable. To determine whether to add a page to the index, and what keywords should be used, it calls a pair of callback functions.

Only entries in the source table (and its subtables) that the website framework will render into HTML pages will be indexed. All other entries are ignored, and will not appear in the index/indices.

Any page entry for which the test callback returns TRUE will be included in the index. Any table for which the test callback does not return TRUE is ignored (i.e., will not appear in the generated index).

Syntax

BuildPageIndexGeneric( inSourceAdr, inDestTbl, keywordSpec, testCB, infoCB, inReplaceIndices=true, doExpandNestedKeywords=false )

inSourceAdr
The location from which pages should be indexed. To index your entire site, this would be the address of the website table (e.g., @websites.mysite). To index only a portion of a site, pass the address of the subsite table. To index just a single page (as from the #filters.finalFilter script), pass the address of the individual page.

inDestTbl
The address of the table in which the index information should be stored (e.g., @websites.mysite.["#indices"].["topic"]).

keywordSpec
An identifier for the keywords to be used by the callback functions. Because this parameter is interpreted by user-provided callback functions, the type and range of values of this parameter may vary widely depending on application.

testCB
The address of a callback function that BuildPageIndexGeneric will call to determine whether to index a particular table or to continue scanning.

The test callback is expected to take two parameters:

entryAdr
The address of the entry (table or page) currently being examined by BuildPageIndexGeneric.

keywordSpec
The keywordSpec parameter that was passed to BuildPageIndexGeneric.

The test callback must return either TRUE or FALSE. If the test callback returns TRUE, the info callback will be called and the table will be added to the index. If the test callback returns FALSE, the info callback will not be called and the table will not be added to the index.

infoCB
The address of a callback function that BuildPageIndexGeneric will call to get the indexing values for a given table. The info callback function will only be called if the test callback function returned TRUE.

The info callback is expected to take two parameters:

tableAdr
The address of the entry (table or page) currently being examined by BuildPageIndexGeneric.

keywordSpec
The keywordSpec parameter that was passed to BuildPageIndexGeneric.

The info callback is expected to return a table. The returned info table may contain the following elements:

keywords
[REQUIRED] A keyword specification, as described in suites.indexer.doc.Keywords.

entryAdr
[REQUIRED] The entry address to enter in the index. This may be the tableAdr that was passed in, or it may be any other address in the ODB.

entryName
[OPTIONAL] The name to use for the entry in the index. This forces the sort order of the index. Note that the index does not support multiple entries with the same name; if you return the same name for multiple entries, only the latest one will appear in the index.

errorMessage
[OPTIONAL] This is an error message generated by the info callback script. If errorMessage is set, all other fields in the returned info table will be ignored.

inReplaceIndices
Specifies whether to replace the index at inDestTbl^ or to simply add to it.

doExpandNestedKeywords
Specifies whether nested keywords should also be entered individually. If true, nested keywords will be entered in the index as specified, and will also be split into individual keywords, each of which is entered in the index as well.

For example, if doExpandNestedKeywords is true, "frontier:community" is equivalent to "frontier:community, frontier, community".
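The expansion rule can be pictured with a short Python sketch (not UserTalk). It assumes that keywords are separated by commas and nesting levels by colons, as in the example above. The function name expand_nested_keywords is hypothetical, and how the real Indexer Suite treats deeper nesting or duplicates is not covered here.

```python
def expand_nested_keywords(spec):
    """Return the keywords in `spec`, plus the parts of any nested keyword."""
    expanded = []
    for keyword in (k.strip() for k in spec.split(",")):
        if not keyword:
            continue
        # The keyword itself is always entered as specified.
        candidates = [keyword]
        if ":" in keyword:
            # Assumption: every colon-separated part becomes its own keyword.
            candidates += [part.strip() for part in keyword.split(":")]
        for candidate in candidates:
            # Skip empty parts and keywords we have already collected.
            if candidate and candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_nested_keywords("frontier:community"))
```

Under these assumptions the call returns the nested keyword followed by its two parts, matching the equivalence stated above.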

The Improved BuildTitleAlphaIndex Script

We need to change the BuildTitleAlphaIndex script to call BuildPageIndexGeneric instead of BuildPageIndex, and create the necessary callback functions. The necessary BuildPageIndexGeneric call is:

indexer.BuildPageIndexGeneric( sourceAdr, destTbl, "title", @PageTestCB, @PageInfoCB, inReplaceIndices )

Replace the BuildPageIndex function call with this new call.
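Before writing the callbacks, it may help to picture how the indexer drives them. The following is a toy model in Python, not the actual Indexer Suite code: the flat list of entries, the nested dictionary used for the index, and the names build_page_index_generic, page_test_cb, and page_info_cb are all assumptions made for illustration. Only the callback contract (test first, then info, with errorMessage overriding everything else) comes from the description above.

```python
def build_page_index_generic(entries, keyword_spec, test_cb, info_cb):
    """Toy model of the callback flow; `entries` is a flat list of page names."""
    index = {}   # keyword -> {entry name: entry address}
    errors = []  # error messages reported by the info callback
    for entry in entries:
        # Only an explicit TRUE from the test callback admits the entry.
        if test_cb(entry, keyword_spec) is not True:
            continue
        info = info_cb(entry, keyword_spec)
        if "errorMessage" in info:
            # All other fields are ignored when errorMessage is set.
            errors.append(info["errorMessage"])
            continue
        name = info.get("entryName", info["entryAdr"])
        for keyword in info["keywords"].split(","):
            # A repeated name under the same keyword overwrites the earlier entry.
            index.setdefault(keyword.strip(), {})[name] = info["entryAdr"]
    return index, errors


def page_test_cb(entry, keyword_spec):
    # Hypothetical rule: index everything except "#" entries.
    return keyword_spec == "title" and not entry.startswith("#")


def page_info_cb(entry, keyword_spec):
    # Hypothetical info table: file the page under its first letter.
    return {"entryAdr": entry, "entryName": entry.title(), "keywords": entry[0].upper()}


index, errors = build_page_index_generic(
    ["summary", "#prefs", "hints"], "title", page_test_cb, page_info_cb
)
print(index)
```

In this model the "#prefs" entry never reaches the info callback, and two pages that return the same entryName under the same keyword leave only the later one in the index.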

The Test Callback Function

The Test Callback function, PageTestCB, must return TRUE if and only if the page whose address is passed to it should be included in the index. In this case, we simply want to include anything that is a renderable page. So insert this function in BuildTitleAlphaIndex just before the call to BuildPageIndexGeneric:

on PageTestCB( entryAdr, keywordSpec )
    if ( keywordSpec != "title" )
        « We're not interested
        return ( FALSE )
    if html.traversalSkip( entryAdr )
        « Not rendered as page, so we don't care
        return ( FALSE )
    if ( typeOf( entryAdr^ ) == tableType )
        « Need to figure out if it's a page.
        local
            renderTableWith = html.getPagePref( "renderTableWith", entryAdr )
        if ( typeOf( renderTableWith ) == booleanType )
            « Not set, so it isn't a page.
            return ( FALSE )
    « Renders as a page, so we want it.
    return ( TRUE );

After checking that the keyword spec is "title", the callback calls html.traversalSkip to determine whether the entry should even be considered as a page.

If the entry is a table, it tries to locate a renderTableWith directive, which would indicate that the table gets rendered as a page rather than a directory. If the directive is not found, it is not a page.

Finally, anything left is assumed to be a page, and should be added to the index.

The Info Callback Function

The info callback must return a table, as described above. It must fill in the following values:

entryAdr--This one's easy: it's the address of the page entry in the website table--the entry address that is passed into the function.

entryName--This forces the sort order in the index table, so we should omit the leading articles (the, an, a, etc.) here--or better yet, move them to the end, after a comma, as is normally done with articles and books. We do have to set this: if we don't, the sort order will be by ODB address, which is not particularly useful on a web page.

keywords--Another easy one: it's the first character of the title (as modified for entryName).

For example, if the page title is "The Importance of Being Earnest", we would store the following values in our returned info table:

entryName = "Importance of Being Earnest, The"
keywords = "I"
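The same transformation can be tried outside Frontier with a few lines of Python. This is only a sketch of the rule just described, not the UserTalk callback itself. The function name index_entry is hypothetical, and unlike the rule as stated it leaves a one-word title such as "A" untouched.

```python
# Leading words that are moved to the end of the entry name.
LEADING_WORDS = ("the", "an", "a")


def index_entry(page_title):
    """Return the entryName and keywords values for a page title."""
    title = page_title.strip()
    first_word, _, rest = title.partition(" ")
    # Drop any extra spaces that followed the first word.
    rest = rest.lstrip()
    if first_word.lower() in LEADING_WORDS and rest:
        # Move the leading article to the end, after a comma.
        entry_name = f"{rest}, {first_word}"
    else:
        entry_name = title
    # The keyword is the upper-cased initial character of the entry name.
    return {"entryName": entry_name, "keywords": entry_name[:1].upper()}


print(index_entry("The Importance of Being Earnest"))
```

Running it on the example title gives the entryName and keywords values listed above.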

When we put all these pieces together, we get the following PageInfoCB function:

on PageInfoCB( entryAdr, keywordSpec )
    local
        returnTable
    new( tableType, @returnTable )
    bundle « Validate inputs
        if ( keywordSpec != "title" )
            « We're not interested
            returnTable.errorMessage = "BuildTitleAlphaIndex.infoCB called for keyword spec != title [" + keywordSpec + "]"
            « Return the table so the caller can see errorMessage
            return ( returnTable )
        if ! defined( entryAdr^ )
            « Bad address!
            returnTable.errorMessage = "BuildTitleAlphaIndex.infoCB called with invalid address [" + entryAdr + "]"
            return ( returnTable )
    bundle « Set return values
        local
            pageTitle = html.getPagePref( "title", entryAdr )
            ignoreLeadingWords = {"the", "an", "a"}
            firstWord = string.firstWord( pageTitle )
        « Entry (page) address
        returnTable.entryAdr = entryAdr
        « Entry name (for sort order)
        if ( ignoreLeadingWords contains string.lower( firstWord ) )
            returnTable.entryName = string.mid( pageTitle, string.length( firstWord ) + 1, infinity ) + ", " + firstWord
        else
            returnTable.entryName = pageTitle
        « Remove any extra spaces that followed the first word
        while ( returnTable.entryName beginsWith " " )
            returnTable.entryName = string.delete( returnTable.entryName, 1, 1 )
        « Keywords: Initial character of entry name
        returnTable.keywords = string.upper( string.mid( returnTable.entryName, 1, 1 ) )
    return ( returnTable )

Insert this function in BuildTitleAlphaIndex just before the call to BuildPageIndexGeneric.

I'll save you the trouble of typing this all in. The final version of BuildTitleAlphaIndex is stored in this page in fatpage format. Save this page to disk, and open it with Frontier's File->Open... command to import it.

Once again, we have an index and need to display it on a web page.


[1] This unpronounceable creature was a source of terror to the crusaders of Monty Python and the Holy Grail.
