...to every nation, tribe, language and people...

HIS Registry of Dialects

Date: 2008-08
Status: Adopted
Abstract: Documents the Registry of Dialects (ROD) of the Harvest Information System (HIS). This registry identifies dialects and defines a standardized code for each dialect.
Editor: Allan Starling, GRN

Table of contents

  1. Overview
  2. Code tables
  3. Other tables
  4. Change management
  5. Change history
  6. Distribution

1. Overview

The function of the Registry of Dialects (ROD) is to (a) identify specific varieties of given languages (as defined by ISO 639-3) that research has determined to require distinct presentations (such as audio, video or print) in order to overcome barriers of understanding or acceptance, and (b) provide unique, standardized codes for these dialects. Determining factors for (a) may include differences in vocabulary, grammatical construction, idioms and marked accents, as well as religious or social barriers.

The registry contains a code table, a supplementary table and a change history table:

  • ROD_Dialect: Code for Dialect

    A code in this code set represents a unique dialect of a living human language, as determined by the demonstrated need for distinct media presentations designed to overcome barriers of understanding or acceptance.

  • ROD_AlternateNameIndex

    This table provides an index into ROD_Dialect based on alternate dialect names.

  • ROD_DialectChangeHistory

    This table documents changes to the code set for dialects.

The code table and the supplementary table make use of the ROL_Language code table from the HIS Registry of Languages.

2. Code tables

The registry contains one code table:

ROD_Dialect: Code for Dialect

The code table contains a set of dialects of living languages that have been determined to require distinct media presentations.

By definition the scope of a dialect code is always a smaller group of speakers than the group represented by the assigned language as a whole.

Each code is a standardized five-digit numerical code that uniquely refers to a particular dialect. A code from this set uniquely identifies the dialect and, through its code table entry, also identifies the language of which the dialect is a part and that language's ISO 639-3 (ROL) code.

Additional information on a dialect may also be available on the Global Recordings website. Users have full access to those descriptions as follows:

  1. For any ROD dialect code ddddd, the following URL lists available information:

    http://www.globalrecordings.net/dialect/ddddd

    For example, http://www.globalrecordings.net/dialect/04231 shows information for ASMAT: Waganu (dialect #04231).

  2. For any ROL language code xxx, the following URL lists information on related dialects:

    http://www.globalrecordings.net/langcode/xxx

    For example, http://www.globalrecordings.net/langcode/asc shows information for four dialects of the ASMAT language.
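Because both URL patterns above are simple templates, a client can build them directly from a code. The following is a minimal Python sketch; the helper names `dialect_url` and `langcode_url` are hypothetical, and the format checks assume that ROD codes are exactly five digits (kept as strings so that leading zeros, as in 04231, survive) and that ISO 639-3 codes are three lowercase letters.

```python
import re

# URL templates as given in this document; the helpers below are hypothetical.
DIALECT_URL = "http://www.globalrecordings.net/dialect/{code}"
LANGCODE_URL = "http://www.globalrecordings.net/langcode/{iso}"

def dialect_url(code: str) -> str:
    """Return the Global Recordings page for a five-digit ROD dialect code."""
    # Codes are strings, not integers, so "04231" keeps its leading zero.
    if not re.fullmatch(r"\d{5}", code):
        raise ValueError(f"not a five-digit ROD code: {code!r}")
    return DIALECT_URL.format(code=code)

def langcode_url(iso: str) -> str:
    """Return the page listing dialects for a three-letter ROL (ISO 639-3) code."""
    if not re.fullmatch(r"[a-z]{3}", iso):
        raise ValueError(f"not a three-letter ISO 639-3 code: {iso!r}")
    return LANGCODE_URL.format(iso=iso)

print(dialect_url("04231"))  # http://www.globalrecordings.net/dialect/04231
print(langcode_url("asc"))   # http://www.globalrecordings.net/langcode/asc
```

Validating before formatting means a code that lost its leading zero somewhere upstream (for example "4231" read from a spreadsheet as a number) is rejected rather than silently producing a wrong link.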

The code table for ROD_Dialect contains the following columns:

Column          Format        Description
Dialect Code    char(5)       The five-digit numerical code for the Dialect.
Dialect Name    varchar(75)   The primary name of the Dialect.
Language Name   varchar(75)   The primary name of the Language.
ISO             char(3)       A code from the ROL_Language code set that identifies the Language associated with the Dialect.
Location        varchar(75)   The name of the country where the dialect is spoken. This may be followed by the province and/or district.

The SQL statement for creating this table is as follows:

CREATE TABLE ROD_Dialect (
Code char(5) NOT NULL,
Name varchar(75) NOT NULL,
Language varchar(75) NOT NULL,
ISO char(3) NOT NULL,
Location varchar(75) NOT NULL)

For example, typical entries for dialects of the Southern Pashtu language will look like this:

Code   Dialect               Language           ISO  Location
00090  Kabuli                Pashto, Southern   pbt  Afghanistan
00843  Baluchi               Pashto, Southern   pbt  Afghanistan
00853  Western               Pashto, Southern   pbt  Afghanistan
15545  Kandahar Pashto       Pashto, Southern   pbt  Afghanistan
15546  Qandahar Pashto       Pashto, Southern   pbt  Afghanistan
15547  Quetta Pashto         Pashto, Southern   pbt  Pakistan
15548  Southeastern Pashto   Pashto, Southern   pbt  Pakistan
15549  Southwestern Pashto   Pashto, Southern   pbt  Afghanistan

Because dialects are linked to languages, many of their names (e.g. “Western”) do not stand alone. In the table above, dialect code 00853 represents the Western dialect (or variety) of the Southern Pashtu language in Afghanistan. Language names are quoted directly from the Ethnologue. Where possible, dialect names reflect the preference of the speakers.
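Since a dialect name such as “Western” is only meaningful together with its language, an application will usually display the language and ISO code alongside it. The following is a minimal sketch using Python's built-in `sqlite3` module: it creates ROD_Dialect with the SQL given above, loads three of the sample rows, and resolves a code to a qualified label. The `describe` helper and its output format are hypothetical.

```python
import sqlite3

# In-memory database; the DDL is the ROD_Dialect statement from this document.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ROD_Dialect (
        Code     char(5)     NOT NULL,
        Name     varchar(75) NOT NULL,
        Language varchar(75) NOT NULL,
        ISO      char(3)     NOT NULL,
        Location varchar(75) NOT NULL)
""")

# Three of the sample Southern Pashto rows shown above.
conn.executemany(
    "INSERT INTO ROD_Dialect VALUES (?, ?, ?, ?, ?)",
    [
        ("00090", "Kabuli", "Pashto, Southern", "pbt", "Afghanistan"),
        ("00853", "Western", "Pashto, Southern", "pbt", "Afghanistan"),
        ("15547", "Quetta Pashto", "Pashto, Southern", "pbt", "Pakistan"),
    ],
)

def describe(code: str) -> str:
    """Hypothetical helper: qualify a dialect name with its language and ISO code."""
    row = conn.execute(
        "SELECT Name, Language, ISO, Location FROM ROD_Dialect WHERE Code = ?",
        (code,),
    ).fetchone()
    if row is None:
        raise KeyError(code)
    name, language, iso, location = row
    return f"{language} [{iso}]: {name} ({location})"

print(describe("00853"))  # Pashto, Southern [pbt]: Western (Afghanistan)
```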

3. Other tables

The registry contains one supplementary table.

ROD_AlternateNameIndex

This supplementary table offers an index into the dialect codes table by means of alternate names. Whereas the ROD_Dialect table lists only primary dialect names, this index makes it possible to find a code by any one of the alternate names for a given dialect.

The ROD_AlternateNameIndex contains the following columns:

Column         Format        Description
Dialect Code   char(5)       The five-digit numerical code for the dialect from ROD_Dialect.
Variant        varchar(75)   A name associated with the dialect.

The SQL statement for creating this table is as follows:

CREATE TABLE ROD_AlternateNameIndex (
Code char(5) NOT NULL,
Variant varchar(75) NOT NULL)

For instance, a sample of typical alternate-name entries for the Kabuli dialect (00090) looks like the following:

00090 Afghan
00090 Farsi, Eastern: Dari: Kabuli
00090 Kabuli
00090 Pakhtoo
00090 Pakhtu
00090 Paktu
00090 Pashto
00090 Pashto: Kabuli
00090 Pashtu
00090 Pushto
00090 Pushtu
00090 Quetta-Kandahar Pashto

Note that some entries give both the language and the dialect, and include a colon and/or a comma. The second entry in the list above reads “Farsi, Eastern: Dari: Kabuli”. Here “Farsi, Eastern” is equivalent to Eastern Farsi. The first colon separates the language name from the dialect, so “Dari” is the dialect. The second colon indicates that “Kabuli” is a sub-dialect of Dari.
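The colon convention just described can be applied mechanically when alternate names are imported. The following is a minimal Python sketch, assuming every colon-separated entry follows that reading (language, then dialect, then sub-dialects) and that an entry without a colon is a bare name. The `natural_order` helper generalizes the single “Farsi, Eastern” example to other comma-inverted names, which is an assumption rather than a stated rule of the registry; all names here are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

class VariantName(NamedTuple):
    language: Optional[str]       # None when the entry has no colon
    dialect: str
    subdialects: Tuple[str, ...]  # deeper levels, outermost first

def parse_variant(variant: str) -> VariantName:
    """Split an alternate-name entry on colons.

    The first colon separates the language from the dialect; each further
    colon introduces a sub-dialect. An entry without a colon is a bare name.
    """
    parts = [part.strip() for part in variant.split(":")]
    if len(parts) == 1:
        return VariantName(None, parts[0], ())
    return VariantName(parts[0], parts[1], tuple(parts[2:]))

def natural_order(name: str) -> str:
    """Assumed rule: 'Farsi, Eastern' -> 'Eastern Farsi' (first comma only)."""
    head, sep, tail = name.partition(", ")
    return f"{tail} {head}" if sep else name

v = parse_variant("Farsi, Eastern: Dari: Kabuli")
print(v)  # VariantName(language='Farsi, Eastern', dialect='Dari', subdialects=('Kabuli',))
print(natural_order(v.language))  # Eastern Farsi
```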

Search by Name:

The index table can be used to implement a search by name. For instance, the following query would return the five-digit codes for all the dialects that use the name SomeName.

SELECT DISTINCT Code
FROM ROD_AlternateNameIndex
WHERE Variant='SomeName'
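In application code, the same search is best issued with a bound parameter rather than by pasting the name into the SQL text, so that names containing apostrophes or other special characters cannot break the query. A minimal sketch with Python's `sqlite3` module follows; the helper `codes_for_name` is hypothetical, and the row for code 99999 is invented (not a real ROD entry) purely to show one name shared by two dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ROD_AlternateNameIndex (
        Code    char(5)     NOT NULL,
        Variant varchar(75) NOT NULL)
""")
# Two entries from the Kabuli sample, plus one invented row (99999).
conn.executemany(
    "INSERT INTO ROD_AlternateNameIndex VALUES (?, ?)",
    [("00090", "Kabuli"), ("00090", "Pashto: Kabuli"), ("99999", "Kabuli")],
)

def codes_for_name(name: str) -> list:
    """Return every dialect code listing `name` as a variant, sorted."""
    cur = conn.execute(
        "SELECT DISTINCT Code FROM ROD_AlternateNameIndex "
        "WHERE Variant = ? ORDER BY Code",
        (name,),  # bound parameter: no manual quoting of the name
    )
    return [code for (code,) in cur]

print(codes_for_name("Kabuli"))  # ['00090', '99999']
```

When more than one code comes back, the dialect report URL described below can be used to decide which one is meant.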

Multiple dialects may be known by the same name. To verify the identity of a dialect, the user may consult the Global Recordings website, which gives a report with additional information about the selected dialect. For any ROD dialect code ddddd, the following URL gives this information:

http://www.globalrecordings.net/dialect/ddddd

Additional information may be available on the Dialect Research Website as follows:

http://globalrecordings.net/research/dialect/ddddd

This web site contains a much larger selection of names, many of which have not yet qualified for inclusion in the ROD. For instance, the following link will show information for ASMAT: Waganu (dialect #04231):

http://globalrecordings.net/research/dialect/04231

4. Change management

This section defines the process that the registry steward will follow to maintain the registry.

Governing philosophy

The basic philosophy of the code set in this registry is to list only those dialects that have proven to be distinct enough to require different treatment in ministry strategies, rather than to produce an exhaustive list of all possible dialects.

Situations that will result in a change

There are a number of situations that could cause the dialect codes or their links to language codes to change. These include, but are not limited to, the following:

  1. Changes that occur in the ISO 639-3 language code set may affect registered dialects.
  2. Speech forms that newly qualify as ROD dialects according to the definition given in Section 1 will be assigned ROD dialect codes.
  3. Correcting errors in the tables, according to the process outlined below, may result in new codes being assigned.

How to make a change request

A change will be made when it can be shown to correct an error, or to improve the coverage and usefulness of a code set. If you believe any of the information in the Registry of Dialects is in error, or if you have an addition or improvement to suggest, send your proposed change by e-mail to RODsteward@globalrecordings.net. Be sure to report the source of your information.

How change requests are processed

The ROD steward will review the proposal and acknowledge receipt with an initial comment or a request for clarification. If the change appears to be in line with the philosophy of the registry, the steward will vet the proposal with others who are known to be using the registry. The data will also be added to the Dialect Research Website. The change will be adopted only if there is a positive consensus among those consulted.

How updates are made

The registry will be updated when changes are adopted. The most recent version will always be available for download from this page.

What to do pending a change

Each proposed dialect name will be allotted a provisional five-digit code as soon as it is entered into the Dialect Research website. Thus, when users need a code that is not yet part of the code set, they may freely use this provisional code until the outcome of the request to add a code is known.

5. Change history

The registry contains one change history table.

ROD_DialectChangeHistory

All changes to ROD_Dialect are reported in ROD_DialectChangeHistory. This table is cumulative, listing all changes to successive versions of the registry. The table has the following four columns:

Column        Format        Description
Code          char(5)       The dialect code that is affected by the change reported in this record.
Type          char(1)       A one-letter code indicating the type of change. There are five possible values:

    C = Created. The dialect code is newly created.

    E = Extended. The meaning of the code is extended by virtue of being merged with a code that has been retired.

    M = Matching change. Either the ROL code to which the dialect was previously matched has been changed, or the dialect has now been matched to a different code.

    R = Retired. The dialect code has been retired and should no longer be used in a database.

    U = Updated. There has been no change to the code or its meaning, but other information in the code table entry (e.g. primary name, the associated language, or the main country) has changed.

Date          char(10)      The date the change was released in a new version of the registry, in the form YYYY-MM-DD (eight digits separated by hyphens).
Description   varchar(255)  Describes the change. In the case of R changes, it also describes what a user should do to fix existing data that uses the now-retired code.

Note that there is no change type for narrowing the meaning of a code, such as when the dialect denoted by one code is split into two dialects. In such a case, the original code is retired and two new codes are added. In this way, users of the code set are assured that once a code has been used to tag an item of data, it will remain the right code for as long as it stays an active member of the code set.
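Rows destined for ROD_DialectChangeHistory can be checked against these column rules before they are loaded. The following is a minimal Python sketch; the function `is_valid_history_row` is hypothetical, and the split example uses invented codes (30000, 30001, 30002) to show that a split is expressed with the existing R and C types rather than a separate “narrowed” type.

```python
import re
from datetime import datetime

# The five change types defined for the Type column.
CHANGE_TYPES = {
    "C": "Created",
    "E": "Extended",
    "M": "Matching change",
    "R": "Retired",
    "U": "Updated",
}

def is_valid_history_row(code: str, change_type: str, date: str) -> bool:
    """Check Code, Type and Date against the column formats described above."""
    if not re.fullmatch(r"\d{5}", code):
        return False
    if change_type not in CHANGE_TYPES:
        return False
    # Exactly YYYY-MM-DD (ten characters); strptime then rejects e.g. month 13.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        return False
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        return False
    return True

# A split is recorded as one R row plus two C rows (codes are invented).
split = [
    ("30000", "R", "2008-08-31"),
    ("30001", "C", "2008-08-31"),
    ("30002", "C", "2008-08-31"),
]
print(all(is_valid_history_row(*row) for row in split))  # True
print(is_valid_history_row("00032", "X", "2007-08-09"))  # False
print(is_valid_history_row("11759", "U", "2008-13-10"))  # False
```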

The SQL statement for creating the change history table is as follows:

CREATE TABLE ROD_DialectChangeHistory (
Code char(5) NOT NULL,
Type char(1) NOT NULL,
Date char(10) NOT NULL,
Description varchar(255) )

Here are some sample rows from the change history table:

Code Type Date Description
10327 C 2005-12-15 Add Zemiaki
01432 E 2005-12-15 Includes 13980 that was retired
00681 M 2006-10-12 Merge with 14566; change all 00681 to 14566
01189 R 2007-05-10 Same as 10034; change all 01189 to 10034
00032 R 2007-08-09 Unable to verify existence; delete from database
11759 U 2008-01-10 Change name from Chikaranga to Karanga
12345 M 2008-05-11 ISO code split into two codes abd and ljg

The change history table holds the cumulative list of all changes that have ever been made to the registry. Thus it may be queried to learn the complete history of a given code, or to learn all the changes that have been made since a given date. For instance, the following SQL query finds the changes that have occurred since the beginning of 2005:

SELECT *
FROM ROD_DialectChangeHistory
WHERE Date >= '2005-01-01'
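Because Date is stored as char(10) in YYYY-MM-DD form, string comparison orders dates chronologically, so the cutoff must be compared as a quoted string; an unquoted 2005-01-01 is evaluated by SQL as the arithmetic expression 2005 minus 1 minus 1. The following is a minimal sketch using Python's `sqlite3` module with three of the sample rows below and a bound date parameter; the helper `changes_since` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ROD_DialectChangeHistory (
        Code        char(5)  NOT NULL,
        Type        char(1)  NOT NULL,
        Date        char(10) NOT NULL,
        Description varchar(255))
""")
conn.executemany(
    "INSERT INTO ROD_DialectChangeHistory VALUES (?, ?, ?, ?)",
    [
        ("10327", "C", "2005-12-15", "Add Zemiaki"),
        ("01189", "R", "2007-05-10", "Same as 10034; change all 01189 to 10034"),
        ("11759", "U", "2008-01-10", "Change name from Chikaranga to Karanga"),
    ],
)

def changes_since(since: str) -> list:
    """Return (Code, Type, Date) for changes on or after `since` (YYYY-MM-DD)."""
    cur = conn.execute(
        "SELECT Code, Type, Date FROM ROD_DialectChangeHistory "
        "WHERE Date >= ? ORDER BY Date",
        (since,),  # bound as text, so the comparison is between strings
    )
    return cur.fetchall()

print(changes_since("2007-01-01"))
# [('01189', 'R', '2007-05-10'), ('11759', 'U', '2008-01-10')]
```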

For a site that has used ROD_Dialect codes in its own database, an important use of the change history table is to discover codes in its data that are now obsolete and thus need to be changed. Only the codes that have been retired need to be changed, so a full list of the data records needing changes can be found by doing a JOIN with the change history table. For instance, if the column named code in MyTable holds an ROD_Dialect code, then the following SQL statement will select all records that need to be changed due to changes to the code set since the beginning of 2005:

SELECT *
FROM MyTable as M
JOIN ROD_DialectChangeHistory as C ON M.code=C.Code
WHERE C.Type='R' AND C.Date >= '2005-01-01'

Note that the Description field of the joined result set will describe what needs to be done to bring the selected dialect code up to date.
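The JOIN above can be exercised end to end with Python's `sqlite3` module. In this minimal sketch, MyTable and its `id` and `title` columns are hypothetical stand-ins for a site's own data, the history rows are taken from the samples above, and the helper `records_to_fix` is likewise hypothetical. Only the record tagged with the retired code 01189 is selected; the record tagged 11759 (an update, type U) is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ROD_DialectChangeHistory (
        Code        char(5)  NOT NULL,
        Type        char(1)  NOT NULL,
        Date        char(10) NOT NULL,
        Description varchar(255));
    -- Hypothetical user table whose 'code' column holds ROD dialect codes.
    CREATE TABLE MyTable (id INTEGER PRIMARY KEY, title TEXT, code char(5));
""")
conn.executemany(
    "INSERT INTO ROD_DialectChangeHistory VALUES (?, ?, ?, ?)",
    [
        ("01189", "R", "2007-05-10", "Same as 10034; change all 01189 to 10034"),
        ("00032", "R", "2007-08-09", "Unable to verify existence; delete from database"),
        ("11759", "U", "2008-01-10", "Change name from Chikaranga to Karanga"),
    ],
)
conn.executemany(
    "INSERT INTO MyTable (title, code) VALUES (?, ?)",
    [("Recording A", "01189"), ("Recording B", "11759"), ("Recording C", "00090")],
)

def records_to_fix(since: str) -> list:
    """Rows of MyTable whose dialect code was retired on or after `since`."""
    cur = conn.execute(
        """SELECT M.id, M.code, C.Description
           FROM MyTable AS M
           JOIN ROD_DialectChangeHistory AS C ON M.code = C.Code
           WHERE C.Type = 'R' AND C.Date >= ?
           ORDER BY M.id""",
        (since,),
    )
    return cur.fetchall()

print(records_to_fix("2005-01-01"))
# [(1, '01189', 'Same as 10034; change all 01189 to 10034')]
```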

6. Distribution

A complete distribution of the Registry of Dialects includes:

  • ROD 2008-08.doc (this document)

  • Code table to be used for human reference:

    • ROD_Dialect.doc

  • Three tab-delimited tables to be used for loading into a database:

    • ROD_Dialect.txt

    • ROD_AlternateNameIndex.txt

    • ROD_DialectChangeHistory.txt

These are all available in a single zip file:

  • ROD 2008-08-31.zip
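The tab-delimited files can be loaded into tables created with the SQL statements given earlier. The following is a minimal Python sketch using the standard `csv` and `sqlite3` modules. It assumes each line holds one record whose fields follow the column order of the matching CREATE TABLE statement; whether the files carry a header row, and which character encoding they use, are not stated in this document, so both are left as assumptions for the caller. The function name `load_tab_delimited` is hypothetical.

```python
import csv
import io
import sqlite3

def load_tab_delimited(conn, table, lines, ncols, skip_header=False):
    """Insert tab-separated records from `lines` into `table`; return the count.

    Assumes fields follow the column order of the table's CREATE TABLE
    statement. `table` must be a trusted name, since it is placed in the SQL.
    """
    # QUOTE_NONE: treat any quote characters in names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    rows = [row for row in reader if row]  # ignore blank lines
    for row in rows:
        if len(row) != ncols:
            raise ValueError(f"expected {ncols} fields, got {len(row)}: {row!r}")
    placeholders = ", ".join("?" * ncols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ROD_AlternateNameIndex ("
    "Code char(5) NOT NULL, Variant varchar(75) NOT NULL)"
)
# In practice: open("ROD_AlternateNameIndex.txt", encoding="utf-8", newline="")
# (the encoding is an assumption; check the distributed files).
sample = io.StringIO("00090\tKabuli\n00090\tPashto: Kabuli\n")
print(load_tab_delimited(conn, "ROD_AlternateNameIndex", sample, ncols=2))  # 2
```

Reading every field as text keeps dialect codes such as 00090 intact; converting them to integers would drop the leading zeros that are part of the code.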