2014/12/31

Analysing 635M lines of GEDCOM

The GEDCOM parser of Genealogie Online needed a rewrite. The code base had grown out of proportion, resulting in inefficient code and cumbersome maintenance.

A big difference between when I first started coding the GEDCOM parser and now is the number of GEDCOM files available: nearly 7 thousand. This gave me the opportunity to do some analysis (and more testing)!

 

Analysis of versions and character sets

First, I examined the headers of all these GEDCOM files to get a feel for which GEDCOM grammars and character sets were used.

GEDCOM version* | Count
5.5             | 6.339
(undefined)     | 248
5.5.1           | 245
v.1.0.01 Beta   | 12
5.3             | 5
4.0             | 2
2.0             | 1
4               | 1
5.01            | 1
Total           | 6.854

* The GEDCOM version as stated in HEAD > GEDC > VERS. I did not check whether the content actually conforms to the stated grammar version. I did manually check the 5.3, 4, etc. files; at first glance they seemed to be plain GEDCOM 5.5.
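Reading these two header values only requires tracking which tag is open at each level, because VERS also appears under HEAD > SOUR (the program version). Below is a minimal Python sketch, assuming one decoded GEDCOM line per string in the usual `level [@xref@] TAG [value]` form; the regular expression, the function name `read_header` and the program name in the sample are illustrative assumptions, not the Genealogie Online code. A missing value comes back as `None`, which corresponds to the "(undefined)" rows.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Simplified GEDCOM line grammar: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:@[^@]+@\s+)?(\S+)(?:\s(.*))?$")


def read_header(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (HEAD > GEDC > VERS, HEAD > CHAR); None when a value is absent."""
    version: Optional[str] = None
    charset: Optional[str] = None
    path: List[str] = []  # path[i] is the tag currently open at level i
    for raw in lines:
        # Drop a decoded byte-order mark and the line terminator.
        m = LINE_RE.match(raw.lstrip("\ufeff").rstrip("\r\n"))
        if not m:
            continue  # blank or malformed line
        level, tag = int(m.group(1)), m.group(2)
        value = (m.group(3) or "").strip()
        if level == 0 and path:
            break  # the next record starts, so the header is complete
        if level > len(path):
            continue  # level jump: invalid GEDCOM, ignore the line
        del path[level:]
        path.append(tag)
        if path[0] != "HEAD":
            break  # the file does not start with a header
        if path == ["HEAD", "GEDC", "VERS"]:
            version = value or None
        elif path == ["HEAD", "CHAR"]:
            charset = value or None
    return version, charset


# HEAD > SOUR > VERS is the program version, not the GEDCOM version.
sample = [
    "0 HEAD",
    "1 SOUR SOMEPROGRAM",
    "2 VERS 3.0",
    "1 GEDC",
    "2 VERS 5.5",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR ANSEL",
    "0 @I1@ INDI",
]
print(read_header(sample))
```

Feeding the result for every file into a `collections.Counter` is enough to produce tables like the ones in this post.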


The fact that only 3.6% of the GEDCOM files identified themselves as 5.5.1 surprised me, as this version is regarded as the current de facto standard.

It must be noted that a large portion of the GEDCOM files were produced by Dutch family tree programs. But, as can be seen on the “Used family tree programs” page on Genealogie Online, only Legacy, MacFamilyTree, Ahnenblatt, PhpGedView and RootsMagic label their GEDCOM output as 5.5.1.

For the GEDCOM parser the conclusion was clear: support GEDCOM 5.5 (including 5.5rev) and 5.5.1 files.

Character set | Count
ANSI          | 4.395
UTF-8         | 1.269
ANSEL         | 692
ASCII         | 312
(undefined)   | 95
WINDOWS       | 27
IBMPC         | 26
MACINTOSH     | 21
IBM WINDOWS   | 11
UNICODE       | 5
windows-1251  | 1
Total         | 6.854


The number of files claiming to be UTF-8 is curious, because UTF-8 was only introduced as a character set in GEDCOM 5.5.1. So 1.269 files claim to be UTF-8, while only 245 files claim to be GEDCOM 5.5.1. This puts the low 3.6% in another perspective…
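The mismatch can be made concrete by cross-tabulating the two header fields. A small Python sketch, assuming a list of `(version, charset)` pairs has already been taken from HEAD > GEDC > VERS and HEAD > CHAR of each file; the function name and the sample data are made up for illustration and are not the real Genealogie Online figures.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Header = Tuple[Optional[str], Optional[str]]  # (GEDC VERS, CHAR) of one file


def versions_declaring_utf8(headers: Iterable[Header]) -> Counter:
    """Count the declared GEDCOM versions of files whose CHAR is UTF-8."""
    return Counter(
        version or "(undefined)"
        for version, charset in headers
        if (charset or "").strip().upper() == "UTF-8"
    )


# Made-up sample, not the real data set.
headers = [
    ("5.5", "UTF-8"),
    ("5.5.1", "UTF-8"),
    ("5.5", "ANSEL"),
    (None, "UTF-8"),
    ("5.5", "utf-8"),
]
by_version = versions_declaring_utf8(headers)
# UTF-8 was only introduced in 5.5.1, so in this data any other
# declared version (or none at all) is inconsistent with CHAR UTF-8.
inconsistent = sum(n for v, n in by_version.items() if v != "5.5.1")
print(by_version, inconsistent)
```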

Fortunately, I could re-use code from the old GEDCOM parser to correctly handle character sets and encodings (it was a solid piece of code).
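The old parser's character-set code is not shown here, but the general idea can be sketched: trust a byte-order mark first, then map the declared HEAD > CHAR value to a decoder. The mapping below is an assumption, especially for loosely defined values such as ANSI, IBM WINDOWS and MACINTOSH, and so is the cp1252 fallback for missing values. ANSEL has no built-in Python codec, so a real parser needs its own ANSEL table; this sketch simply refuses it.

```python
import codecs
from typing import Optional

# Assumed mapping from declared HEAD > CHAR values to Python codec names.
CHARSET_TO_CODEC = {
    "UTF-8": "utf-8",
    "UNICODE": "utf-16",
    "ASCII": "ascii",
    "ANSI": "cp1252",
    "WINDOWS": "cp1252",
    "IBM WINDOWS": "cp1252",
    "WINDOWS-1251": "cp1251",
    "IBMPC": "cp437",
    "MACINTOSH": "mac_roman",
}
FALLBACK_CODEC = "cp1252"  # assumption for missing or unknown values


def decode_gedcom(data: bytes, declared: Optional[str]) -> str:
    """Decode raw GEDCOM bytes, trusting a byte-order mark over the header."""
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")  # also strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the BOM selects the byte order
    name = (declared or "").strip().upper()
    if name == "ANSEL":
        raise NotImplementedError("ANSEL needs a dedicated decoding table")
    codec = CHARSET_TO_CODEC.get(name, FALLBACK_CODEC)
    # Replace undecodable bytes instead of rejecting the whole file.
    return data.decode(codec, errors="replace")
```

Letting the BOM win reflects the experience that header declarations are not always reliable, as the UTF-8 numbers above illustrate.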

Note: Tim Forsythe publishes similar statistics from GigaTrees, which paint a more American picture (for example, 14.4% GEDCOM 5.5.1).

 

Analysis of actual use

The old GEDCOM parser also supported invalid GEDCOM tags and custom GEDCOM tags. Although I wrote the article “GEDCOM files which don’t adhere to the GEDCOM standard shouldn’t be allowed to be called GEDCOM”, for Genealogie Online I’m more forgiving. I want to present my users’ genealogical data and don’t want to bother them too much with the fact that their family tree program isn’t producing valid GEDCOM. But which of the invalid and custom tags should the new GEDCOM parser support?

I decided to read all the GEDCOM files and count how often each tag sequence occurs. This resulted in a CSV file that looks like this:

INDI-BIRT-AGE,45
INDI-BIRT-AGNC,1820
INDI-BIRT-DATE-ANC,162
INDI-BIRT-DATE-NOTE,172764
INDI-BIRT-DATE-NOTE-CONT,11311
INDI-BIRT-DATE-SOUR,39752
INDI-BIRT-DATE-SOUR-DATE,15951
INDI-BIRT-DATE-SOUR-ITEM,16825
INDI-BIRT-DATE-SOUR-PAGE,486
INDI-BIRT-DATE-SOUR-ROLE,36055
INDI-BIRT-DOCTOR,1
INDI-BIRT-EMAIL,1
INDI-BIRT-FAMC,1223
INDI-BIRT-LABL,4092
INDI-BIRT-LATI,49730
INDI-BIRT-LONG,49730
INDI-BIRT-MOON,37
INDI-BIRT-NOTE,720806
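Counting tag sequences only requires remembering, for every level, which tag is currently open. The Python sketch below is one way to do such a count, not the actual Genealogie Online code: it joins the open tags with `-`, ignores xref ids and values, and skips malformed lines and level jumps. The regular expression and function names are illustrative. Because it streams line by line and keeps only a small stack plus the counter, memory use stays small even for very large inputs.

```python
import csv
import io
import re
from collections import Counter
from typing import Iterable, List, TextIO

# Simplified GEDCOM line grammar: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:@[^@]+@\s+)?(\S+)(?:\s(.*))?$")


def count_tag_paths(lines: Iterable[str], counts: Counter) -> Counter:
    """Add one count per line to the line's tag path, e.g. INDI-BIRT-DATE."""
    stack: List[str] = []  # stack[i] is the tag currently open at level i
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\r\n"))
        if not m:
            continue  # blank or malformed line
        level, tag = int(m.group(1)), m.group(2)
        if level > len(stack):
            continue  # level jump: invalid, so no well-defined path
        del stack[level:]
        stack.append(tag)
        counts["-".join(stack)] += 1
    return counts


def write_csv(counts: Counter, out: TextIO) -> None:
    """Write "path,count" rows sorted by path, like the excerpt above."""
    writer = csv.writer(out, lineterminator="\n")
    for path, n in sorted(counts.items()):
        writer.writerow([path, n])


sample = """0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 BIRT
2 DATE 1 JAN 1900
2 NOTE Born at home
0 @I2@ INDI
1 BIRT
2 DATE 2 FEB 1901
0 TRLR"""

counts = count_tag_paths(sample.splitlines(), Counter())
buf = io.StringIO()
write_csv(counts, buf)
print(buf.getvalue())
```

Passing the same `Counter` into `count_tag_paths` for every file accumulates the totals over the whole collection.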

The next step in the analysis was visualising this file. I opted for my favourite JavaScript library, D3.js, which provides a cool collapsible tree. The result is available to all those interested on the GEDCOM tag usage page (also downloadable and re-usable under a CC-BY license).
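A collapsible tree needs a nested hierarchy rather than flat `path,count` rows. A possible conversion in Python, assuming the field names `name`, `count` and `children`: `d3.hierarchy` reads `children` by default, while `name` and `count` are arbitrary choices here. The function name and root label are hypothetical, and the sketch assumes no tag itself contains a `-`.

```python
import json
from typing import Mapping


def to_d3_tree(counts: Mapping[str, int], root_name: str = "GEDCOM") -> dict:
    """Nest {"INDI-BIRT-DATE": 12, ...} into {"name", "count", "children"} objects."""
    root: dict = {"name": root_name, "children": {}}
    for path, n in counts.items():
        node = root
        for tag in path.split("-"):
            # Children are kept in a dict while building, for fast lookup.
            node = node["children"].setdefault(tag, {"name": tag, "children": {}})
        node["count"] = n

    def finalize(node: dict) -> dict:
        # Convert the children dicts into the lists D3 expects; drop empty ones.
        out = {"name": node["name"]}
        if "count" in node:
            out["count"] = node["count"]
        kids = [finalize(child) for child in node["children"].values()]
        if kids:
            out["children"] = kids
        return out

    return finalize(root)


counts = {"INDI": 2, "INDI-BIRT": 2, "INDI-BIRT-DATE": 2, "INDI-BIRT-NOTE": 1}
print(json.dumps(to_d3_tree(counts), indent=2))
```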


The colour of a node indicates whether the tag sequence is valid under the GEDCOM 5.5 grammar (red: 83.7%) or not (grey: 16.2%). This aspect of the visualisation is not completely accurate, because not all GEDCOM files are version 5.5 (the declared version wasn't taken into account).

These tag trees give a good picture of actual usage. If an invalid or custom tag is used a lot, I look into supporting it in the GEDCOM parser.
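Both the red/grey split and the choice of tags worth supporting come down to comparing each counted path against the set of paths a grammar allows. A minimal Python sketch, assuming the valid paths have already been derived from the grammar as a set of strings; the function names and the `min_count` threshold are illustrative. The valid set is a tiny hand-picked subset of what GEDCOM 5.5 allows under INDI-BIRT (AGE, AGNC and NOTE via «EVENT_DETAIL», plus FAMC), and the counts are a subset of the CSV excerpt above, so the share it prints describes only that excerpt.

```python
from typing import List, Mapping, Set, Tuple


def valid_share(counts: Mapping[str, int], valid_paths: Set[str]) -> float:
    """Fraction of all tag-path uses that the grammar allows."""
    total = sum(counts.values())
    valid = sum(n for path, n in counts.items() if path in valid_paths)
    return valid / total if total else 0.0


def support_candidates(
    counts: Mapping[str, int], valid_paths: Set[str], min_count: int
) -> List[Tuple[str, int]]:
    """Invalid or custom tag paths used at least min_count times, most used first."""
    hits = [(p, n) for p, n in counts.items() if p not in valid_paths and n >= min_count]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# A subset of the counts in the CSV excerpt above.
counts = {
    "INDI-BIRT-AGE": 45,
    "INDI-BIRT-DATE-NOTE": 172764,
    "INDI-BIRT-DOCTOR": 1,
    "INDI-BIRT-LATI": 49730,
    "INDI-BIRT-LONG": 49730,
    "INDI-BIRT-NOTE": 720806,
}
# Tiny hand-picked subset of the paths GEDCOM 5.5 allows under INDI-BIRT.
valid_55 = {"INDI-BIRT-AGE", "INDI-BIRT-AGNC", "INDI-BIRT-FAMC", "INDI-BIRT-NOTE"}

print(round(valid_share(counts, valid_55), 3))
print(support_candidates(counts, valid_55, min_count=1000))
```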

For fun, I also made selections for the top-10 programs used by Genealogie Online users. This way, you can see which programs use more or fewer invalid/custom tags…

For my own reference I made tag trees for GEDCOM 5.5 (the “2 January 1996” version) and GEDCOM 5.5.1. The 5.5 tree was complicated by the fact that «NOTE_STRUCTURE» references «SOURCE_CITATION» and vice versa, which introduces a loop!
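Expanding a grammar into a tag tree is a recursive walk over structure references, and the «NOTE_STRUCTURE»/«SOURCE_CITATION» loop means the walk has to stop somewhere. One option, sketched below in Python, is to cut the expansion when a structure is already being expanded higher up the same branch, and to report that cycle. The two-structure grammars are heavily simplified, hypothetical excerpts (tags only, no cardinalities or alternatives); the second mimics the “Revised 10 January 1996” variant, where «NOTE_STRUCTURE» only has a plain SOUR pointer. A fixed depth limit would be an alternative way to cut the loop.

```python
from typing import Dict, List, Tuple

# Each structure: (root tag, children); "<<NAME>>" references another structure.
Grammar = Dict[str, Tuple[str, List[str]]]

GRAMMAR_1996_01_02: Grammar = {
    "NOTE_STRUCTURE": ("NOTE", ["CONC", "CONT", "<<SOURCE_CITATION>>"]),
    "SOURCE_CITATION": ("SOUR", ["PAGE", "QUAY", "<<NOTE_STRUCTURE>>"]),
}
GRAMMAR_REV_1996_01_10: Grammar = {
    "NOTE_STRUCTURE": ("NOTE", ["CONC", "CONT", "SOUR"]),
    "SOURCE_CITATION": ("SOUR", ["PAGE", "QUAY", "<<NOTE_STRUCTURE>>"]),
}


def expand(structure, grammar, prefix=(), active=(), paths=None, loops=None):
    """Return (tag paths, loops) for `structure`, cutting each loop once."""
    if paths is None:
        paths, loops = [], []
    if structure in active:
        # Already being expanded higher up this branch: record the cycle, stop.
        loops.append(active[active.index(structure):] + (structure,))
        return paths, loops
    tag, children = grammar[structure]
    here = prefix + (tag,)
    paths.append("-".join(here))
    for child in children:
        if child.startswith("<<"):
            expand(child.strip("<>"), grammar, here, active + (structure,), paths, loops)
        else:
            paths.append("-".join(here + (child,)))
    return paths, loops


print(expand("NOTE_STRUCTURE", GRAMMAR_1996_01_02))
print(expand("NOTE_STRUCTURE", GRAMMAR_REV_1996_01_10))
```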

The data used for all of these tag trees is also downloadable in CSV and JSON format under a CC-BY license.

The end result, besides nice visualisations, is a leaner, more robust and more complete GEDCOM parser for Genealogie Online! Users will notice better support and presentation of sources and notes and, for some programs, the use of RIN to identify persons.

Which GEDCOM 5.5 grammar is correct?

The GEDCOM 5.5 standard is described in a PDF document prepared by the Family History Department of The Church of Jesus Christ of Latter-day Saints, dated 2 January 1996 (which in two days is 19 years ago).

When you Google for the GEDCOM 5.5 grammar, you usually end up on the HTML version by Paul McBride, which he himself calls “unofficial” (or you find the grammar files of Gedcom.pm by Paul Johnson). But over the years no one seems to have noticed that the HTML version carries a slightly different date, “2 January 1996 [Revised 10 January 1996]”, and has differences in the grammar!

Errata Sheet

Although the PDF document includes an Errata Sheet, it seems there are others. Digging into the archives of the Internet, you can find references to an Errata Sheet dated 10 January 1996, which was faxed to some people.
A GEDCOM 5.5 Errata Sheet dated 10 January 1996 supposedly contains corrections to pages 23, 24, 25, 26, 29, 29, 29, 33, 34, 39, 57, 79, and 85.
Unfortunately, this document has not hit the Internet yet, so we can’t say for sure that the “10 January 1996” version by McBride is based on this Errata Sheet.
Some of the differences in the GEDCOM 5.5 grammar between the “2 January 1996” and “Revised 10 January 1996” versions are small (typos), but some are big (see the diff below)!

Big questions

I think the “Revised 10 January 1996” version (let's call it GEDCOM 5.5rev) is used a lot, mainly because the HTML version is more accessible. But should we consider it an official version? In my opinion: no, because it is not an official LDS publication.

If there was an Errata Sheet dated 10 January 1996, why didn't the LDS publish it (as a PDF, online), and why didn't they release a new GEDCOM version, as they should have, considering that some of the changes are big?

A draft of version 5.5.1 was only published on 2 October 1999 (see “FamilySearch GEDCOM Specifications” by Tamura Jones for a complete overview of the specifications). This document contains a section that enumerates the differences from the previous version. However, some of the changes relative to the “2 January 1996” version that appear in the “Revised 10 January 1996” version are not mentioned in that section. I guess the LDS were internally uncertain too about which GEDCOM 5.5 grammar was correct.

GEDCOM 5.5 Grammar Diff

Below is a comparison between the Record Structures and Substructures of the Lineage-Linked Form (the Primitive elements of the Lineage-Linked Form are the same) between the “2 January 1996” and “Revised 10 January 1996” versions. I only focussed on the grammar, not the rest of the text in the specification. Orange highlighting means a small difference, yellow highlighting indicates a big difference. The table can also be downloaded in PDF format.


Lineage-Linked GEDCOM Form's grammar 5.5 Lineage-Linked GEDCOM Form's grammar 5.5
LDS/PDF version, dated 2 January 1996 McBride/HTML version, revised 10 January 1996
LINEAGE_LINKED_GEDCOM:= LINEAGE_LINKED_GEDCOM:=
0 <<HEADER>> {1:1} 0 <<HEADER>> {1:1}
0 <<SUBMISSION_RECORD>> {0:1} 0 <<SUBMISSION_RECORD>> {0:1}
0 <<RECORD>> {1:M} 0 <<RECORD>> {1:M}
0 TRLR {1:1} 0 TRLR {1:1}
HEADER:= HEADER:=
n HEAD {1:1} n HEAD {1:1}
+1 SOUR <APPROVED_SYSTEM_ID> {1:1} +1 SOUR <APPROVED_SYSTEM_ID> {1:1}
+2 VERS <VERSION_NUMBER> {0:1} +2 VERS <VERSION_NUMBER> {0:1}
+2 NAME <NAME_OF_PRODUCT> {0:1} +2 NAME <NAME_OF_PRODUCT> {0:1}
+2 CORP <NAME_OF_BUSINESS> {0:1} +2 CORP <NAME_OF_BUSINESS> {0:1}
+3 <<ADDRESS_STRUCTURE>> {0:1} +3 <<ADDRESS_STRUCTURE>> {0:1}
+2 DATA <NAME_OF_SOURCE_DATA> {0:1} +2 DATA <NAME_OF_SOURCE_DATA> {0:1}
+3 DATE <PUBLICATION_DATE> {0:1} +3 DATE <PUBLICATION_DATE> {0:1}
+3 COPR <COPYRIGHT_SOURCE_DATA> {0:1} +3 COPR <COPYRIGHT_SOURCE_DATA> {0:1}
+1 DEST <RECEIVING_SYSTEM_NAME> {0:1*} +1 DEST <RECEIVING_SYSTEM_NAME> {0:1*}
+1 DATE <TRANSMISSION_DATE> {0:1} +1 DATE <TRANSMISSION_DATE> {0:1}
+2 TIME <TIME_VALUE> {0:1} +2 TIME <TIME_VALUE> {0:1}
+1 SUBM @XREF:SUBM@ {1:1} +1 SUBM @<XREF:SUBM>@ {1:1}
+1 SUBN @XREF:SUBN@ {0:1} +1 SUBN @<XREF:SUBN>@ {0:1}
+1 FILE <FILE_NAME> {0:1} +1 FILE <FILE_NAME> {0:1}
+1 COPR <COPYRIGHT_GEDCOM_FILE> {0:1} +1 COPR <COPYRIGHT_GEDCOM_FILE> {0:1}
+1 GEDC {1:1} +1 GEDC {1:1}
+2 VERS <VERSION_NUMBER> {1:1} +2 VERS <VERSION_NUMBER> {1:1}
+2 FORM <GEDCOM_FORM> {1:1} +2 FORM <GEDCOM_FORM> {1:1}
+1 CHAR <CHARACTER_SET> {1:1} +1 CHAR <CHARACTER_SET> {1:1}
+2 VERS <VERSION_NUMBER> {0:1} +2 VERS <VERSION_NUMBER> {0:1}
+1 LANG <LANGUAGE_OF_TEXT> {0:1} +1 LANG <LANGUAGE_OF_TEXT> {0:1}
+1 PLAC {0:1} +1 PLAC {0:1}
+2 FORM <PLACE_HIERARCHY> {1:1} +2 FORM <PLACE_HIERARCHY> {1:1}
+1 NOTE <GEDCOM_CONTENT_DESCRIPTION> {0:1} +1 NOTE <GEDCOM_CONTENT_DESCRIPTION> {0:1}
+2 [CONT|CONC] <GEDCOM_CONTENT_DESCRIPTION> {0:M} +2 [CONT|CONC] <GEDCOM_CONTENT_DESCRIPTION> {0:M}
RECORD:= RECORD:=
[ [
n <<FAM_RECORD>> {1:1} n <<FAM_RECORD>> {1:1}
| |
n <<INDIVIDUAL_RECORD>> {1:1} n <<INDIVIDUAL_RECORD>> {1:1}
| |
n <<MULTIMEDIA_RECORD>> {1:M} n <<MULTIMEDIA_RECORD>> {1:M}
| |
n <<NOTE_RECORD>> {1:1} n <<NOTE_RECORD>> {1:1}
| |
n <<REPOSITORY_RECORD>> {1:1} n <<REPOSITORY_RECORD>> {1:1}
| |
n <<SOURCE_RECORD>> {1:1} n <<SOURCE_RECORD>> {1:1}
| |
n <<SUBMITTER_RECORD>> {1:1} n <<SUBMITTER_RECORD>> {1:1}
] ]
FAM_RECORD:= FAM_RECORD:=
n @<XREF:FAM>@ FAM {1:1} n @<XREF:FAM>@ FAM {1:1}
+1 <<FAMILY_EVENT_STRUCTURE>> {0:M} +1 <<FAMILY_EVENT_STRUCTURE>> {0:M}
+2 HUSB {0:1} +2 HUSB {0:1}
+3 AGE <AGE_AT_EVENT> {1:1} +3 AGE <AGE_AT_EVENT> {1:1}
+2 WIFE {0:1} +2 WIFE {0:1}
+3 AGE <AGE_AT_EVENT> {1:1} +3 AGE <AGE_AT_EVENT> {1:1}
+1 HUSB @<XREF:INDI>@ {0:1} +1 HUSB @<XREF:INDI>@ {0:1}
+1 WIFE @<XREF:INDI>@ {0:1} +1 WIFE @<XREF:INDI>@ {0:1}
+1 CHIL @<XREF:INDI>@ {0:M} +1 CHIL @<XREF:INDI>@ {0:M}
+1 NCHI <COUNT_OF_CHILDREN> {0:1} +1 NCHI <COUNT_OF_CHILDREN> {0:1}
+1 SUBM @<XREF:SUBM>@ {0:M} +1 SUBM @<XREF:SUBM>@ {0:M}
+1 <<LDS_SPOUSE_SEALING>> {0:M} +1 <<LDS_SPOUSE_SEALING>> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+2 <<NOTE_STRUCTURE>> {0:M}
+2 <<MULTIMEDIA_LINK>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
INDIVIDUAL_RECORD:= INDIVIDUAL_RECORD:=
n @XREF:INDI@ INDI {1:1} n @<XREF:INDI>@ INDI {1:1}
+1 RESN <RESTRICTION_NOTICE> {0:1} +1 RESN <RESTRICTION_NOTICE> {0:1}
+1 <<PERSONAL_NAME_STRUCTURE>> {0:M} +1 <<PERSONAL_NAME_STRUCTURE>> {0:M}
+1 SEX <SEX_VALUE> {0:1} +1 SEX <SEX_VALUE> {0:1}
+1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M} +1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M}
+1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M} +1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M}
+1 <<LDS_INDIVIDUAL_ORDINANCE>> {0:M} +1 <<LDS_INDIVIDUAL_ORDINANCE>> {0:M}
+1 <<CHILD_TO_FAMILY_LINK>> {0:M} +1 <<CHILD_TO_FAMILY_LINK>> {0:M}
+1 <<SPOUSE_TO_FAMILY_LINK>> {0:M} +1 <<SPOUSE_TO_FAMILY_LINK>> {0:M}
+1 SUBM @<XREF:SUBM>@ {0:M} +1 SUBM @<XREF:SUBM>@ {0:M}
+1 <<ASSOCIATION_STRUCTURE>> {0:M} +1 <<ASSOCIATION_STRUCTURE>> {0:M}
+1 ALIA @<XREF:INDI>@ {0:M} +1 ALIA @<XREF:INDI>@ {0:M}
+1 ANCI @<XREF:SUBM>@ {0:M} +1 ANCI @<XREF:SUBM>@ {0:M}
+1 DESI @<XREF:SUBM>@ {0:M} +1 DESI @<XREF:SUBM>@ {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 RFN <PERMANENT_RECORD_FILE_NUMBER> {0:1} +1 RFN <PERMANENT_RECORD_FILE_NUMBER> {0:1}
+1 AFN <ANCESTRAL_FILE_NUMBER> {0:1} +1 AFN <ANCESTRAL_FILE_NUMBER> {0:1}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
MULTIMEDIA_RECORD:= MULTIMEDIA_RECORD:=
n @XREF:OBJE@ OBJE {1:1} n @<XREF:OBJE>@ OBJE {1:1}
+1 FORM <MULTIMEDIA_FORMAT> {1:1} +1 FORM <MULTIMEDIA_FORMAT> {1:1}
+1 TITL <DESCRIPTIVE_TITLE> {0:1} +1 TITL <DESCRIPTIVE_TITLE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 <<SOURCE_CITATION>> {0:M}
+1 BLOB {1:1} +1 BLOB {1:1}
+2 CONT <ENCODED_MULTIMEDIA_LINE> {1:M} +2 CONT <ENCODED_MULTIMEDIA_LINE> {1:M}
+1 OBJE @<XREF:OBJE>@ /* chain to continued object */ {0:1} +1 OBJE @<XREF:OBJE>@ /* chain to continued object */ {0:1}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
NOTE_RECORD:= NOTE_RECORD:=
n @<XREF:NOTE>@ NOTE <SUBMITTER_TEXT> {1:1} n @<XREF:NOTE>@ NOTE <SUBMITTER_TEXT> {1:1}
+1 [ CONC | CONT] <SUBMITTER_TEXT> {0:M} +1 [ CONC | CONT] <SUBMITTER_TEXT> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
REPOSITORY_RECORD:= REPOSITORY_RECORD:=
n @<XREF:REPO>@ REPO {1:1} n @<XREF:REPO>@ REPO {1:1}
+1 NAME <NAME_OF_REPOSITORY> {0:1} +1 NAME <NAME_OF_REPOSITORY> {0:1}
+1 <<ADDRESS_STRUCTURE>> {0:1} +1 <<ADDRESS_STRUCTURE>> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
SOURCE_RECORD:= SOURCE_RECORD:=
n @<XREF:SOUR>@ SOUR {1:1} n @<XREF:SOUR>@ SOUR {1:1}
+1 DATA {0:1} +1 DATA {0:1}
+2 EVEN <EVENTS_RECORDED> {0:M} +2 EVEN <EVENTS_RECORDED> {0:M}
+3 DATE <DATE_PERIOD> {0:1} +3 DATE <DATE_PERIOD> {0:1}
+3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1} +3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1}
+2 AGNC <RESPONSIBLE_AGENCY> {0:1} +2 AGNC <RESPONSIBLE_AGENCY> {0:1}
+2 <<NOTE_STRUCTURE>> {0:M} +2 <<NOTE_STRUCTURE>> {0:M}
+1 AUTH <SOURCE_ORIGINATOR> {0:1} +1 AUTH <SOURCE_ORIGINATOR> {0:1}
+2 [CONT|CONC] <SOURCE_ORIGINATOR> {0:M} +2 [CONT|CONC] <SOURCE_ORIGINATOR> {0:M}
+1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1} +1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1}
+2 [CONT|CONC] <SOURCE_DESCRIPTIVE_TITLE> {0:M} +2 [CONT|CONC] <SOURCE_DESCRIPTIVE_TITLE> {0:M}
+1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1} +1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1}
+1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1} +1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1}
+2 [CONT|CONC] <SOURCE_PUBLICATION_FACTS> {0:M} +2 [CONT|CONC] <SOURCE_PUBLICATION_FACTS> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:1} +1 TEXT <TEXT_FROM_SOURCE> {0:1}
+2 [CONT|CONC] <TEXT_FROM_SOURCE> {0:M} +2 [CONT|CONC] <TEXT_FROM_SOURCE> {0:M}
+1 <<SOURCE_REPOSITORY_CITATION>> {0:1} +1 <<SOURCE_REPOSITORY_CITATION>> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
SUBMISSION_RECORD:= SUBMISSION_RECORD:=
n @XREF:SUBN@ SUBN {1:1] n @<XREF:SUBN>@ SUBN {1:1]
+1 SUBM @XREF:SUBM@ {0:1} +1 SUBM @<XREF:SUBM>@ {0:1}
+1 FAMF <NAME_OF_FAMILY_FILE> {0:1} +1 FAMF <NAME_OF_FAMILY_FILE> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 ANCE <GENERATIONS_OF_ANCESTORS> {0:1} +1 ANCE <GENERATIONS_OF_ANCESTORS> {0:1}
+1 DESC <GENERATIONS_OF_DESCENDANTS> {0:1} +1 DESC <GENERATIONS_OF_DESCENDANTS> {0:1}
+1 ORDI <ORDINANCE_PROCESS_FLAG> {0:1} +1 ORDI <ORDINANCE_PROCESS_FLAG> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
SUBMITTER_RECORD:= SUBMITTER_RECORD:=
n @<XREF:SUBM>@ SUBM {1:1} n @<XREF:SUBM>@ SUBM {1:1}
+1 NAME <SUBMITTER_NAME> {1:1} +1 NAME <SUBMITTER_NAME> {1:1}
+1 <<ADDRESS_STRUCTURE>> {0:1} +1 <<ADDRESS_STRUCTURE>> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 LANG <LANGUAGE_PREFERENCE> {0:3} +1 LANG <LANGUAGE_PREFERENCE> {0:3}
+1 RFN <SUBMITTER_REGISTERED_RFN> {0:1} +1 RFN <SUBMITTER_REGISTERED_RFN> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
ADDRESS_STRUCTURE:= ADDRESS_STRUCTURE:=
n ADDR <ADDRESS_LINE> {0:1} n ADDR <ADDRESS_LINE> {0:1}
+1 CONT <ADDRESS_LINE> {0:M} +1 CONT <ADDRESS_LINE> {0:M}
+1 ADR1 <ADDRESS_LINE1> {0:1} +1 ADR1 <ADDRESS_LINE1> {0:1}
+1 ADR2 <ADDRESS_LINE2> {0:1} +1 ADR2 <ADDRESS_LINE2> {0:1}
+1 CITY <ADDRESS_CITY> {0:1} +1 CITY <ADDRESS_CITY> {0:1}
+1 STAE <ADDRESS_STATE> {0:1} +1 STAE <ADDRESS_STATE> {0:1}
+1 POST <ADDRESS_POSTAL_CODE> {0:1} +1 POST <ADDRESS_POSTAL_CODE> {0:1}
+1 CTRY <ADDRESS_COUNTRY> {0:1} +1 CTRY <ADDRESS_COUNTRY> {0:1}
n PHON <PHONE_NUMBER> {0:3} n PHON <PHONE_NUMBER> {0:3}
ASSOCIATION_STRUCTURE:= ASSOCIATION_STRUCTURE:=
n ASSO @<XREF:INDI>@ {0:M} n ASSO @<XREF:INDI>@ {0:M}
+1 TYPE <RECORD_TYPE> {1:1}
+1 RELA <RELATION_IS_DESCRIPTOR> {1:1} +1 RELA <RELATION_IS_DESCRIPTOR> {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
CHANGE_DATE:= CHANGE_DATE:=
n CHAN {1:1} n CHAN {1:1}
+1 DATE <CHANGE_DATE> {1:1} +1 DATE <CHANGE_DATE> {1:1}
+2 TIME <TIME_VALUE> {0:1} +2 TIME <TIME_VALUE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
CHILD_TO_FAMILY_LINK:= CHILD_TO_FAMILY_LINK:=
n FAMC @<XREF:FAM>@ {1:1} n FAMC @<XREF:FAM>@ {1:1}
+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:M} +1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
EVENT_DETAIL:= EVENT_DETAIL:=
n TYPE <EVENT_DESCRIPTOR> {0:1} n TYPE <EVENT_DESCRIPTOR> {0:1}
n DATE <DATE_VALUE> {0:1} n DATE <DATE_VALUE> {0:1}
n <<PLACE_STRUCTURE>> {0:1} n <<PLACE_STRUCTURE>> {0:1}
n <<ADDRESS_STRUCTURE>> {0:1} n <<ADDRESS_STRUCTURE>> {0:1}
n AGE <AGE_AT_EVENT> {0:1} n AGE <AGE_AT_EVENT> {0:1}
n AGNC <RESPONSIBLE_AGENCY> {0:1} n AGNC <RESPONSIBLE_AGENCY> {0:1}
n CAUS <CAUSE_OF_EVENT> {0:1} n CAUS <CAUSE_OF_EVENT> {0:1}
n <<SOURCE_CITATION>> {0:M} n <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M}
n <<MULTIMEDIA_LINK>> {0:M} n <<MULTIMEDIA_LINK>> {0:M}
n <<NOTE_STRUCTURE>> {0:M} n <<NOTE_STRUCTURE>> {0:M}
FAMILY_EVENT_STRUCTURE:= FAMILY_EVENT_STRUCTURE:=
[ [
n [ ANUL | CENS | DIV | DIVF ] [Y|<NULL>] {1:1} n [ ANUL | CENS | DIV | DIVF ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ ENGA | MARR | MARB | MARC ] [Y|<NULL>] {1:1} n [ ENGA | MARR | MARB | MARC ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ MARL | MARS ] [Y|<NULL>] {1:1} n [ MARL | MARS ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EVEN {1:1} n EVEN {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
INDIVIDUAL_ATTRIBUTE_STRUCTURE:= INDIVIDUAL_ATTRIBUTE_STRUCTURE:=
[ [
n CAST <CASTE_NAME> {1:1} n CAST <CASTE_NAME> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n DSCR <PHYSICAL_DESCRIPTION> {1:1} n DSCR <PHYSICAL_DESCRIPTION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EDUC <SCHOLASTIC_ACHIEVEMENT> {1:1} n EDUC <SCHOLASTIC_ACHIEVEMENT> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n IDNO <NATIONAL_ID_NUMBER> {1:1} n IDNO <NATIONAL_ID_NUMBER> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NATI <NATIONAL_OR_TRIBAL_ORIGIN> {1:1} n NATI <NATIONAL_OR_TRIBAL_ORIGIN> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NCHI <COUNT_OF_CHILDREN> {1:1} n NCHI <COUNT_OF_CHILDREN> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NMR <COUNT_OF_MARRIAGES> {1:1} n NMR <COUNT_OF_MARRIAGES> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n OCCU <OCCUPATION> {1:1} n OCCU <OCCUPATION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n PROP <POSSESSIONS> {1:1} n PROP <POSSESSIONS> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n RELI <RELIGIOUS_AFFILIATION> {1:1} n RELI <RELIGIOUS_AFFILIATION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n RESI {1:1} n RESI {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n SSN <SOCIAL_SECURITY_NUMBER> {0:1} n SSN <SOCIAL_SECURITY_NUMBER> {0:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n TITL <NOBILITY_TYPE_TITLE> {1:1} n TITL <NOBILITY_TYPE_TITLE> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
INDIVIDUAL_EVENT_STRUCTURE:= INDIVIDUAL_EVENT_STRUCTURE:=
[ [
n [ BIRT | CHR ] [Y|<NULL>] {1:1} n [ BIRT | CHR ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
+1 FAMC @<XREF:FAM>@ {0:1} +1 FAMC @<XREF:FAM>@ {0:1}
| |
n [ DEAT | BURI | CREM ] [Y|<NULL>] {1:1} n [ DEAT | BURI | CREM ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n ADOP [Y|<NULL>] {1:1} n ADOP [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
+1 FAMC @<XREF:FAM>@ {0:1} +1 FAMC @<XREF:FAM>@ {0:1}
+2 ADOP <ADOPTED_BY_WHICH_PARENT> {0:1} +2 ADOP <ADOPTED_BY_WHICH_PARENT> {0:1}
| |
n [ BAPM | BARM | BASM | BLES ] [Y|<NULL>] {1:1} n [ BAPM | BARM | BASM | BLES ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ CHRA | CONF | FCOM | ORDN ] [Y|<NULL>] {1:1} n [ CHRA | CONF | FCOM | ORDN ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ NATU | EMIG | IMMI ] [Y|<NULL>] {1:1} n [ NATU | EMIG | IMMI ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ CENS | PROB | WILL] [Y|<NULL>] {1:1} n [ CENS | PROB | WILL] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ GRAD | RETI ] [Y|<NULL>] {1:1} n [ GRAD | RETI ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EVEN {1:1} n EVEN {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
LDS_INDIVIDUAL_ORDINANCE:= LDS_INDIVIDUAL_ORDINANCE:=
[ [
n [ BAPL | CONL ] {1:1} n [ BAPL | CONL ] {1:1}
+1 STAT <LDS_BAPTISM_DATE_STATUS> {0:1} +1 STAT <LDS_BAPTISM_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| |
n ENDL {1:1} n ENDL {1:1}
+1 STAT <LDS_ENDOWMENT_DATE_STATUS> {0:1} +1 STAT <LDS_ENDOWMENT_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| |
n SLGC {1:1} n SLGC {1:1}
+1 STAT <LDS_CHILD_SEALING_DATE_STATUS> {0:1} +1 STAT <LDS_CHILD_SEALING_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 FAMC @<XREF:FAM>@ {1:1} +1 FAMC @<XREF:FAM>@ {1:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
LDS_SPOUSE_SEALING:= LDS_SPOUSE_SEALING:=
n SLGS {1:1} n SLGS {1:1}
+1 STAT <LDS_SPOUSE_SEALING_DATE_STATUS> {0:1} +1 STAT <LDS_SPOUSE_SEALING_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
MULTIMEDIA_LINK:= MULTIMEDIA_LINK:=
[ /* embedded form*/ [ /* embedded form*/
n OBJE @<XREF:OBJE>@ {1:1} n OBJE @<XREF:OBJE>@ {1:1}
| /* linked form*/ | /* linked form*/
n OBJE {1:1} n OBJE {1:1}
+1 FORM <MULTIMEDIA_FORMAT> {1:1} +1 FORM <MULTIMEDIA_FORMAT> {1:1}
+1 TITL <DESCRIPTIVE_TITLE> {0:1} +1 TITL <DESCRIPTIVE_TITLE> {0:1}
+1 FILE <MULTIMEDIA_FILE_REFERENCE> {1:1} +1 FILE <MULTIMEDIA_FILE_REFERENCE> {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
NOTE_STRUCTURE:= NOTE_STRUCTURE:=
[ [
n NOTE @<XREF:NOTE>@ {1:1} n NOTE @<XREF:NOTE>@ {1:1}
+1 <<SOURCE_CITATION>> {0:M} +1 SOUR @<XREF:SOUR>@ {0:M}
| |
n NOTE [SUBMITTER_TEXT> | <NULL>] {1:1} n NOTE [<SUBMITTER_TEXT> | <NULL>] {1:1}
+1 [ CONC | CONT ] <SUBMITTER_TEXT> {0:M} +1 [ CONC | CONT ] <SUBMITTER_TEXT> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 SOUR @<XREF:SOUR>@ {0:M}
] ]
PERSONAL_NAME_STRUCTURE:= PERSONAL_NAME_STRUCTURE:=
n NAME <NAME_PERSONAL> {1:1} n NAME <NAME_PERSONAL> {1:1}
+1 NPFX <NAME_PIECE_PREFIX> {0:1} +1 NPFX <NAME_PIECE_PREFIX> {0:1}
+1 GIVN <NAME_PIECE_GIVEN> {0:1} +1 GIVN <NAME_PIECE_GIVEN> {0:1}
+1 NICK <NAME_PIECE_NICKNAME> {0:1} +1 NICK <NAME_PIECE_NICKNAME> {0:1}
+1 SPFX <NAME_PIECE_SURNAME_PREFIX {0:1} +1 SPFX <NAME_PIECE_SURNAME_PREFIX> {0:1}
+1 SURN <NAME_PIECE_SURNAME> {0:1} +1 SURN <NAME_PIECE_SURNAME> {0:1}
+1 NSFX <NAME_PIECE_SUFFIX> {0:1} +1 NSFX <NAME_PIECE_SUFFIX> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+2 <<NOTE_STRUCTURE>> {0:M}
+2 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
PLACE_STRUCTURE:= PLACE_STRUCTURE:=
n PLAC <PLACE_VALUE> {1:1} n PLAC <PLACE_VALUE> {1:1}
+1 FORM <PLACE_HIERARCHY> {0:1} +1 FORM <PLACE_HIERARCHY> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
SOURCE_CITATION:= SOURCE_CITATION:=
[ [
n SOUR @<XREF:SOUR>@ /* pointer to source record */ {1:1} n SOUR @<XREF:SOUR>@ /* pointer to source record */ {1:1}
+1 PAGE <WHERE_WITHIN_SOURCE> {0:1} +1 PAGE <WHERE_WITHIN_SOURCE> {0:1}
+1 EVEN <EVENT_TYPE_CITED_FROM> {0:1} +1 EVEN <EVENT_TYPE_CITED_FROM> {0:1}
+2 ROLE <ROLE_IN_EVENT> {0:1} +2 ROLE <ROLE_IN_EVENT> {0:1}
+1 DATA {0:1} +1 DATA {0:1}
+2 DATE <ENTRY_RECORDING_DATE> {0:1} +2 DATE <ENTRY_RECORDING_DATE> {0:1}
+2 TEXT <TEXT_FROM_SOURCE> {0:M} +2 TEXT <TEXT_FROM_SOURCE> {0:M}
+3 [ CONC | CONT ] <TEXT_FROM_SOURCE> {0:M} +3 [ CONC | CONT ] <TEXT_FROM_SOURCE> {0:M}
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} +1 QUAY <CERTAINTY_ASSESSMENT> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| /* Systems not using source records */ | /* Systems not using source records */
n SOUR <SOURCE_DESCRIPTION> {1:1} n SOUR <SOURCE_DESCRIPTION> {1:1}
+1 [ CONC | CONT ] <SOURCE_DESCRIPTION> {0:M} +1 [ CONC | CONT ] <SOURCE_DESCRIPTION> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:M} +1 TEXT <TEXT_FROM_SOURCE> {0:M}
+2 [CONC | CONT ] <TEXT_FROM_SOURCE> {0:M} +2 [CONC | CONT ] <TEXT_FROM_SOURCE> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
SOURCE_REPOSITORY_CITATION:= SOURCE_REPOSITORY_CITATION:=
[
n REPO @XREF:REPO@ {1:1} n REPO @<XREF:REPO>@ {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 CALN <SOURCE_CALL_NUMBER> {0:M} +1 CALN <SOURCE_CALL_NUMBER> {0:M}
+2 MEDI <SOURCE_MEDIA_TYPE> {0:1} +2 MEDI <SOURCE_MEDIA_TYPE> {0:1}
SPOUSE_TO_FAMILY_LINK:= SPOUSE_TO_FAMILY_LINK:=
n FAMS @<XREF:FAM>@ {1:1} n FAMS @<XREF:FAM>@ {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
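One way to produce such a comparison mechanically (not necessarily how the table above was made) is Python's standard `difflib` module, which can emit a unified diff of two grammar texts. The sketch uses the CHILD_TO_FAMILY_LINK rows from the table, where the PEDI cardinality differs ({0:M} versus {0:1}); the variable names are illustrative.

```python
import difflib

# CHILD_TO_FAMILY_LINK as given in the two columns of the table above.
pdf_1996_01_02 = [
    "n FAMC @<XREF:FAM>@ {1:1}",
    "+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:M}",
    "+1 <<NOTE_STRUCTURE>> {0:M}",
]
html_rev_1996_01_10 = [
    "n FAMC @<XREF:FAM>@ {1:1}",
    "+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:1}",
    "+1 <<NOTE_STRUCTURE>> {0:M}",
]

diff = list(
    difflib.unified_diff(
        pdf_1996_01_02,
        html_rev_1996_01_10,
        fromfile="LDS PDF, 2 January 1996",
        tofile="McBride HTML, revised 10 January 1996",
        lineterm="",  # the input lines carry no newline characters
    )
)
print("\n".join(diff))
```

For the full grammars, both versions would first need to be extracted as plain text with normalised whitespace, since the PDF and HTML layouts indent differently.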

If you have any information related to this article please e-mail me or add it to the comments!


[Update 2015/01/01] Errata Sheet found

After discovering the differences described above and the references to the Errata Sheet, I e-mailed several people to help find the document. Louis Kessler got in contact with Brian Madsen, who had the Errata Sheet (on paper) and scanned it; read all about it in Louis' blog post “More GEDCOM Archaeological Discoveries”. The Errata Sheet itself (PDF) is shown below. It contains all of the big changes highlighted in the table above!