2014/12/31

Analysing 635M lines of GEDCOM

The GEDCOM parser of Genealogie Online needed a rewrite. The code base had grown out of proportion, resulting in inefficient code and cumbersome maintenance.

A big difference between the start of coding the GEDCOM parser and now is the number of GEDCOM files available: nearly 7 thousand. This gave me the opportunity to do some analysis (and more testing)!

 

Analysis of versions and character sets

First of all, the headers of all these GEDCOM files were examined to get a feeling for which GEDCOM grammars and character sets were used.

GEDCOM version*    Count
5.5                6,339
(undefined)          248
5.5.1                245
v.1.0.01 Beta         12
5.3                    5
4.0                    2
2.0                    1
4                      1
5.01                   1
Total              6,854

* The GEDCOM version as presented in HEAD > GEDC > VERS. I did not check whether the content actually conformed to the presented grammar version. I did manually check the 5.3, 4, etc. versions; at first glance they seemed to be just GEDCOM 5.5.
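
A header scan like the one described above could be sketched as follows. This is a minimal illustration, not the parser used by Genealogie Online: the function name `read_header_info` is hypothetical, and it assumes the file has already been decoded into text lines.

```python
from typing import Iterable, List, Optional, Tuple


def read_header_info(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (GEDCOM version, character set) as declared in the HEAD record.

    Hypothetical helper: reads HEAD > GEDC > VERS and HEAD > CHAR only and
    stops at the first level-0 line after the header.
    """
    version: Optional[str] = None
    charset: Optional[str] = None
    path: List[str] = []  # tags of the enclosing lines, one per level
    for raw in lines:
        # Drop surrounding whitespace and a possible byte order mark.
        parts = raw.strip().lstrip("\ufeff").split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1].upper()
        value = parts[2].strip() if len(parts) > 2 else ""
        if level == 0 and tag != "HEAD":
            break  # the header has ended
        del path[level:]
        path.append(tag)
        if path == ["HEAD", "GEDC", "VERS"]:
            version = value
        elif path == ["HEAD", "CHAR"]:
            charset = value
    return version, charset
```

A file without a GEDC > VERS or CHAR line yields `None` for that value, which corresponds to the "(undefined)" rows in the tables.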

The fact that only 3.6% of the GEDCOM files identify themselves as 5.5.1 surprised me, as this is regarded as the current de facto standard.

It must be noted that a big portion of the GEDCOM files were produced by Dutch family tree programs. But, as can be seen on the Used family tree programs (click on the program name to expand statistics) page on Genealogie Online, only Legacy, MacFamilyTree, Ahnenblatt, PhpGedView and RootsMagic advertise their GEDCOM with the 5.5.1 label.

For the GEDCOM parser the conclusion was clear: support 5.5 (and 5.5rev) and 5.5.1 GEDCOM files.

Character set    Count
ANSI             4,395
UTF-8            1,269
ANSEL              692
ASCII              312
(undefined)         95
WINDOWS             27
IBMPC               26
MACINTOSH           21
IBM WINDOWS         11
UNICODE              5
windows-1251         1
Total            6,854

The number of files claiming to be UTF-8 is funny, because UTF-8 was only introduced in GEDCOM 5.5.1: 1,269 files claim to be UTF-8, while only 245 files claim to be GEDCOM 5.5.1. This puts the low 3.6% in another perspective…

Fortunately, I could re-use code from the old GEDCOM parser to correctly handle character sets and encoding (it was a solid piece of code).

Note: Tim Forsythe publishes similar stats from GigaTrees, which paints a more American picture (for example: 14.4% GEDCOM 5.5.1).

 

Analysis of actual use

The old GEDCOM parser also included support for invalid GEDCOM tags and custom GEDCOM tags. Although I wrote the article GEDCOM files which don’t adhere to the GEDCOM standard shouldn’t be allowed to be called GEDCOM, for Genealogie Online I’m more forgiving. I want to present the genealogical data of my users and don’t want to bother them too much with the fact that their family tree program isn’t producing valid GEDCOM. But which of the invalid and custom tags should the new GEDCOM parser support?

I decided to read all the GEDCOM files and count the tag-sequence uses. This resulted in a CSV file which looks like:

INDI-BIRT-AGE,45
INDI-BIRT-AGNC,1820
INDI-BIRT-DATE-ANC,162
INDI-BIRT-DATE-NOTE,172764
INDI-BIRT-DATE-NOTE-CONT,11311
INDI-BIRT-DATE-SOUR,39752
INDI-BIRT-DATE-SOUR-DATE,15951
INDI-BIRT-DATE-SOUR-ITEM,16825
INDI-BIRT-DATE-SOUR-PAGE,486
INDI-BIRT-DATE-SOUR-ROLE,36055
INDI-BIRT-DOCTOR,1
INDI-BIRT-EMAIL,1
INDI-BIRT-FAMC,1223
INDI-BIRT-LABL,4092
INDI-BIRT-LATI,49730
INDI-BIRT-LONG,49730
INDI-BIRT-MOON,37
INDI-BIRT-NOTE,720806
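
The counting step can be sketched roughly like this. It is an illustration under assumptions, not the script that produced the CSV: the helper names `count_tag_sequences` and `to_csv` are made up, every line is counted under its full tag path, and the input is expected to be already decoded text.

```python
from collections import Counter
from typing import Iterable, List


def count_tag_sequences(lines: Iterable[str], counts: Counter) -> None:
    """Add one count per GEDCOM line under its tag sequence, e.g. INDI-BIRT-DATE."""
    path: List[str] = []  # tags of the enclosing lines, one per level
    for raw in lines:
        parts = raw.strip().split(" ", 3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        # Records may carry a cross-reference id (@I1@) before the tag.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        del path[level:]
        path.append(tag.upper())
        counts["-".join(path)] += 1


def to_csv(counts: Counter) -> str:
    """Render the counts as 'SEQUENCE,COUNT' lines, sorted by sequence."""
    return "\n".join(f"{seq},{n}" for seq, n in sorted(counts.items()))
```

Calling `count_tag_sequences` once per file with the same `Counter` accumulates the totals over the whole collection.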

The next step in the analysis was visualisation of this file. I opted for my favourite JavaScript module D3.js, which provides a cool collapsible tree. The result is available to all those interested on the GEDCOM tag usage page (also downloadable and re-usable under a CC-BY license).
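
A collapsible tree needs nested data rather than flat rows. The sketch below shows one way to turn the CSV rows into a nested structure; the `name`/`children` layout follows the convention commonly used in D3 hierarchy examples, while the `count` field, the `GEDCOM` root label and the function names are assumptions made for illustration.

```python
import csv
import io
import json


def build_tree(csv_text: str) -> dict:
    """Turn 'INDI-BIRT-DATE,123' rows into a nested name/children tree."""
    root = {"name": "GEDCOM", "count": 0, "children": []}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip empty or malformed rows
        sequence, count = row[0], int(row[1])
        node = root
        for tag in sequence.split("-"):
            # Reuse the child for this tag if it exists, otherwise create it.
            child = next((c for c in node["children"] if c["name"] == tag), None)
            if child is None:
                child = {"name": tag, "count": 0, "children": []}
                node["children"].append(child)
            node = child
        node["count"] = count
    return root


def tree_to_json(csv_text: str) -> str:
    """Serialise the tree so a JavaScript front end can load it."""
    return json.dumps(build_tree(csv_text))
```

Intermediate nodes that never appear as a row of their own keep a count of 0 in this sketch.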

The colour of the node indicates if the tag-sequence is valid under the GEDCOM 5.5 grammar (red > 83.7%) or not (grey > 16.2%). This visualisation aspect is not completely accurate as not all GEDCOM files are version 5.5 (the actual version wasn't taken into account).

These tag trees give a good picture of usage. If an invalid or custom tag is used a lot, I would look into implementing it in the GEDCOM parser.

For fun I also made selections for the top-10 programs used by Genealogie Online users. This way, you can see which program has more or less invalid/custom tags…

For my own reference I made tag trees for GEDCOM 5.5 (the “2 January 1996” version, which was hindered by the fact that «NOTE_STRUCTURE» references «SOURCE_CITATION» and vice versa, thus introducing a loop!) and GEDCOM 5.5.1.
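
The loop matters as soon as a grammar is expanded into a tree, because a naive expansion never terminates. Below is a minimal sketch of an expansion with a recursion guard. The `GRAMMAR` dictionary is a heavily trimmed, hypothetical representation of just the two structures involved, and the `recursive` marker is an illustrative choice.

```python
from typing import Dict, List, Tuple

# Hypothetical, trimmed grammar: structure name -> (tag, referenced structures).
# Only the mutual reference from the "2 January 1996" grammar is modelled.
GRAMMAR: Dict[str, Tuple[str, List[str]]] = {
    "NOTE_STRUCTURE": ("NOTE", ["SOURCE_CITATION"]),
    "SOURCE_CITATION": ("SOUR", ["NOTE_STRUCTURE"]),
}


def expand(structure: str,
           grammar: Dict[str, Tuple[str, List[str]]],
           active: Tuple[str, ...] = ()) -> dict:
    """Expand a structure into a tag tree, cutting off at the first repeat."""
    tag, substructures = grammar[structure]
    node = {"name": tag, "children": []}
    if structure in active:
        # This structure is already being expanded higher up: mark and stop.
        node["recursive"] = True
        return node
    for sub in substructures:
        node["children"].append(expand(sub, grammar, active + (structure,)))
    return node
```

Expanding `NOTE_STRUCTURE` with this guard gives NOTE > SOUR > NOTE, where the innermost NOTE is flagged instead of being expanded again.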

The data used for all of these tag trees is also downloadable in CSV and JSON format under a CC-BY license.

The end result, besides nice visualizations, is a lean, more robust and more complete GEDCOM parser for Genealogie Online! Users will notice better support for and presentation of sources and notes, and for some programs the use of RIN for identification of persons.

Which GEDCOM 5.5 grammar is correct?

The GEDCOM 5.5 standard is described in a PDF document prepared by the Family History Department of The Church of Jesus Christ of Latter-day Saints dated 2 January 1996 (which in two days is 19 years ago).

When you Google for the GEDCOM 5.5 grammar you usually end up on the HTML version by Paul McBride which he himself calls “unofficial” (or you find the grammar files of Gedcom.pm by Paul Johnson). But over the years no one seemed to have noticed that the HTML version has a slightly different date “2 January 1996 [Revised 10 January 1996]” and differences in grammar!

Errata Sheet

Although the PDF document includes an Errata Sheet, it seems there are others. When you dig into the archives of the Internet you can find references to an Errata Sheet dated 10 January 1996 which was faxed to some people.
A GEDCOM 5.5 Errata Sheet dated 10 January 1996 supposedly contains corrections to pages 23, 24, 25, 26, 29, 29, 29, 33, 34, 39, 57, 79, and 85.
Unfortunately, this document has not hit the Internet yet, so we can’t say for sure that the “10 January 1996” version by McBride is based on this Errata Sheet.
Some of the differences in the GEDCOM 5.5 grammar between the “2 January 1996” and “Revised 10 January 1996” versions are small (typos) but some are big (see the diff below)!

Big questions

I think the “Revised 10 January 1996” version - let's call this version GEDCOM 5.5rev - is used a lot, mainly because the HTML version is more accessible. But should we consider this an official version? In my opinion: no (because it is not an official LDS publication).

If there was an Errata Sheet dated 10 January 1996, why didn’t the LDS publish it (in PDF form, online), and why didn’t they make a new GEDCOM version, which they should have, considering some changes are big?

A draft of version 5.5.1 was only published on 2 October 1999 (see FamilySearch GEDCOM Specifications by Tamura Jones for a complete overview of specifications). This document contains a section which enumerates the differences with the previous version. But some of the changes compared to the “2 January 1996” version, which you can see in the “Revised 10 January 1996” version, weren’t mentioned in this section. I guess the LDS internally were uncertain too about what the correct GEDCOM 5.5 grammar was.

GEDCOM 5.5 Grammar Diff

Below is a comparison between the Record Structures and Substructures of the Lineage-Linked Form (the Primitive elements of the Lineage-Linked Form are the same) between the “2 January 1996” and “Revised 10 January 1996” versions. I only focussed on the grammar, not the rest of the text in the specification. Orange highlighting means a small difference, yellow highlighting indicates a big difference. The table can also be downloaded in PDF format.


Lineage-Linked GEDCOM Form's grammar 5.5 Lineage-Linked GEDCOM Form's grammar 5.5
LDS/PDF version, dated 2 January 1996 McBride/HTML version, revised 10 January 1996
LINEAGE_LINKED_GEDCOM:= LINEAGE_LINKED_GEDCOM:=
0 <<HEADER>> {1:1} 0 <<HEADER>> {1:1}
0 <<SUBMISSION_RECORD>> {0:1} 0 <<SUBMISSION_RECORD>> {0:1}
0 <<RECORD>> {1:M} 0 <<RECORD>> {1:M}
0 TRLR {1:1} 0 TRLR {1:1}
HEADER:= HEADER:=
n HEAD {1:1} n HEAD {1:1}
+1 SOUR <APPROVED_SYSTEM_ID> {1:1} +1 SOUR <APPROVED_SYSTEM_ID> {1:1}
+2 VERS <VERSION_NUMBER> {0:1} +2 VERS <VERSION_NUMBER> {0:1}
+2 NAME <NAME_OF_PRODUCT> {0:1} +2 NAME <NAME_OF_PRODUCT> {0:1}
+2 CORP <NAME_OF_BUSINESS> {0:1} +2 CORP <NAME_OF_BUSINESS> {0:1}
+3 <<ADDRESS_STRUCTURE>> {0:1} +3 <<ADDRESS_STRUCTURE>> {0:1}
+2 DATA <NAME_OF_SOURCE_DATA> {0:1} +2 DATA <NAME_OF_SOURCE_DATA> {0:1}
+3 DATE <PUBLICATION_DATE> {0:1} +3 DATE <PUBLICATION_DATE> {0:1}
+3 COPR <COPYRIGHT_SOURCE_DATA> {0:1} +3 COPR <COPYRIGHT_SOURCE_DATA> {0:1}
+1 DEST <RECEIVING_SYSTEM_NAME> {0:1*} +1 DEST <RECEIVING_SYSTEM_NAME> {0:1*}
+1 DATE <TRANSMISSION_DATE> {0:1} +1 DATE <TRANSMISSION_DATE> {0:1}
+2 TIME <TIME_VALUE> {0:1} +2 TIME <TIME_VALUE> {0:1}
+1 SUBM @XREF:SUBM@ {1:1} +1 SUBM @<XREF:SUBM>@ {1:1}
+1 SUBN @XREF:SUBN@ {0:1} +1 SUBN @<XREF:SUBN>@ {0:1}
+1 FILE <FILE_NAME> {0:1} +1 FILE <FILE_NAME> {0:1}
+1 COPR <COPYRIGHT_GEDCOM_FILE> {0:1} +1 COPR <COPYRIGHT_GEDCOM_FILE> {0:1}
+1 GEDC {1:1} +1 GEDC {1:1}
+2 VERS <VERSION_NUMBER> {1:1} +2 VERS <VERSION_NUMBER> {1:1}
+2 FORM <GEDCOM_FORM> {1:1} +2 FORM <GEDCOM_FORM> {1:1}
+1 CHAR <CHARACTER_SET> {1:1} +1 CHAR <CHARACTER_SET> {1:1}
+2 VERS <VERSION_NUMBER> {0:1} +2 VERS <VERSION_NUMBER> {0:1}
+1 LANG <LANGUAGE_OF_TEXT> {0:1} +1 LANG <LANGUAGE_OF_TEXT> {0:1}
+1 PLAC {0:1} +1 PLAC {0:1}
+2 FORM <PLACE_HIERARCHY> {1:1} +2 FORM <PLACE_HIERARCHY> {1:1}
+1 NOTE <GEDCOM_CONTENT_DESCRIPTION> {0:1} +1 NOTE <GEDCOM_CONTENT_DESCRIPTION> {0:1}
+2 [CONT|CONC] <GEDCOM_CONTENT_DESCRIPTION> {0:M} +2 [CONT|CONC] <GEDCOM_CONTENT_DESCRIPTION> {0:M}
RECORD:= RECORD:=
[ [
n <<FAM_RECORD>> {1:1} n <<FAM_RECORD>> {1:1}
| |
n <<INDIVIDUAL_RECORD>> {1:1} n <<INDIVIDUAL_RECORD>> {1:1}
| |
n <<MULTIMEDIA_RECORD>> {1:M} n <<MULTIMEDIA_RECORD>> {1:M}
| |
n <<NOTE_RECORD>> {1:1} n <<NOTE_RECORD>> {1:1}
| |
n <<REPOSITORY_RECORD>> {1:1} n <<REPOSITORY_RECORD>> {1:1}
| |
n <<SOURCE_RECORD>> {1:1} n <<SOURCE_RECORD>> {1:1}
| |
n <<SUBMITTER_RECORD>> {1:1} n <<SUBMITTER_RECORD>> {1:1}
] ]
FAM_RECORD:= FAM_RECORD:=
n @<XREF:FAM>@ FAM {1:1} n @<XREF:FAM>@ FAM {1:1}
+1 <<FAMILY_EVENT_STRUCTURE>> {0:M} +1 <<FAMILY_EVENT_STRUCTURE>> {0:M}
+2 HUSB {0:1} +2 HUSB {0:1}
+3 AGE <AGE_AT_EVENT> {1:1} +3 AGE <AGE_AT_EVENT> {1:1}
+2 WIFE {0:1} +2 WIFE {0:1}
+3 AGE <AGE_AT_EVENT> {1:1} +3 AGE <AGE_AT_EVENT> {1:1}
+1 HUSB @<XREF:INDI>@ {0:1} +1 HUSB @<XREF:INDI>@ {0:1}
+1 WIFE @<XREF:INDI>@ {0:1} +1 WIFE @<XREF:INDI>@ {0:1}
+1 CHIL @<XREF:INDI>@ {0:M} +1 CHIL @<XREF:INDI>@ {0:M}
+1 NCHI <COUNT_OF_CHILDREN> {0:1} +1 NCHI <COUNT_OF_CHILDREN> {0:1}
+1 SUBM @<XREF:SUBM>@ {0:M} +1 SUBM @<XREF:SUBM>@ {0:M}
+1 <<LDS_SPOUSE_SEALING>> {0:M} +1 <<LDS_SPOUSE_SEALING>> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+2 <<NOTE_STRUCTURE>> {0:M}
+2 <<MULTIMEDIA_LINK>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
INDIVIDUAL_RECORD:= INDIVIDUAL_RECORD:=
n @XREF:INDI@ INDI {1:1} n @<XREF:INDI>@ INDI {1:1}
+1 RESN <RESTRICTION_NOTICE> {0:1} +1 RESN <RESTRICTION_NOTICE> {0:1}
+1 <<PERSONAL_NAME_STRUCTURE>> {0:M} +1 <<PERSONAL_NAME_STRUCTURE>> {0:M}
+1 SEX <SEX_VALUE> {0:1} +1 SEX <SEX_VALUE> {0:1}
+1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M} +1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M}
+1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M} +1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M}
+1 <<LDS_INDIVIDUAL_ORDINANCE>> {0:M} +1 <<LDS_INDIVIDUAL_ORDINANCE>> {0:M}
+1 <<CHILD_TO_FAMILY_LINK>> {0:M} +1 <<CHILD_TO_FAMILY_LINK>> {0:M}
+1 <<SPOUSE_TO_FAMILY_LINK>> {0:M} +1 <<SPOUSE_TO_FAMILY_LINK>> {0:M}
+1 SUBM @<XREF:SUBM>@ {0:M} +1 SUBM @<XREF:SUBM>@ {0:M}
+1 <<ASSOCIATION_STRUCTURE>> {0:M} +1 <<ASSOCIATION_STRUCTURE>> {0:M}
+1 ALIA @<XREF:INDI>@ {0:M} +1 ALIA @<XREF:INDI>@ {0:M}
+1 ANCI @<XREF:SUBM>@ {0:M} +1 ANCI @<XREF:SUBM>@ {0:M}
+1 DESI @<XREF:SUBM>@ {0:M} +1 DESI @<XREF:SUBM>@ {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 RFN <PERMANENT_RECORD_FILE_NUMBER> {0:1} +1 RFN <PERMANENT_RECORD_FILE_NUMBER> {0:1}
+1 AFN <ANCESTRAL_FILE_NUMBER> {0:1} +1 AFN <ANCESTRAL_FILE_NUMBER> {0:1}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
MULTIMEDIA_RECORD:= MULTIMEDIA_RECORD:=
n @XREF:OBJE@ OBJE {1:1} n @<XREF:OBJE>@ OBJE {1:1}
+1 FORM <MULTIMEDIA_FORMAT> {1:1} +1 FORM <MULTIMEDIA_FORMAT> {1:1}
+1 TITL <DESCRIPTIVE_TITLE> {0:1} +1 TITL <DESCRIPTIVE_TITLE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 <<SOURCE_CITATION>> {0:M}
+1 BLOB {1:1} +1 BLOB {1:1}
+2 CONT <ENCODED_MULTIMEDIA_LINE> {1:M} +2 CONT <ENCODED_MULTIMEDIA_LINE> {1:M}
+1 OBJE @<XREF:OBJE>@ /* chain to continued object */ {0:1} +1 OBJE @<XREF:OBJE>@ /* chain to continued object */ {0:1}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
NOTE_RECORD:= NOTE_RECORD:=
n @<XREF:NOTE>@ NOTE <SUBMITTER_TEXT> {1:1} n @<XREF:NOTE>@ NOTE <SUBMITTER_TEXT> {1:1}
+1 [ CONC | CONT] <SUBMITTER_TEXT> {0:M} +1 [ CONC | CONT] <SUBMITTER_TEXT> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
REPOSITORY_RECORD:= REPOSITORY_RECORD:=
n @<XREF:REPO>@ REPO {1:1} n @<XREF:REPO>@ REPO {1:1}
+1 NAME <NAME_OF_REPOSITORY> {0:1} +1 NAME <NAME_OF_REPOSITORY> {0:1}
+1 <<ADDRESS_STRUCTURE>> {0:1} +1 <<ADDRESS_STRUCTURE>> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
SOURCE_RECORD:= SOURCE_RECORD:=
n @<XREF:SOUR>@ SOUR {1:1} n @<XREF:SOUR>@ SOUR {1:1}
+1 DATA {0:1} +1 DATA {0:1}
+2 EVEN <EVENTS_RECORDED> {0:M} +2 EVEN <EVENTS_RECORDED> {0:M}
+3 DATE <DATE_PERIOD> {0:1} +3 DATE <DATE_PERIOD> {0:1}
+3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1} +3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1}
+2 AGNC <RESPONSIBLE_AGENCY> {0:1} +2 AGNC <RESPONSIBLE_AGENCY> {0:1}
+2 <<NOTE_STRUCTURE>> {0:M} +2 <<NOTE_STRUCTURE>> {0:M}
+1 AUTH <SOURCE_ORIGINATOR> {0:1} +1 AUTH <SOURCE_ORIGINATOR> {0:1}
+2 [CONT|CONC] <SOURCE_ORIGINATOR> {0:M} +2 [CONT|CONC] <SOURCE_ORIGINATOR> {0:M}
+1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1} +1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1}
+2 [CONT|CONC] <SOURCE_DESCRIPTIVE_TITLE> {0:M} +2 [CONT|CONC] <SOURCE_DESCRIPTIVE_TITLE> {0:M}
+1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1} +1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1}
+1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1} +1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1}
+2 [CONT|CONC] <SOURCE_PUBLICATION_FACTS> {0:M} +2 [CONT|CONC] <SOURCE_PUBLICATION_FACTS> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:1} +1 TEXT <TEXT_FROM_SOURCE> {0:1}
+2 [CONT|CONC] <TEXT_FROM_SOURCE> {0:M} +2 [CONT|CONC] <TEXT_FROM_SOURCE> {0:M}
+1 <<SOURCE_REPOSITORY_CITATION>> {0:1} +1 <<SOURCE_REPOSITORY_CITATION>> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 REFN <USER_REFERENCE_NUMBER> {0:M} +1 REFN <USER_REFERENCE_NUMBER> {0:M}
+2 TYPE <USER_REFERENCE_TYPE> {0:1} +2 TYPE <USER_REFERENCE_TYPE> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
SUBMISSION_RECORD:= SUBMISSION_RECORD:=
n @XREF:SUBN@ SUBN {1:1] n @<XREF:SUBN>@ SUBN {1:1]
+1 SUBM @XREF:SUBM@ {0:1} +1 SUBM @<XREF:SUBM>@ {0:1}
+1 FAMF <NAME_OF_FAMILY_FILE> {0:1} +1 FAMF <NAME_OF_FAMILY_FILE> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 ANCE <GENERATIONS_OF_ANCESTORS> {0:1} +1 ANCE <GENERATIONS_OF_ANCESTORS> {0:1}
+1 DESC <GENERATIONS_OF_DESCENDANTS> {0:1} +1 DESC <GENERATIONS_OF_DESCENDANTS> {0:1}
+1 ORDI <ORDINANCE_PROCESS_FLAG> {0:1} +1 ORDI <ORDINANCE_PROCESS_FLAG> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
SUBMITTER_RECORD:= SUBMITTER_RECORD:=
n @<XREF:SUBM>@ SUBM {1:1} n @<XREF:SUBM>@ SUBM {1:1}
+1 NAME <SUBMITTER_NAME> {1:1} +1 NAME <SUBMITTER_NAME> {1:1}
+1 <<ADDRESS_STRUCTURE>> {0:1} +1 <<ADDRESS_STRUCTURE>> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 LANG <LANGUAGE_PREFERENCE> {0:3} +1 LANG <LANGUAGE_PREFERENCE> {0:3}
+1 RFN <SUBMITTER_REGISTERED_RFN> {0:1} +1 RFN <SUBMITTER_REGISTERED_RFN> {0:1}
+1 RIN <AUTOMATED_RECORD_ID> {0:1} +1 RIN <AUTOMATED_RECORD_ID> {0:1}
+1 <<CHANGE_DATE>> {0:1} +1 <<CHANGE_DATE>> {0:1}
ADDRESS_STRUCTURE:= ADDRESS_STRUCTURE:=
n ADDR <ADDRESS_LINE> {0:1} n ADDR <ADDRESS_LINE> {0:1}
+1 CONT <ADDRESS_LINE> {0:M} +1 CONT <ADDRESS_LINE> {0:M}
+1 ADR1 <ADDRESS_LINE1> {0:1} +1 ADR1 <ADDRESS_LINE1> {0:1}
+1 ADR2 <ADDRESS_LINE2> {0:1} +1 ADR2 <ADDRESS_LINE2> {0:1}
+1 CITY <ADDRESS_CITY> {0:1} +1 CITY <ADDRESS_CITY> {0:1}
+1 STAE <ADDRESS_STATE> {0:1} +1 STAE <ADDRESS_STATE> {0:1}
+1 POST <ADDRESS_POSTAL_CODE> {0:1} +1 POST <ADDRESS_POSTAL_CODE> {0:1}
+1 CTRY <ADDRESS_COUNTRY> {0:1} +1 CTRY <ADDRESS_COUNTRY> {0:1}
n PHON <PHONE_NUMBER> {0:3} n PHON <PHONE_NUMBER> {0:3}
ASSOCIATION_STRUCTURE:= ASSOCIATION_STRUCTURE:=
n ASSO @<XREF:INDI>@ {0:M} n ASSO @<XREF:INDI>@ {0:M}
+1 TYPE <RECORD_TYPE> {1:1}
+1 RELA <RELATION_IS_DESCRIPTOR> {1:1} +1 RELA <RELATION_IS_DESCRIPTOR> {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
CHANGE_DATE:= CHANGE_DATE:=
n CHAN {1:1} n CHAN {1:1}
+1 DATE <CHANGE_DATE> {1:1} +1 DATE <CHANGE_DATE> {1:1}
+2 TIME <TIME_VALUE> {0:1} +2 TIME <TIME_VALUE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
CHILD_TO_FAMILY_LINK:= CHILD_TO_FAMILY_LINK:=
n FAMC @<XREF:FAM>@ {1:1} n FAMC @<XREF:FAM>@ {1:1}
+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:M} +1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
EVENT_DETAIL:= EVENT_DETAIL:=
n TYPE <EVENT_DESCRIPTOR> {0:1} n TYPE <EVENT_DESCRIPTOR> {0:1}
n DATE <DATE_VALUE> {0:1} n DATE <DATE_VALUE> {0:1}
n <<PLACE_STRUCTURE>> {0:1} n <<PLACE_STRUCTURE>> {0:1}
n <<ADDRESS_STRUCTURE>> {0:1} n <<ADDRESS_STRUCTURE>> {0:1}
n AGE <AGE_AT_EVENT> {0:1} n AGE <AGE_AT_EVENT> {0:1}
n AGNC <RESPONSIBLE_AGENCY> {0:1} n AGNC <RESPONSIBLE_AGENCY> {0:1}
n CAUS <CAUSE_OF_EVENT> {0:1} n CAUS <CAUSE_OF_EVENT> {0:1}
n <<SOURCE_CITATION>> {0:M} n <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M}
n <<MULTIMEDIA_LINK>> {0:M} n <<MULTIMEDIA_LINK>> {0:M}
n <<NOTE_STRUCTURE>> {0:M} n <<NOTE_STRUCTURE>> {0:M}
FAMILY_EVENT_STRUCTURE:= FAMILY_EVENT_STRUCTURE:=
[ [
n [ ANUL | CENS | DIV | DIVF ] [Y|<NULL>] {1:1} n [ ANUL | CENS | DIV | DIVF ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ ENGA | MARR | MARB | MARC ] [Y|<NULL>] {1:1} n [ ENGA | MARR | MARB | MARC ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ MARL | MARS ] [Y|<NULL>] {1:1} n [ MARL | MARS ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EVEN {1:1} n EVEN {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
INDIVIDUAL_ATTRIBUTE_STRUCTURE:= INDIVIDUAL_ATTRIBUTE_STRUCTURE:=
[ [
n CAST <CASTE_NAME> {1:1} n CAST <CASTE_NAME> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n DSCR <PHYSICAL_DESCRIPTION> {1:1} n DSCR <PHYSICAL_DESCRIPTION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EDUC <SCHOLASTIC_ACHIEVEMENT> {1:1} n EDUC <SCHOLASTIC_ACHIEVEMENT> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n IDNO <NATIONAL_ID_NUMBER> {1:1} n IDNO <NATIONAL_ID_NUMBER> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NATI <NATIONAL_OR_TRIBAL_ORIGIN> {1:1} n NATI <NATIONAL_OR_TRIBAL_ORIGIN> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NCHI <COUNT_OF_CHILDREN> {1:1} n NCHI <COUNT_OF_CHILDREN> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n NMR <COUNT_OF_MARRIAGES> {1:1} n NMR <COUNT_OF_MARRIAGES> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n OCCU <OCCUPATION> {1:1} n OCCU <OCCUPATION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n PROP <POSSESSIONS> {1:1} n PROP <POSSESSIONS> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n RELI <RELIGIOUS_AFFILIATION> {1:1} n RELI <RELIGIOUS_AFFILIATION> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n RESI {1:1} n RESI {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n SSN <SOCIAL_SECURITY_NUMBER> {0:1} n SSN <SOCIAL_SECURITY_NUMBER> {0:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n TITL <NOBILITY_TYPE_TITLE> {1:1} n TITL <NOBILITY_TYPE_TITLE> {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
INDIVIDUAL_EVENT_STRUCTURE:= INDIVIDUAL_EVENT_STRUCTURE:=
[ [
n [ BIRT | CHR ] [Y|<NULL>] {1:1} n [ BIRT | CHR ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
+1 FAMC @<XREF:FAM>@ {0:1} +1 FAMC @<XREF:FAM>@ {0:1}
| |
n [ DEAT | BURI | CREM ] [Y|<NULL>] {1:1} n [ DEAT | BURI | CREM ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n ADOP [Y|<NULL>] {1:1} n ADOP [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
+1 FAMC @<XREF:FAM>@ {0:1} +1 FAMC @<XREF:FAM>@ {0:1}
+2 ADOP <ADOPTED_BY_WHICH_PARENT> {0:1} +2 ADOP <ADOPTED_BY_WHICH_PARENT> {0:1}
| |
n [ BAPM | BARM | BASM | BLES ] [Y|<NULL>] {1:1} n [ BAPM | BARM | BASM | BLES ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ CHRA | CONF | FCOM | ORDN ] [Y|<NULL>] {1:1} n [ CHRA | CONF | FCOM | ORDN ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ NATU | EMIG | IMMI ] [Y|<NULL>] {1:1} n [ NATU | EMIG | IMMI ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ CENS | PROB | WILL] [Y|<NULL>] {1:1} n [ CENS | PROB | WILL] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n [ GRAD | RETI ] [Y|<NULL>] {1:1} n [ GRAD | RETI ] [Y|<NULL>] {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
| |
n EVEN {1:1} n EVEN {1:1}
+1 <<EVENT_DETAIL>> {0:1} +1 <<EVENT_DETAIL>> {0:1}
] ]
LDS_INDIVIDUAL_ORDINANCE:= LDS_INDIVIDUAL_ORDINANCE:=
[ [
n [ BAPL | CONL ] {1:1} n [ BAPL | CONL ] {1:1}
+1 STAT <LDS_BAPTISM_DATE_STATUS> {0:1} +1 STAT <LDS_BAPTISM_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| |
n ENDL {1:1} n ENDL {1:1}
+1 STAT <LDS_ENDOWMENT_DATE_STATUS> {0:1} +1 STAT <LDS_ENDOWMENT_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| |
n SLGC {1:1} n SLGC {1:1}
+1 STAT <LDS_CHILD_SEALING_DATE_STATUS> {0:1} +1 STAT <LDS_CHILD_SEALING_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 FAMC @<XREF:FAM>@ {1:1} +1 FAMC @<XREF:FAM>@ {1:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
LDS_SPOUSE_SEALING:= LDS_SPOUSE_SEALING:=
n SLGS {1:1} n SLGS {1:1}
+1 STAT <LDS_SPOUSE_SEALING_DATE_STATUS> {0:1} +1 STAT <LDS_SPOUSE_SEALING_DATE_STATUS> {0:1}
+1 DATE <DATE_LDS_ORD> {0:1} +1 DATE <DATE_LDS_ORD> {0:1}
+1 TEMP <TEMPLE_CODE> {0:1} +1 TEMP <TEMPLE_CODE> {0:1}
+1 PLAC <PLACE_LIVING_ORDINANCE> {0:1} +1 PLAC <PLACE_LIVING_ORDINANCE> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
MULTIMEDIA_LINK:= MULTIMEDIA_LINK:=
[ /* embedded form*/ [ /* embedded form*/
n OBJE @<XREF:OBJE>@ {1:1} n OBJE @<XREF:OBJE>@ {1:1}
| /* linked form*/ | /* linked form*/
n OBJE {1:1} n OBJE {1:1}
+1 FORM <MULTIMEDIA_FORMAT> {1:1} +1 FORM <MULTIMEDIA_FORMAT> {1:1}
+1 TITL <DESCRIPTIVE_TITLE> {0:1} +1 TITL <DESCRIPTIVE_TITLE> {0:1}
+1 FILE <MULTIMEDIA_FILE_REFERENCE> {1:1} +1 FILE <MULTIMEDIA_FILE_REFERENCE> {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
NOTE_STRUCTURE:= NOTE_STRUCTURE:=
[ [
n NOTE @<XREF:NOTE>@ {1:1} n NOTE @<XREF:NOTE>@ {1:1}
+1 <<SOURCE_CITATION>> {0:M} +1 SOUR @<XREF:SOUR>@ {0:M}
| |
n NOTE [SUBMITTER_TEXT> | <NULL>] {1:1} n NOTE [<SUBMITTER_TEXT> | <NULL>] {1:1}
+1 [ CONC | CONT ] <SUBMITTER_TEXT> {0:M} +1 [ CONC | CONT ] <SUBMITTER_TEXT> {0:M}
+1 <<SOURCE_CITATION>> {0:M} +1 SOUR @<XREF:SOUR>@ {0:M}
] ]
PERSONAL_NAME_STRUCTURE:= PERSONAL_NAME_STRUCTURE:=
n NAME <NAME_PERSONAL> {1:1} n NAME <NAME_PERSONAL> {1:1}
+1 NPFX <NAME_PIECE_PREFIX> {0:1} +1 NPFX <NAME_PIECE_PREFIX> {0:1}
+1 GIVN <NAME_PIECE_GIVEN> {0:1} +1 GIVN <NAME_PIECE_GIVEN> {0:1}
+1 NICK <NAME_PIECE_NICKNAME> {0:1} +1 NICK <NAME_PIECE_NICKNAME> {0:1}
+1 SPFX <NAME_PIECE_SURNAME_PREFIX {0:1} +1 SPFX <NAME_PIECE_SURNAME_PREFIX> {0:1}
+1 SURN <NAME_PIECE_SURNAME> {0:1} +1 SURN <NAME_PIECE_SURNAME> {0:1}
+1 NSFX <NAME_PIECE_SUFFIX> {0:1} +1 NSFX <NAME_PIECE_SUFFIX> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+2 <<NOTE_STRUCTURE>> {0:M}
+2 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
PLACE_STRUCTURE:= PLACE_STRUCTURE:=
n PLAC <PLACE_VALUE> {1:1} n PLAC <PLACE_VALUE> {1:1}
+1 FORM <PLACE_HIERARCHY> {0:1} +1 FORM <PLACE_HIERARCHY> {0:1}
+1 <<SOURCE_CITATION>> {0:M} +1 <<SOURCE_CITATION>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
SOURCE_CITATION:= SOURCE_CITATION:=
[ [
n SOUR @<XREF:SOUR>@ /* pointer to source record */ {1:1} n SOUR @<XREF:SOUR>@ /* pointer to source record */ {1:1}
+1 PAGE <WHERE_WITHIN_SOURCE> {0:1} +1 PAGE <WHERE_WITHIN_SOURCE> {0:1}
+1 EVEN <EVENT_TYPE_CITED_FROM> {0:1} +1 EVEN <EVENT_TYPE_CITED_FROM> {0:1}
+2 ROLE <ROLE_IN_EVENT> {0:1} +2 ROLE <ROLE_IN_EVENT> {0:1}
+1 DATA {0:1} +1 DATA {0:1}
+2 DATE <ENTRY_RECORDING_DATE> {0:1} +2 DATE <ENTRY_RECORDING_DATE> {0:1}
+2 TEXT <TEXT_FROM_SOURCE> {0:M} +2 TEXT <TEXT_FROM_SOURCE> {0:M}
+3 [ CONC | CONT ] <TEXT_FROM_SOURCE> {0:M} +3 [ CONC | CONT ] <TEXT_FROM_SOURCE> {0:M}
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} +1 QUAY <CERTAINTY_ASSESSMENT> {0:1}
+1 <<MULTIMEDIA_LINK>> {0:M} +1 <<MULTIMEDIA_LINK>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
| /* Systems not using source records */ | /* Systems not using source records */
n SOUR <SOURCE_DESCRIPTION> {1:1} n SOUR <SOURCE_DESCRIPTION> {1:1}
+1 [ CONC | CONT ] <SOURCE_DESCRIPTION> {0:M} +1 [ CONC | CONT ] <SOURCE_DESCRIPTION> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:M} +1 TEXT <TEXT_FROM_SOURCE> {0:M}
+2 [CONC | CONT ] <TEXT_FROM_SOURCE> {0:M} +2 [CONC | CONT ] <TEXT_FROM_SOURCE> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
] ]
SOURCE_REPOSITORY_CITATION:= SOURCE_REPOSITORY_CITATION:=
[
n REPO @XREF:REPO@ {1:1} n REPO @<XREF:REPO>@ {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
+1 CALN <SOURCE_CALL_NUMBER> {0:M} +1 CALN <SOURCE_CALL_NUMBER> {0:M}
+2 MEDI <SOURCE_MEDIA_TYPE> {0:1} +2 MEDI <SOURCE_MEDIA_TYPE> {0:1}
SPOUSE_TO_FAMILY_LINK:= SPOUSE_TO_FAMILY_LINK:=
n FAMS @<XREF:FAM>@ {1:1} n FAMS @<XREF:FAM>@ {1:1}
+1 <<NOTE_STRUCTURE>> {0:M} +1 <<NOTE_STRUCTURE>> {0:M}
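
A comparison like the one above can be reproduced with a few lines of code once both grammars are available as text. The sketch below uses Python's standard `difflib` module; it is an assumed approach for illustration, not the way the table was actually produced, and `grammar_diff` is a hypothetical name.

```python
import difflib
from typing import List, Tuple


def normalise(lines: List[str]) -> List[str]:
    """Collapse whitespace so that layout differences are not reported."""
    return [" ".join(line.split()) for line in lines if line.strip()]


def grammar_diff(old: List[str], new: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (operation, old lines, new lines) for every non-equal block."""
    a, b = normalise(old), normalise(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# CHILD_TO_FAMILY_LINK as it appears in the two columns of the table above.
pdf_version = [
    "n FAMC @<XREF:FAM>@ {1:1}",
    "+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:M}",
    "+1 <<NOTE_STRUCTURE>> {0:M}",
]
html_version = [
    "n FAMC @<XREF:FAM>@ {1:1}",
    "+1 PEDI <PEDIGREE_LINKAGE_TYPE> {0:1}",
    "+1 <<NOTE_STRUCTURE>> {0:M}",
]
changes = grammar_diff(pdf_version, html_version)
```

For this structure, `changes` holds a single `replace` entry for the PEDI line, whose cardinality differs between the two versions.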

If you have any information related to this article please e-mail me or add it to the comments!


[Update 2015/01/01] Errata Sheet found

After the discovery of the differences described above and the reference to the Errata Sheet, I also e-mailed several people to help find the document. Louis Kessler got in contact with Brian Madsen, who had the Errata Sheet (on paper) and scanned it; read all about it in Louis' blog post More GEDCOM Archaeological Discoveries. The Errata Sheet (PDF) itself is shown below. The Errata Sheet contains all of the big changes as highlighted in the table above!

2014/02/03

Links you can make from a record

When genealogical data from a record is displayed on a website of an archive, the names are usually (web)links, so you can easily and quickly search for that name. Open Archives shows that a record can be linked to many different sources of information, so the records become enriched.

Links to search actions

Open Archives, too, turns the names of the persons into ‘search links’. But more powerful are the ‘search links’ that are displayed with couples. Searching for records on two names is a widely used and much requested feature that you can offer directly from the record, because a record usually shows various types of relationships between individuals.

Below is an example of a marriage certificate. With just one mouse-click on the ‘relation bracket’ you can search for the parents of the bride or groom (in order to find the brothers and sisters of the bride or groom) or the bride/groom couple (to find their children).

Click on the image to view the record on Open Archives.

Often names in records are easy to identify because they are in a separate field. Sometimes there are also names in the comments of a record, such as the mention of a twin sister/brother. Open Archives recognizes these mentions of twins and makes these names into links as well.

Click on the image to view the record on Open Archives.

However, the search for the twin sister/brother is not purely a search by name. The search can be made smarter by using the information from the record, such as the date and the name of the mother. In most cases such a smart search returns the twin sister/brother immediately.
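
The idea behind such a smart search can be illustrated with a small in-memory sketch. Everything here is hypothetical (the `BirthRecord` type, its fields and the `find_twins` function); the actual search on Open Archives is not described in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BirthRecord:
    """Hypothetical, simplified birth record."""
    child: str
    mother: str
    birth_date: str  # ISO date, e.g. "1850-03-01"


def find_twins(record: BirthRecord, candidates: List[BirthRecord]) -> List[BirthRecord]:
    """Return other records with the same birth date and the same mother."""
    return [
        other
        for other in candidates
        if other != record
        and other.birth_date == record.birth_date
        # Compare names case-insensitively; real data needs fuzzier matching.
        and other.mother.casefold() == record.mother.casefold()
    ]
```

Because the date and the mother's name narrow the result so strongly, the name of the twin mentioned in the comment is not even needed in this simplified version.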

Links to other documents

Information in records about the parents can be used to find more information about the person. For example, when parents are mentioned in a death certificate, often the birth and marriage certificate can also be found (because these records also mention the parents). Open Archives performs these searches on-the-fly and displays the results as links to other records.

Click on the image to view the record on Open Archives.

This principle of searching for related documents (and thus persons) can also be repeated several times. That’s what the ‘Links Explorer’ on Open Archives does. Clicking the ‘Links Explorer’ icon opens a window and, starting from the record you were viewing, looks for related records, which are then presented in a relationship network. This diagram shows parent-child relationships with a red (blood) line and marriages with an orange line.

Click on the image to view the record on Open Archives,
then click the Links Explorer icon.
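
Repeating the related-records search a few times amounts to a breadth-first walk over records. The sketch below shows that idea only: `find_related` stands in for whatever search finds related records (an assumption, passed in as a function), and `max_depth` is an illustrative limit.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (record id, related record id, relation type)


def explore_links(start: str,
                  find_related: Callable[[str], Iterable[Tuple[str, str]]],
                  max_depth: int = 2) -> List[Edge]:
    """Collect relationship edges by repeatedly searching for related records."""
    seen_records: Set[str] = {start}
    seen_edges: Set[Tuple[frozenset, str]] = set()
    edges: List[Edge] = []
    queue = deque([(start, 0)])
    while queue:
        record, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not search beyond the depth limit
        for other, relation in find_related(record):
            key = (frozenset((record, other)), relation)
            if key not in seen_edges:  # keep each link once, in either direction
                seen_edges.add(key)
                edges.append((record, other, relation))
            if other not in seen_records:
                seen_records.add(other)
                queue.append((other, depth + 1))
    return edges
```

The resulting edge list, with relation types such as "parent-child" or "marriage", is the kind of data a network diagram can be drawn from.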


The information in records can also be used to query other data sources.

Links to biographies

With a name and birth date/place and/or death date/place you can search the Biographical Portal of the Netherlands to see if a biography is known for that person. Open Archives performs this search automatically. When there are one or more biographies these links are shown. The example below shows the links to biographies about Henry Constantine Cras.

Click on the image to view the record on Open Archives.

Links to gravestones

There are various websites that offer information about graves, like Graftombe.nl and Dutch-Cemeteries.com. These sources are queried when birth and death are shown on Open Archives. When a link is found, it’s presented below the record:

Click on the image to view the record on Open Archives.

Links to online family trees

The previous example also shows that Open Archives looks up the (main) person in online family trees, specifically Genealogy Online. Conversely, Genealogy Online gives hints on scans of genealogical events in archives via the Scans search service.

Link to the weather

With the combination of a date and a place name the weather can be looked up; for the Netherlands this is done in the historical dataset of the Royal Netherlands Meteorological Institute. If there are measurements, the weather on and around the specified date can be shown.

Click on the image to view the record on Open Archives,
then click on the date September 13, 1746.

Link to a map

Population registers also contain street names, which are interesting. A researcher usually likes to know where the street was. Open Archives now has knowledge about a large part of the (historic) streets of Leiden and Rijnsburg (Netherlands). With this information the street can be displayed (with a thick orange line) on a historical map.

Click on the image to view the record on Open Archives, then click the street name Hogewoerd.

Link to scan

If an archive doesn’t have scans of certain records, this does not mean that there are no scans. When Open Archives gets new open data from archival institutions or individuals, it also checks whether scans are available elsewhere and whether they can be linked.

For example, Open Archives shows FamilySearch scans with records of the Regional History Center Vecht en Venen, scans of GaHetNa (Dutch National Archives) with records of Groene Hart Archieven and scans from Van Papier naar Digitaal with records from the Regional Archives of Alkmaar!

Show enriched information

As this article (and Open Archives) shows: a record doesn’t have to be shown just as-is. Many records can be enriched with (links to) one or more other sources of information. This enrichment makes Open Archives a more useful research tool.

image

 

About Open Archives

Open Archives is an initiative of Bob Coret to show that open data and services push innovation. The genealogical search engine is available in English, French, German and Dutch. Follow Open Archives on Google+ or Twitter.

2013/09/08

My wish list for a genealogical search engine for an archive

One can daydream about the ideal genealogical search engine for an archive. After this, you could e-mail your suggestions and wait until the archive (or their software supplier) sees the light and gets the budget, you could complain, or you could just let it rest. Or you take up the challenge yourself. Based on a list of wishes and a complete genealogical dataset from an archive (with over 4 million persons) I started to build such a search engine.

The result is Open Archives: a website which inspires but is also fully functional and ready to use!

I want to Google

When you think about searching the internet you think about Google. One search field which brings you a ton of information. Though this search field seems simple, it's actually a very strong instrument if you know how to use it. For example, if you want to search for Coret on the website of the Dutch National Archive (GaHetNa) and want to exclude Bob you just Google for "coret site:www.gahetna.nl -bob". The search results can be filtered on result type (Web/Images/Maps/Shopping/etc.), creation date and if you have visited the page before.

(all images are links to Open Archives)

For Open Archives I wanted a search field like that: just like Google, one big input field on the start page. To show off the strength of the search function, examples are shown beneath the field. If you are looking for someone named Oudshoorn who probably married someone named Lagas between 1900 and 1925, you type the query "oudshoorn & lagas 1900-1925" in the search field of Open Archives.

By using filters you can narrow down the search results on source type, place, role and year.

filter_en

Other search operators include:

  • excluding names (-)
  • wildcards (*)
  • only records with scans ($)
  • phonetic search (~)
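
To make the operators concrete, here is a minimal sketch of how such a query could be split into its parts. It rests on assumptions: the post does not say exactly where each operator is placed, so the sketch treats `-` and `~` as prefixes of a name and `$` as a token on its own, and `ParsedQuery` and `parse_query` are hypothetical names, not the parser Open Archives uses.

```python
import re
from dataclasses import dataclass, field

YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")

@dataclass
class ParsedQuery:
    names: list = field(default_factory=list)     # names that must occur (may contain *)
    excluded: list = field(default_factory=list)  # names prefixed with -
    phonetic: list = field(default_factory=list)  # names prefixed with ~
    years: tuple = None                           # (first year, last year)
    scans_only: bool = False                      # True when $ is present

def parse_query(query):
    """Split a query string into names, exclusions, a period and flags."""
    parsed = ParsedQuery()
    for token in query.split():
        match = YEAR_RANGE.match(token)
        if match:
            parsed.years = (int(match.group(1)), int(match.group(2)))
        elif token == "&":
            continue  # separator between two persons in the same record
        elif token == "$":
            parsed.scans_only = True
        elif token.startswith("-") and len(token) > 1:
            parsed.excluded.append(token[1:])
        elif token.startswith("~") and len(token) > 1:
            parsed.phonetic.append(token[1:])
        else:
            parsed.names.append(token)
    return parsed

print(parse_query("oudshoorn & lagas 1900-1925"))
```

Keeping the parsed result in one small structure makes it easy to hand the names, the period and the flags to whatever search backend is used.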

Je veux utiliser ce site aussi...
Ich möchte diese Website auch nutzen...
Ik wil deze website ook gebruiken...

Many ancestors came from abroad or emigrated to other countries, so genealogical research often gets international. For me this means a website like Open Archives has to be available in multiple languages. Although the content of the (current) records is in Dutch, the rest of the website is offered in multiple languages.

I want a readable website

The readability of a website is determined in a large part by font, font size, graphical elements (like icons) and the use of colours. Open Archives has chosen a clear font and is using a slightly bigger than normal font size, which is adjustable in the browser (via CTRL +/-).

The screens of tablets and smartphones are a lot smaller than those of a laptop or monitor. The number of users browsing the Internet with these devices is growing rapidly. By taking this fact into account from the start of your design, it's fairly easy to make your user interface look good on different screen sizes.

You can also use Open Archives on a smartphone or tablet. For example, when you view the search results page on a small screen, the table has fewer columns than on bigger screens, which helps keep the remaining columns readable. By adjusting the width of your browser, making it smaller and smaller, you can see the display of Open Archives adjust automatically (this is called responsive design).

mobile_en

An old and seemingly forgotten browser feature is that visited links can get a different colour than non-visited links. This distinction makes it very easy for a user to navigate. Open Archives uses an orange colour for non-visited links and a dark grey for visited links. This way you don't have to remember which records you have already looked at and which you haven't.

A good structure of your page also increases readability. Open Archives made the record pages simpler and clearer. Usually, when archival records are shown, all the data elements are shown separately below each other. Some elements can just be concatenated to form readable 'sentences'.

So, instead of:

....
First name groom: Wilhelmus Josephus
Last name groom: Lugter
Occupation groom: merchant
Place of birth groom: Ridderkerk
....

Open Archives shows:

....
Groom
Wilhelmus Josephus Lugter, merchant, born in Ridderkerk
....
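
Concatenating the separate elements is a small step. The sketch below shows one way to do it, assuming hypothetical field names and a hypothetical `describe_person` helper; empty elements are simply skipped so the line stays readable when data is missing.

```python
def describe_person(first_name, last_name, occupation=None, birth_place=None):
    """Join separate data elements into one readable line, skipping empty ones."""
    parts = [f"{first_name} {last_name}".strip()]
    if occupation:
        parts.append(occupation)
    if birth_place:
        parts.append(f"born in {birth_place}")
    return ", ".join(parts)

print("Groom")
print(describe_person("Wilhelmus Josephus", "Lugter", "merchant", "Ridderkerk"))
```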

In records multiple persons play a role, so you can also order the information to show these relations. By adding graphical elements, you can see the relations in an instant.

relationsview_en

The graphical elements which 'connect the couples' have an additional function. Clicking on such an element starts a search for these two persons. Searching for two persons at once is something many genealogists look for in a search engine, and with these clickable elements it takes only one mouse click!

I want to make a nice print

A lot of genealogists make hard copies of the pages they find on a genealogical website. A website can determine how this print looks. Some parts don't have to be printed, like the website navigation and share buttons. Other parts are in fact only interesting for the printed version, like the website address.

Open Archives makes sure that the printed version looks good.

I want to collect multiple records

If you are on a website of an archive, you usually don't stop after finding 1 record. Most genealogists will find multiple interesting records which have to be processed later on. So you want to collect interesting records. For this Open Archives introduced the data basket.

On every records page there's a button to add the record to the data basket. The data basket shows the titles of all collected records which link to the records page again.

There are two ways to output the data basket:

  • First, you can download the records in PDF format. This PDF document, which adheres to the PDF/A standard, can be viewed with a PDF reader or printed.
  • The records in the data basket can also be downloaded in GEDCOM format. This file contains all data about persons, relationships and sources, and adheres to the GEDCOM 5.5.1 standard. This GEDCOM file can easily be imported into a family tree program thus eliminating the manual input (less work, no risk of typing errors).
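
To give an impression of what such an export involves, the sketch below writes a very small GEDCOM file for the persons in a basket. It is an illustration under assumptions, not the Open Archives exporter: the `basket_to_gedcom` function, the dictionary keys and the `EXAMPLE_EXPORT` system id are hypothetical, relationships and sources are left out, and the output has not been validated against the full GEDCOM 5.5.1 specification.

```python
def basket_to_gedcom(persons):
    """Return a minimal GEDCOM text with one INDI record per person."""
    lines = [
        "0 HEAD",
        "1 SOUR EXAMPLE_EXPORT",  # hypothetical id of the exporting system
        "1 SUBM @SUBM1@",
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "0 @SUBM1@ SUBM",
        "1 NAME Example submitter",
    ]
    for number, person in enumerate(persons, start=1):
        lines.append(f"0 @I{number}@ INDI")
        # GEDCOM marks the surname by putting it between slashes.
        lines.append(f"1 NAME {person['first_name']} /{person['last_name']}/")
        if person.get("occupation"):
            lines.append(f"1 OCCU {person['occupation']}")
        if person.get("birth_place"):
            lines.append("1 BIRT")
            lines.append(f"2 PLAC {person['birth_place']}")
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"

basket = [{
    "first_name": "Wilhelmus Josephus",
    "last_name": "Lugter",
    "occupation": "merchant",
    "birth_place": "Ridderkerk",
}]
print(basket_to_gedcom(basket))
```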

I only want to login if it really is necessary

Websites tend to place certain functionality behind a login. For certain personal activities this is necessary, but for many actions the required login is superfluous and therefore irritating.

Open Archives provides all the functionality without having to login. Searching, viewing the records or scans and even the data basket can all be used without login.

I want help with my source citations

A genealogist should have source citations with his/her data/publication, so the genealogist and readers can see where the data came from. This increases the verifiability and quality. Although source citations are important, they are often discarded. Usually it's too much work to collect all necessary data elements (if present at all) to form a source citation.

Open Archives aids the researcher by providing clear and consistent source citations with all records. The archival descriptions are used for this, so the complete titles of sources are made visible.

source_en

The sources are linked to the archival descriptions on the archive website, so readers can also read about the background of a source.

Of course the source citations are also included in the PDF document (a short and a long version are provided) and in the GEDCOM file. The GEDCOM file follows version 5.5.1 of the GEDCOM specification: each piece of information is linked to a source, all information about the source is provided, and the source is linked to the repository (the archive). For the addresses of the archives, data from the ArchiefWiki is used (they provide this data for re-use).
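
In GEDCOM terms this chain runs from an event, via a source record, to a repository record. The sketch below shows that chain with hypothetical helper functions and made-up placeholder values (ids, title, archive name and address); it is not a complete GEDCOM file and not the exact output of Open Archives.

```python
def cited_event_lines(event_tag, date, source_id):
    """An event (for example BIRT) that points to its source record."""
    return [f"1 {event_tag}", f"2 DATE {date}", f"2 SOUR @{source_id}@"]

def source_lines(source_id, title, repository_id):
    """A source record with its title and a pointer to the repository."""
    return [
        f"0 @{source_id}@ SOUR",
        f"1 TITL {title}",
        f"1 REPO @{repository_id}@",
    ]

def repository_lines(repository_id, name, address):
    """A repository record: the archive that holds the source."""
    return [f"0 @{repository_id}@ REPO", f"1 NAME {name}", f"1 ADDR {address}"]

# All values below are placeholders for illustration only.
lines = ["0 @I1@ INDI", "1 NAME Example /Person/"]
lines += cited_event_lines("BIRT", "1 JAN 1900", "S1")
lines += source_lines("S1", "Example birth register", "R1")
lines += repository_lines("R1", "Example Archive", "Example Street 1, Example Town")
print("\n".join(lines))
```

Because the source and the repository are separate records, many events can point to the same source without repeating its title or the address of the archive.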

I want suggestions to relevant additional data

Based on the information in the record clever suggestions for additional information can be made, within the dataset of the archive but also outside of the archive.

Let's start with the 'within the dataset' part. With a birth certificate, which shows the name of the child and the names of the parents, the marriage record can be looked up, because this usually has the same name of the child (then in the role of groom or bride) and of the parents. This also works the other way around, so with a marriage certificate the birth records can be looked up and shown when found. This also works for death certificates. This way, links can be provided on the records page to other relevant records.

sug_en

If the birth certificate notes that the person is (part of) a twin, then the name of the twin brother or sister links to a smart search query which brings you to the birth record of the twin brother or sister in two clicks.
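
Both kinds of suggestions come down to comparing a few fields between records. The sketch below is a simplified illustration with invented sample records and hypothetical field and function names. It assumes exact name matches after normalising case and whitespace, and it assumes twins share a birth date and a mother; a real search would have to cope with spelling variants and with twins born on either side of midnight.

```python
def same_name(a, b):
    """Compare two names, ignoring case and extra whitespace."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())

def find_marriages_of_child(birth, marriages):
    """Marriages where the groom or bride has the child's name and parents."""
    found = []
    for marriage in marriages:
        for role in ("groom", "bride"):
            if (same_name(birth["child"], marriage[role])
                    and same_name(birth["father"], marriage[role + "_father"])
                    and same_name(birth["mother"], marriage[role + "_mother"])):
                found.append(marriage)
                break
    return found

def find_twin(birth, births):
    """Other birth records with the same date and the same mother."""
    return [other for other in births
            if other["id"] != birth["id"]
            and other["date"] == birth["date"]
            and same_name(other["mother"], birth["mother"])]

# Invented sample records, for illustration only.
births = [
    {"id": "b1", "child": "Anna Jansen", "father": "Jan Jansen",
     "mother": "Maria de Vries", "date": "1880-05-01"},
    {"id": "b2", "child": "Pieter Jansen", "father": "Jan Jansen",
     "mother": "Maria  de Vries", "date": "1880-05-01"},
]
marriages = [
    {"id": "m1", "groom": "Kees Bakker", "groom_father": "Arie Bakker",
     "groom_mother": "Neeltje Smit", "bride": "Anna Jansen",
     "bride_father": "Jan Jansen", "bride_mother": "Maria de Vries"},
]

print([m["id"] for m in find_marriages_of_child(births[0], marriages)])
print([b["id"] for b in find_twin(births[0], births)])
```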

Outside the walls of archives there's also a lot of interesting information which can be used to provide suggestions on the records pages. If such services provide their data/indexes as open data or they provide a search service (API), then connections can be made.

Open Archives currently has made connections with two of these 'external sources':

  • With death certificates the person is looked up, based on name and year, in the Graftombe.nl dataset, which has information and photos about graves in cemeteries and churchyards.
  • The 'main persons' in the records are looked up in online family trees. This results in relevant links to the work of genealogists on Genealogie Online.

gensug_en

The search results page also shows hits on other websites. The query is done on Genealogie Online, the Stamboom Forum, the Stamboom Gids and the Historical Newspaper collection of the Dutch Royal Library.

cross_en

I want to contribute

Genealogists often have specific knowledge and experience and are willing to share this with other researchers and archives. To facilitate this Open Archives has several options to contribute to records.

First, errors can be reported. Indexing is done by people, mistakes happen. Data is processed by various systems which can result in errors. Through a simple form the user can report errors to Open Archives or the originating archive so the errors can be fixed.

Some records are part of a story. Each records page has the possibility to post comments or pictures. For this functionality an external service is used, because you do not have to make everything yourself!

disqus_en

If the record is cited in an online family tree this can be reported by the genealogist on the records page. The page in the online family tree is first checked, to see if the record is cited, after this the link to the page in the online family tree is shown on the records page to all.

ref_en

I want to know what data is available

The representation of the 'contents' of an archive website is often a textual summary or a complete inventory system. You can also visualize the contents of the searchable data set with interactive graphs.

Open Archives first displays a pie chart of the archives (currently only one). Clicking on a part of the pie chart brings up a pie chart which shows all the places the archive has data about. Clicking again, now on a place, reveals a pie chart with all the source types which are available. Clicking on a source type brings up a bar chart with the numbers per year; colours distinguish between digitized and non-digitized material.

graph1_en

Another interactive display, based on the available data which Open Archives shows, is the surname frequency. By selecting a source type, place and time period, the names which were the most prevalent are shown in a bar chart.

freq_en
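
The numbers behind such a frequency chart can be produced by filtering and counting. The sketch below uses invented sample records and hypothetical field names, and `surname_frequency` is an illustrative helper, not the query Open Archives runs.

```python
from collections import Counter

def surname_frequency(records, source_type, place, first_year, last_year, top=10):
    """Most common surnames for one source type, place and period."""
    counts = Counter(
        record["surname"]
        for record in records
        if record["source_type"] == source_type
        and record["place"] == place
        and first_year <= record["year"] <= last_year
    )
    return counts.most_common(top)

# Invented sample records, for illustration only.
records = [
    {"surname": "Jansen", "source_type": "birth", "place": "Leiden", "year": 1850},
    {"surname": "Jansen", "source_type": "birth", "place": "Leiden", "year": 1860},
    {"surname": "Bakker", "source_type": "birth", "place": "Leiden", "year": 1855},
    {"surname": "Bakker", "source_type": "marriage", "place": "Leiden", "year": 1855},
    {"surname": "Jansen", "source_type": "birth", "place": "Leiden", "year": 1910},
]

for surname, count in surname_frequency(records, "birth", "Leiden", 1800, 1900):
    print(surname, count)
```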

I want to be able to share the records

Social media make it possible to easily share information. Researchers can share the records they found on Open Archives on Facebook, Google+, Twitter, Pinterest and LinkedIn. One click makes the social network site fetch some relevant information (including thumbnail and link) about the records so you do not have to type this.

Sharing usually results in various comments. It's nice to show your genealogical discoveries in the archive to your friends, family and other followers! An example (in Dutch) of a shared record on Facebook:

Facebook screenshot

I want an application programming interface (API)

This wish won't be on everybody's list, but it is important for a website. If you also offer the services of your website to developers (via an API), even more people can use these services via other websites or programs. Platforms like Twitter, Google and Facebook offer APIs, which results in a lot of useful, fun and convenient apps and websites, which in turn help the platform grow.

Open Archives offers, through the Open Archives API, various methods that can be used by other developers in their website or application.

I want more information and scans

Open Archives utilizes open data made available for reuse by archives. Open Archives doesn't pay the archives for this data; conversely, archives do not have to pay Open Archives to present their data on Open Archives. When an archive wants to provide their data as open data, they usually do have to pay their software supplier. Each archive will therefore make their own choice.

Conclusion

With Open Archives I want to show that, by making data available for reuse, nice and innovative initiatives can flourish. I hope that more archives will follow the example of the Dutch archive Erfgoed Leiden en omstreken (formerly known as the Regional Archives of Leiden): they made all their genealogical data available for reuse. If more archives do this, individuals, companies and associations can do beautiful, convenient and fun things with the archive data!

Finally, what are your wishes for a genealogical search engine for an archive?