My wish list for a genealogical search engine for an archive

One can daydream about the ideal genealogical search engine for an archive. After that, you could e-mail your suggestions and wait until the archive (or its software supplier) sees the light and gets the budget, you could complain, or you could just let it rest. Or you could take up the challenge yourself. Based on a list of wishes and a complete genealogical dataset from an archive (with over 4 million persons), I started to build such a search engine.

The result is Open Archives: a website which inspires but is also fully functional and ready to use!

I want to Google

When you think about searching the internet, you think about Google: one search field which brings you a ton of information. Though this search field seems simple, it's actually a very strong instrument if you know how to use it. For example, if you want to search for Coret on the website of the Dutch National Archive (GaHetNa) and want to exclude Bob, you just Google for "coret site:www.gahetna.nl -bob". The search results can be filtered by result type (Web/Images/Maps/Shopping/etc.), by creation date and by whether you have visited the page before.


For Open Archives I wanted a search field like that: just like Google, one big input field on the start page. To show off the strength of the search function, examples are shown beneath the field. If you are looking for someone named Oudshoorn who probably married someone named Lagas between 1900 and 1925, you type the query "oudshoorn & lagas 1900-1925" in the search field of Open Archives.

Using filters, you can narrow down the search results by source type, place, role and year.


Other search operators include:

  • excluding names (-)
  • wildcards (*)
  • only records with scans ($)
  • phonetic search (~)
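The exact query grammar of Open Archives isn't spelled out here, so the following is only a minimal sketch of how such a query could be split into its parts. It assumes that whitespace separates terms, that "&" separates two persons, that a four-digit year or year range limits the period, and that the operators listed above work as prefixes or markers. `ParsedQuery`, `parse_query` and `name_matches` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional

YEAR_OR_RANGE = re.compile(r"^(\d{4})(?:-(\d{4}))?$")


@dataclass
class ParsedQuery:
    persons: list[list[str]] = field(default_factory=list)  # name terms per person, split on '&'
    excluded: list[str] = field(default_factory=list)       # terms prefixed with '-'
    phonetic: list[str] = field(default_factory=list)       # terms prefixed with '~'
    wildcards: list[str] = field(default_factory=list)      # terms containing '*'
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    scans_only: bool = False                                 # '$' token present


def parse_query(query: str) -> ParsedQuery:
    result = ParsedQuery()
    current: list[str] = []
    for token in query.lower().split():
        if token == "&":                      # start collecting terms for the second person
            if current:
                result.persons.append(current)
            current = []
        elif token == "$":                    # only records with scans
            result.scans_only = True
        elif (m := YEAR_OR_RANGE.match(token)):
            result.year_from = int(m.group(1))
            result.year_to = int(m.group(2) or m.group(1))
        elif token.startswith("-") and len(token) > 1:
            result.excluded.append(token[1:])
        elif token.startswith("~") and len(token) > 1:
            result.phonetic.append(token[1:])
            current.append(token[1:])
        else:
            if "*" in token:
                result.wildcards.append(token)
            current.append(token)
    if current:
        result.persons.append(current)
    return result


def name_matches(term: str, name: str) -> bool:
    """Exact match, or shell-style wildcard match when the term contains '*'."""
    return fnmatchcase(name.lower(), term)
```

For example, `parse_query("oudshoorn & lagas 1900-1925")` yields two persons, `["oudshoorn"]` and `["lagas"]`, with a period of 1900 to 1925. The phonetic matching itself is out of scope for this sketch. It only records which terms should be matched phonetically.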

I want to use this website too... (Je veux utiliser ce site aussi... / Ich möchte diese Website auch nutzen... / Ik wil deze website ook gebruiken...)

Many ancestors came from abroad or emigrated to other countries, so genealogical research often becomes international. For me this means a website like Open Archives has to be available in multiple languages. Although the contents of the (current) records are in Dutch, the rest of the website is offered in multiple languages.

I want a readable website

The readability of a website is determined to a large extent by font, font size, graphical elements (like icons) and the use of colours. Open Archives has chosen a clear font and uses a slightly larger than normal font size, which is adjustable in the browser (via CTRL +/-).

The screens of tablets and smartphones are a lot smaller than those of a laptop or monitor, and the number of users browsing the Internet with these devices is growing rapidly. By taking this into account from the start of your design, it's fairly easy to make your user interface look good on different screen sizes.

You can also use Open Archives on a smartphone or tablet. For example, when you view the search results page on a small screen, the table has fewer columns than on bigger screens, which helps keep the rest readable. By making your browser window narrower and narrower, you can see the display of Open Archives adjust automatically (this is called responsive design).


An old and seemingly forgotten browser feature is that visited links can get a different colour than non-visited links. This distinction makes it very easy for a user to navigate. Open Archives uses orange for non-visited links and dark grey for visited links. This way you don't have to remember which records you have already looked at and which you haven't.

A good page structure also increases readability. Open Archives made the record pages simpler and clearer. Usually, when archival records are shown, all the data elements are shown separately, one below the other. Some elements can simply be concatenated to form readable 'sentences'.

So, instead of:

First name groom: Wilhelmus Josephus
Last name groom: Lugter
Occupation groom: merchant
Place of birth groom: Ridderkerk

Open Archives shows:

Wilhelmus Josephus Lugter, merchant, born in Ridderkerk
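The concatenation above can be sketched in a few lines. This is an illustration, not the actual Open Archives code: it assumes each record is a flat dictionary with hypothetical field names such as `first_name_groom`, and it simply skips fields that are empty.

```python
def describe_person(record: dict, role: str) -> str:
    """Concatenate the separate fields of one role into a readable phrase.

    The field names (first_name_<role>, last_name_<role>, ...) are hypothetical.
    """
    name = " ".join(
        part
        for part in (record.get(f"first_name_{role}"), record.get(f"last_name_{role}"))
        if part
    )
    parts = [name] if name else []
    occupation = record.get(f"occupation_{role}")
    if occupation:
        parts.append(occupation)
    birthplace = record.get(f"birthplace_{role}")
    if birthplace:
        parts.append(f"born in {birthplace}")
    return ", ".join(parts)


record = {
    "first_name_groom": "Wilhelmus Josephus",
    "last_name_groom": "Lugter",
    "occupation_groom": "merchant",
    "birthplace_groom": "Ridderkerk",
}
print(describe_person(record, "groom"))
# Wilhelmus Josephus Lugter, merchant, born in Ridderkerk
```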

Because multiple persons play a role in a record, you can also order the information to show their relations. By adding graphical elements, you can see these relations at a glance.


The graphical elements which 'connect the couples' have an additional function: clicking on such an element starts a search for these two persons. Searching for two persons is something many genealogists look for in a search engine, and with these clickable elements it's only one mouse click away!

I want to make a nice print

A lot of genealogists make hard copies of the pages they find on a genealogical website. A website can determine how this print looks. Some parts don't have to be printed, like the website navigation and share buttons. Other parts are in fact only interesting for the printed version, like the website address.

Open Archives makes sure that the printed version looks good.

I want to collect multiple records

If you are on the website of an archive, you usually don't stop after finding one record. Most genealogists find multiple interesting records which have to be processed later on, so you want to collect them. For this, Open Archives introduced the data basket.

On every record page there's a button to add the record to the data basket. The data basket shows the titles of all collected records, each linking back to its record page.

There are two ways to output the data basket:

  • First, you can download the records in PDF format. This PDF document, which adheres to the PDF/A standard, can be viewed with a PDF reader or printed.
  • The records in the data basket can also be downloaded in GEDCOM format. This file contains all data about persons, relationships and sources, and adheres to the GEDCOM 5.5.1 standard. This GEDCOM file can easily be imported into a family tree program thus eliminating the manual input (less work, no risk of typing errors).
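To give an idea of what such an export involves, here is a minimal sketch of writing basket items to a GEDCOM 5.5.1 file. It assumes that every basket item is a marriage record with a groom and a bride. The class `Marriage`, the function `basket_to_gedcom` and the header values (`DATA_BASKET_SKETCH`, the submitter name) are hypothetical, and source citations are left out to keep it short. The header contains the parts 5.5.1 requires: SOUR, SUBM, GEDC with VERS and FORM, and CHAR.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


@dataclass
class Marriage:
    groom_given: str
    groom_surname: str
    bride_given: str
    bride_surname: str
    when: date
    place: str


def gedcom_date(d: date) -> str:
    """Format a date the GEDCOM way, e.g. '12 MAY 1900'."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def basket_to_gedcom(marriages: list[Marriage]) -> str:
    lines = [
        "0 HEAD", "1 SOUR DATA_BASKET_SKETCH", "1 SUBM @U1@",
        "1 GEDC", "2 VERS 5.5.1", "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8",
        "0 @U1@ SUBM", "1 NAME Data basket user",
    ]
    for n, m in enumerate(marriages, start=1):
        husb, wife, fam = f"@I{2 * n - 1}@", f"@I{2 * n}@", f"@F{n}@"
        lines += [f"0 {husb} INDI", f"1 NAME {m.groom_given} /{m.groom_surname}/", "1 SEX M", f"1 FAMS {fam}"]
        lines += [f"0 {wife} INDI", f"1 NAME {m.bride_given} /{m.bride_surname}/", "1 SEX F", f"1 FAMS {fam}"]
        lines += [f"0 {fam} FAM", f"1 HUSB {husb}", f"1 WIFE {wife}",
                  "1 MARR", f"2 DATE {gedcom_date(m.when)}", f"2 PLAC {m.place}"]
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"
```

Surnames are wrapped in slashes in the NAME value, as GEDCOM prescribes. Each marriage becomes two INDI records plus a FAM record that links them.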

I only want to log in if it's really necessary

Websites tend to place certain functionality behind a login. For certain personal activities this is necessary, but for many actions the required login is superfluous and therefore irritating.

Open Archives provides all its functionality without having to log in. Searching, viewing the records or scans and even the data basket can all be used without logging in.

I want help with my source citations

A genealogist should include source citations with his/her data/publication, so the genealogist and readers can see where the data came from. This increases verifiability and quality. Although source citations are important, they are often omitted, usually because it's too much work to collect all the data elements (if present at all) needed to form a source citation.

Open Archives aids the researcher by providing clear and consistent source citations with all records. The archival descriptions are used for this, so the complete titles of sources are made visible.


The sources are linked to the archival descriptions on the archive website, so readers can also read about the background of a source.

Of course the source citations are also included in the PDF document (a short and a long version are provided) and in the GEDCOM file, which follows version 5.5.1 of the GEDCOM specification. Each piece of information is linked to a source, all information about that source is provided, and the source is linked to its repository (the archive). For the addresses of the archives, data from the ArchiefWiki is used (they provide this data for re-use).
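The chain "information → source → repository" can be expressed with standard GEDCOM 5.5.1 structures. You need a REPO record, and a SOUR record that points to it and carries a call number (CALN). Each event then gets a SOUR citation with a PAGE. The following is a minimal sketch of one possible mapping, not necessarily the one Open Archives uses. The helper names are hypothetical.

```python
from __future__ import annotations


def repository_record(xref: str, name: str, city: str) -> list[str]:
    """A REPO record: the archive, with its address."""
    return [f"0 {xref} REPO", f"1 NAME {name}", f"1 ADDR {name}", f"2 CITY {city}"]


def source_record(xref: str, title: str, repo_xref: str, call_number: str) -> list[str]:
    """A SOUR record linked to its repository, with the call number of the source."""
    return [f"0 {xref} SOUR", f"1 TITL {title}", f"1 REPO {repo_xref}", f"2 CALN {call_number}"]


def source_citation(sour_xref: str, page: str, level: int) -> list[str]:
    """A citation of the source, placed under an event at the given level."""
    return [f"{level} SOUR {sour_xref}", f"{level + 1} PAGE {page}"]


lines = (
    repository_record("@R1@", "Het Utrechts Archief", "Utrecht")
    + source_record("@S1@", "<title from the archival description>", "@R1@", "1221-1, inv.nr. 1992")
    + ["0 @I1@ INDI", "1 NAME Hendrika /Jägers/", "1 DEAT"]
    + source_citation("@S1@", "record 240", level=2)
)
print("\n".join(lines))
```

Because the archive lives in its own REPO record, its address is written once and shared by all sources from that archive.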

I want suggestions for relevant additional data

Based on the information in a record, clever suggestions for additional information can be made, both within the dataset of the archive and outside of it.

Let's start with the 'within the dataset' part. A birth certificate shows the name of the child and the names of the parents. With these, a marriage record can be looked up, because it usually contains the same name of the child (then in the role of groom or bride) and the same parents. This also works the other way around: with a marriage certificate, the birth records can be looked up and shown when found. The same applies to death certificates. This way, links to other relevant records can be provided on the record page.
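A minimal sketch of the birth-to-marriage lookup follows. It assumes records are dictionaries with hypothetical keys (`groom`, `groom_father`, `bride_mother`, ...) and that an exact match on the normalized names of child, father and mother is good enough. Real data would need fuzzier matching, for example on spelling variants. Indexing birth records by the same key gives the reverse direction.

```python
from __future__ import annotations

from collections import defaultdict


def person_key(name: str, father: str, mother: str) -> tuple[str, ...]:
    """Normalize case and whitespace so trivial differences don't block a match."""
    return tuple(" ".join(n.lower().split()) for n in (name, father, mother))


def index_marriages(marriages: list[dict]) -> dict:
    """Index every groom and bride by (name, father, mother)."""
    index = defaultdict(list)
    for m in marriages:
        for role in ("groom", "bride"):
            key = person_key(m[role], m[f"{role}_father"], m[f"{role}_mother"])
            index[key].append(m["id"])
    return index


def suggest_marriages(birth: dict, index: dict) -> list[str]:
    """IDs of marriage records whose groom or bride matches the child and both parents."""
    return index.get(person_key(birth["child"], birth["father"], birth["mother"]), [])
```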


If the birth certificate notes that the person is one of twins, then the name of the twin brother or sister links to a smart search query that brings you to the birth record of the twin brother or sister in two clicks.

Outside the walls of archives there's also a lot of interesting information which can be used to provide suggestions on the records pages. If such services provide their data/indexes as open data or they provide a search service (API), then connections can be made.

Open Archives has currently made connections with two of these 'external sources':

  • With death certificates, the person is looked up, based on name and year, in the Graftombe.nl dataset, which has information and photos about graves in cemeteries and churchyards.
  • The 'main persons' in the records are looked up in online family trees. This results in relevant links to the work of genealogists on Genealogie Online.


The search results page also shows hits on other websites. These queries are run on Genealogie Online, the Stamboom Forum, the Stamboom Gids and the Historical Newspaper collection of the Dutch Royal Library.


I want to contribute

Genealogists often have specific knowledge and experience and are willing to share this with other researchers and archives. To facilitate this Open Archives has several options to contribute to records.

First, errors can be reported. Indexing is done by people, and people make mistakes. Data is also processed by various systems, which can introduce errors. Through a simple form, the user can report errors to Open Archives or the originating archive so the errors can be fixed.

Some records are part of a story. Each records page has the possibility to post comments or pictures. For this functionality an external service is used, because you do not have to make everything yourself!


If the record is cited in an online family tree, the genealogist can report this on the record page. The page in the online family tree is first checked to see whether the record is actually cited; after that, the link to the page in the online family tree is shown on the record page to everyone.


I want to know what data is available

The representation of the 'contents' of an archive website is often a textual summary or a complete inventory system. You can also visualize the contents of the searchable data set with interactive graphs.

Open Archives first displays a pie chart of the archives (currently only one). Clicking on a part of the pie chart brings up a pie chart which shows all the places the archive has data about. Clicking again, now on a place, reveals a pie chart with all the source types which are available. Clicking on a source type brings up a bar chart with the numbers per year; colours distinguish between digitized and non-digitized material.
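Each click in this drill-down is essentially a count one level deeper than the current selection. The following is a minimal sketch, assuming every record is a dictionary with hypothetical keys `archive`, `place`, `source_type`, `year` and `digitized`. At the last level, the counts are split by year and by the digitized flag, which matches the coloured bar chart.

```python
from __future__ import annotations

from collections import Counter

LEVELS = ("archive", "place", "source_type", "year")


def drill_down(records: list[dict], *path: str) -> Counter:
    """Count records one level below the chosen path (archive, place, source type)."""
    depth = len(path)
    if depth >= len(LEVELS):
        raise ValueError("path is already at the deepest level")
    counts: Counter = Counter()
    for record in records:
        if all(record[LEVELS[i]] == value for i, value in enumerate(path)):
            if LEVELS[depth] == "year":
                # Bar chart level: one count per (year, digitized) combination.
                counts[(record["year"], record["digitized"])] += 1
            else:
                counts[record[LEVELS[depth]]] += 1
    return counts
```

`drill_down(records)` feeds the first pie chart. `drill_down(records, archive)` feeds the second, and so on.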


Another interactive display that Open Archives offers, based on the available data, is the surname frequency. By selecting a source type, place and time period, you see the most prevalent names in a bar chart.
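The surname frequency boils down to filtering and counting. This minimal sketch assumes a flat list of person dictionaries with hypothetical keys `surname`, `source_type`, `place` and `year`.

```python
from __future__ import annotations

from collections import Counter


def top_surnames(persons: list[dict], source_type: str, place: str,
                 year_from: int, year_to: int, n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent surnames for one source type, place and period (inclusive)."""
    counts = Counter(
        p["surname"]
        for p in persons
        if p["source_type"] == source_type
        and p["place"] == place
        and year_from <= p["year"] <= year_to
    )
    return counts.most_common(n)
```

The result is already a list of (name, count) pairs in descending order, which is what a bar chart needs.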


I want to be able to share the records

Social media make it possible to easily share information. Researchers can share the records they find on Open Archives on Facebook, Google+, Twitter, Pinterest and LinkedIn. With one click, the social network site fetches some relevant information about the record (including a thumbnail and link), so you do not have to type it.

Sharing usually results in various comments. It's nice to show your genealogical discoveries in the archive to your friends, family and other followers! An example (in Dutch) of a shared record on Facebook:

Facebook screenshot

I want an application programming interface (API)

This wish won't be on everybody's list, but it is important for a website. If you also offer the services of your website to developers (via an API), even more people can use these services via other websites or programs. Platforms like Twitter, Google and Facebook offer APIs, which has resulted in a lot of useful, fun and convenient apps and websites, which in turn help the platforms grow.

Open Archives offers, through the Open Archives API, various methods that can be used by other developers in their website or application.

I want more information and scans

Open Archives uses open data made available for reuse by archives. Open Archives doesn't pay the archives for this data; conversely, archives do not have to pay Open Archives to have their data presented on Open Archives. When an archive wants to provide its data as open data, it usually does have to pay its software supplier. Each archive will therefore make its own choice.


With Open Archives I want to show that, by making data available for reuse, nice and innovative initiatives can flourish. I hope that more archives will follow the example of the Dutch archive Erfgoed Leiden en omstreken (formerly known as the Regional Archives of Leiden), which made all its genealogical data available for reuse. If more archives do this, individuals, companies and associations can do beautiful, convenient and fun things with the archive data!

Finally, what are your wishes for a genealogical search engine for an archive?


GEDCOM files which don’t adhere to the GEDCOM standard shouldn’t be allowed to be called GEDCOM

When we buy 1 liter of milk, we expect 1 liter of milk, because there are clear agreements about what 1 liter is. It's a well-documented standard. This way a producer knows how much to put in and a consumer knows how much he/she gets: no confusion. If you go abroad, a liter stays a liter. There are instruments we can use to measure the volume. If there's less than 1 liter in a 1 liter carton/bottle of milk, we protest and are angry with the producer.

If we export our genealogical data to a GEDCOM file, we expect that when we import the file in another program all data is imported completely and correctly, because there are clear agreements about how to write and read a GEDCOM file. It's a well-documented standard. When we receive a GEDCOM file from abroad, the same GEDCOM rules apply. If we lose information during a GEDCOM export/import, we usually don't protest; at most we blame GEDCOM and call it insufficient.

Do you see the inconsistency? This is not only strange, it is wrong!

Not adhering to the GEDCOM standard leads to data loss

Users of family tree programs (and websites) have to demand that, if their software supplier says the software writes GEDCOM, the file adheres to the GEDCOM standard. When a software supplier doesn't follow the GEDCOM standard, it's quite likely that you will lose data when you import the file in another program or website! If the GEDCOM files a product produces don't comply with the GEDCOM standard, then its claim that GEDCOM export is possible is deceptive. The same applies to import.

I say: GEDCOM files which don’t adhere to the GEDCOM standard shouldn’t be allowed to be called GEDCOM!

To prevent data loss, software suppliers should perhaps remove their incomplete/incorrect GEDCOM export function. But a good family tree program should have a GEDCOM export (and import); otherwise you won't be able to move your data to another program or service. Developers should be encouraged to adhere to the GEDCOM standard and to clearly communicate about their GEDCOM compliance.

Developers who don't think the GEDCOM standard is any good: don't support it! That's much better than mediocre support of GEDCOM, which creates false expectations about interchangeability. When certain GEDCOM tags aren't supported: communicate this. When invalid GEDCOM tags are encountered: report this to the user. Developers who use GEDCOM extensions (which are valid within GEDCOM) should document, communicate and even promote the meaning of these extensions. Otherwise, other programs and websites won't be able to read these GEDCOM tags, or won't or can't support them, and information is lost again!

An example

Say you have found a death certificate and enter its data in your family tree program. When the data is exported to GEDCOM, the source of this death certificate looks like this:

0 @S23@ SOUR
1 TITL Death Hendrika Jägers
2 TYPE WieWasWie
3 REF 21

1 REFN WIE30422548
2 TYPE WieWasWie
2 NOTE Archive name: Het Utrechts Archief
2 NOTE Archive: 1221-1
2 NOTE Part/Record: 240
2 NOTE Inventorynr.: 1992
2 NOTE Source type: BS Death


In the example above, several parts do not adhere to the GEDCOM standard; GEDCOM 5.5.1, for instance, allows no TYPE line under TITL and no NOTE lines under REFN. When you import this file in another program or website, these parts will probably be lost. The example not only has incorrect syntax; its contents (semantics) are incorrect too.


Be careful with your genealogical data: demand that exported GEDCOM files adhere to the GEDCOM standard, and demand that valid GEDCOM files are imported completely and correctly.

Genealogical organisations which make, review or recommend genealogical software should be clear about the GEDCOM compliance of the software. If they don't provide this insight, they are harming their members/users, because users will probably lose some of their painstakingly collected data when they transfer it to another program or website.

Monitoring compliance with the GEDCOM standard

There are institutions which check whether 1 liter of milk is what it is supposed to be. Unfortunately, there are no institutions which check GEDCOM compliance and certify family tree programs. This could be a nice job for genealogical societies/federations!

You can check for yourself whether your family tree program (or website) exports a valid GEDCOM file: upload the exported GEDCOM file to GED-inline. When GED-inline reports errors in the GEDCOM file, report them to the supplier of the software, write about them on this blog or your own, or comment about them on genealogical forums.

Whether the GEDCOM file is complete (is all your data present?) is something the GED-inline tool can't tell you. Nor does it tell you whether a valid GEDCOM file is imported correctly by your program (or website). If you run into problems which are caused by not adhering to the GEDCOM standard: report it, blog about it, protest.

Protect your genealogical data!


Linking and enriching information, some examples of using open data

Genealogical research is about people, family relationships and family history. The purpose of Genealogie Online is to publish this information. Genealogists can easily show off the results of their research to others, and they also get feedback and insights. For the latter, the genealogical information is linked to other data sources. This article describes the use of open data to put genealogical data in context.

Open linked data

According to the Open Data Handbook open data is:

data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.

More and more organizations make the information they hold available as open data. Organisations such as Wikipedia and the Dutch meteorological organisation KNMI already provide open data, and other organizations are following their example. Even the European Union has adopted open data! This creates many new possibilities and opportunities for new websites and mobile apps.

Linked data is about relating data to each other. If you can link data sets together, the fun begins! On Genealogie Online information is linked together with other information on three elements:

  • the surname
  • the date
  • the place

About the surname

In genealogy you come across different surnames. Genealogie Online supports the genealogists on this topic via the About the surname page, see for example the About the surname Hollestelle page.

This page consists of information about the surname which is in part aggregated from all published data on Genealogie Online. There are also links to external sites with more information about that name, such as the (Dutch) Who (re)searches who? page.

Surnames can be written in different ways, especially over the centuries. An open data source that can help with this challenge comes from the Zeeland Archives (in the person of Leo Hollestelle): a list of variants per surname. I also included this resource in a Dutch presentation, Please give me your data, which I gave for archivists, to show that these apparently simple lists can be of great value to others! On Genealogie Online I use this variant list on the About the surname page and in the search engine.
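The format of the variant list isn't described here. Assume, as a hypothetical, that it consists of groups of spellings that belong together. Then using it in a search engine comes down to expanding the searched surname to its whole group before matching. `build_variant_index` and `expand_surname` are hypothetical names, and the spellings in the usage line are only illustrative.

```python
from __future__ import annotations


def build_variant_index(variant_groups: list[list[str]]) -> dict[str, set[str]]:
    """Map every spelling to the full set of spellings it is grouped with."""
    index: dict[str, set[str]] = {}
    for group in variant_groups:
        names = {name.lower() for name in group}
        for name in names:
            # A spelling in several groups gets the union of those groups.
            index.setdefault(name, set()).update(names)
    return index


def expand_surname(surname: str, index: dict[str, set[str]]) -> set[str]:
    """All spellings to search for; a name without known variants stands alone."""
    key = surname.lower()
    return index.get(key, {key})
```

For example, with a group `["Jansen", "Janssen", "Janse"]`, a search for "Janssen" also finds records spelled "Jansen" or "Janse".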

About the day

Another element for which information can be collected is the date/year. Genealogie Online has long shown information from Wikipedia about the date of birth, marriage or death, such as information about the government, the royal house and other historic events. This information is now also shown on the About the day pages.

The Royal Netherlands Meteorological Institute (KNMI) provides weather data which goes back to 1701. This way, you can show what the weather was on the day ancestors married!

Recently, two new sources were added to the About the day page to give an impression of the period: art from the Rijksmuseum and old newsreels.

The Rijksmuseum offers information about their art as open data. Besides the meta-data the images can also be used. So now you can, for example, show which art was made in 1880.

Source: Rijksmuseum, painting by Willem Roelofs made in 1880

Open Images is an open media platform which offers online access to audiovisual archive material to stimulate creative reuse. Among the items it offers are Polygoon newsreels. Based on the date, the Polygoon newsreel from (about) that time can now be shown; see as an example the About the day Tuesday March 4, 1941 page.
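Finding the newsreel from "(about) that time" is a nearest-date lookup. Here is a minimal sketch using binary search over a date-sorted list. The `(release_date, title)` tuples and the `max_days` cut-off, beyond which nothing is shown, are assumptions made for illustration.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import date
from typing import Optional


def nearest_newsreel(target: date, newsreels: list[tuple[date, str]],
                     max_days: int = 31) -> Optional[tuple[date, str]]:
    """Closest newsreel to the target date; newsreels must be sorted by release date."""
    if not newsreels:
        return None
    dates = [released for released, _ in newsreels]
    i = bisect_left(dates, target)
    # Only the newsreels just before and just after the target can be nearest.
    neighbours = newsreels[max(i - 1, 0): i + 1]
    best = min(neighbours, key=lambda item: abs((item[0] - target).days))
    return best if abs((best[0] - target).days) <= max_days else None
```

Binary search keeps each lookup fast even with many thousands of newsreels, because only the two neighbours of the insertion point need to be compared.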

About the town

A third element which is present in genealogical data are place names. Genealogie Online makes use of an open data set of international geographical information supplied by Geonames.

As written in Genealogy and place names, this dataset is used to check place names: did the genealogist write the name correctly, and can it be uniquely identified? If so, Geonames provides information like longitude/latitude and links to Wikipedia for more information.

This information is used on the About the town page, see for example the About the town Gouda page. Based on the identifying Geonames ID extra information can be collected (via DBpedia) about the town, like a descriptive text and photo.

Photo: Gouda vanuit de lucht (Gouda from the air). Source: Wikimedia Commons, page Gouda

Genealogie Online tries, with the help of its users, to link the towns to archives. A nice open data source for this is the (Dutch) Archief Wiki. By linking the archives to the towns they have material about, genealogists can be redirected to the right archive based on the place name.

Another nice source shown on the About the town pages is provided by rijksmonumenten.info, which in turn gets its data from the Cultural Heritage Agency of the Netherlands, Wikipedia and Flickr. This dataset can (among other things) be searched by longitude and latitude, which yields images of national monuments around that position!
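How rijksmonumenten.info implements its position search isn't stated. As a generic illustration, a "monuments around a position" search can be sketched as a great-circle (haversine) distance filter. It assumes a local list of monuments with hypothetical `lat`/`lon` keys and a mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def monuments_near(lat: float, lon: float, monuments: list, radius_km: float = 1.0) -> list:
    """Monuments within radius_km of the position, nearest first."""
    hits = [(haversine_km(lat, lon, m["lat"], m["lon"]), m) for m in monuments]
    hits.sort(key=lambda hit: hit[0])
    return [m for distance, m in hits if distance <= radius_km]
```

For a town page, the position would be the longitude/latitude that Geonames provides for the recognized place.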

Open data, new possibilities, new insights

This article gives some examples of how Genealogie Online uses open data to offer context to genealogists.

The nice thing is that we're only at the beginning of the open data movement. The more organisations, including archives (like trendsetter Archief Leiden), offer open data, the more opportunities arise. Of course you have to watch out for copyright and privacy issues, and IT systems have to support it, but these are manageable issues.

Open data can lead to more insight, new functionality, more economic activity!


Development of the pedigree-timeline

A timeline is a nice representation of events in time. It can also be a useful tool, since viewing data from a different perspective can provide new insights. For genealogists, the timeline can be useful too! For a while now, I have had the idea of combining the timeline with a pedigree chart.


A timeline is a graphical representation of a chronological sequence of events or time periods. This view has the form of a bar and has timestamps with inscriptions or captions.

You can create timelines yourself through services like TimeToast, TimeRime or Tijdbalk.nl (of which you see an image below). On these websites you have to manually enter the data yourself.


The timeline in genealogy

On Genealogie Online the timeline is used to give more insight into the life of a person. Below is an example for Willem Frederik Lamoraal Boissevain. The red rectangle depicts the life of the person; below it, the lifespans of grandparents, parents, brothers/sisters and children are placed on the timeline. It shows which births and deaths the person experienced and who lived at the same time.


The pedigree chart
A pedigree chart is a representation of all direct ancestors in the male and female lines of a person.

Although you can show birth and death dates in both textual and graphical pedigree charts, it is difficult to see the overlap in lifetimes. This is where the idea of combining the pedigree chart with the timeline hit me.

The pedigree-timeline

The pedigree-timeline shows both the relationships between child and parents and the lifespans of all the ancestors. The following image was the first sketch of a pedigree-timeline.


Although the combination is correct, this representation is somewhat hard to read. Because the proband is on the left, which is also the most recent point in time, the bars start at death and finish at birth. Not logical... so let's turn it around!

The first prototype


The first sketch was made in a drawing program, but a prototype followed (image shown above). This was a working prototype of the pedigree-timeline, one you can view in the browser.

In this prototype there are also several types of bars (not present in the first sketch). For some ancestors you might not know the date of birth or death; this is shown with a striped beginning or end. It may also be that the pedigree-timeline was based on a living person and/or that ancestors are still alive. In that case, the "lifespan bar" ends with a triangle. Finally, the bars are coloured pink or blue to indicate the sex.

Second prototype

In a pedigree chart you can easily distinguish the generations. In the first sketch and prototype of the pedigree-timeline this was less visible, and that was missed. This was fixed in a second prototype by using a separate colour for each generation.
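The article doesn't say how the ancestors are numbered internally. One convenient assumption is Ahnentafel (Sosa-Stradonitz) numbering: the proband is 1, the father of n is 2n and the mother is 2n + 1. With this numbering, both the generation and the sex follow directly from the number, so a per-generation colour becomes a simple lookup. The palette values and function names are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Optional

# Placeholder palette: one colour per generation, repeating if there are more generations.
GENERATION_COLOURS = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def generation(number: int) -> int:
    """0 for the proband, 1 for parents, 2 for grandparents, ..."""
    if number < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    return number.bit_length() - 1


def parents_of(number: int) -> tuple[int, int]:
    """Ahnentafel numbers of the father and the mother."""
    return 2 * number, 2 * number + 1


def sex_of(number: int) -> Optional[str]:
    """Even numbers are fathers, odd numbers (except the proband) are mothers."""
    if number == 1:
        return None
    return "M" if number % 2 == 0 else "F"


def bar_colour(number: int) -> str:
    return GENERATION_COLOURS[generation(number) % len(GENERATION_COLOURS)]
```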


Data requirements

To create a pedigree chart you only need information about the names and the child-parent relationships. A pedigree-timeline requires more: information about birth/baptism and death/burial. Only with this lifespan information can a person be put in the timeline as a bar. The timeline can handle approximate dates, but you have to be able to estimate lifespans.

This is an additional challenge, which became very apparent when I tried to generate the data for the pedigree-timeline. I try to generate this dataset from a GEDCOM file. If no date of birth is known, we have to make an educated guess to find the latest possible year of birth. This can be done by looking at the wedding and assuming the person was at least 18 years old on that date, or by looking at the dates of birth of the children. A similar set of estimation rules has been drawn up to determine dates of death.

Based on these estimates approximate lifespan bars can be put in the pedigree-timeline. When no dates are available and can’t be guessed, then these persons are not shown in the pedigree-timeline!
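The estimation rules can be sketched as follows. The minimum marriage age of 18 comes from the text above. The minimum age at a child's birth and the death rule are hypothetical choices, because the article doesn't give them. The death rule assumes that a person was still alive at their own marriages and at their children's births, which is a simplification, since a father can die before his child is born. Persons whose bar can't be estimated get `None` and are left out of the chart.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

MIN_AGE_AT_MARRIAGE = 18     # rule from the article
MIN_AGE_AT_CHILD_BIRTH = 15  # hypothetical threshold; the article gives no number


@dataclass
class Ancestor:
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    marriage_years: list[int] = field(default_factory=list)
    child_birth_years: list[int] = field(default_factory=list)


def estimated_birth_year(p: Ancestor) -> Optional[int]:
    """Known birth year, otherwise the latest year the person can have been born."""
    if p.birth_year is not None:
        return p.birth_year
    bounds = [y - MIN_AGE_AT_MARRIAGE for y in p.marriage_years]
    bounds += [y - MIN_AGE_AT_CHILD_BIRTH for y in p.child_birth_years]
    return min(bounds) if bounds else None


def estimated_death_year(p: Ancestor) -> Optional[int]:
    """Known death year, otherwise the last year the person is known to have been alive."""
    if p.death_year is not None:
        return p.death_year
    events = p.marriage_years + p.child_birth_years
    return max(events) if events else None


def lifespan_bar(p: Ancestor) -> Optional[tuple[int, int]]:
    start, end = estimated_birth_year(p), estimated_death_year(p)
    if start is None or end is None:
        return None  # cannot be estimated: the person is not shown
    return start, end
```

An estimated start or end would be drawn as the striped beginning or end described above.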

The future of the pedigree-timeline

The development of a new type of genealogical graph, from idea through sketches to working prototypes has been fun, hence this article.

The technology is not completely finished yet; e.g., it doesn't work well/smoothly on tablets. But I hope that this new genealogical chart will soon be available on Genealogie Online. Then you'll find the pedigree-timeline next to the 'pedigree on the map' chart!


And who knows, maybe other family tree websites and programs will also include the pedigree-timeline…

This article is a translation of Ontwikkeling van de kwartiertijdbalk.


Genealogy and place names

Genealogical data contain a lot of place names. These place names sometimes get misspelled or noted incompletely, which often means the original place can't be uniquely identified any more. Time for some attention to the quality of place names in genealogical publications!

Unambiguously determining the place name

Let’s say one of your ancestors died in Heikant (in the Netherlands). Which archive do you have to visit to find more information about this person?
First you will look up Heikant and be surprised, because there are about 25 places in the Netherlands which are (or were) named Heikant, and another 20 places with this name can be found in Belgium. It's therefore good practice to also note the name of the province (or state). But if you add the province North-Brabant to the name in this example, you still won't know which place it is: there are 22 places called Heikant in North-Brabant...


By adding the name of the province (state) and (current) municipality, and maybe the longitude/latitude, you can unambiguously determine the place. This not only helps you in your own research, but also the visitors of your online publications and the researchers you share your data with!

Besides noting the province (state) and municipality, it's also wise to add the country name. Most Dutch people think of Delft as the place in the Netherlands, but there's also a place called Delft in Cottonwood County, Minnesota, USA!

Mistakes in place names

Of course, spelling errors can sneak into one's genealogical research: sGravenhage instead of 's-Gravenhage, or Ryswyk instead of Rijswijk. Or topographical mistakes, like an erroneous province name as in “Woerden, South-Holland” instead of “Woerden, Utrecht”. These kinds of errors are fairly easy to correct once they are pointed out to the author.

Another type of error in place names occurs when the researcher enters information in the place name field of his/her family tree program that isn't a real place name but rather a location description, like “Hooglandse Church in Leiden”. Place name fields should contain only place names; additional location info (church, name of the farm, address) should be put in other fields or notes!

Source of geographical names

Geonames is a free, searchable geographical database offering a lot of information about places. This database contains about 8 million unique place names from all over the world, including synonyms, longitude/latitude and links to information on Wikipedia.

Quality checks on Genealogie Online
Genealogie Online is the biggest family tree website in the Netherlands. This website also gives suggestions to improve the quality of the published family trees. There are already numerous quality checks based on dates related to the genealogical events (see Online Genealogy Consistency Checks or The Most Common Genealogy Mistake articles by Tamura Jones). Now Genealogie Online also takes a good look at the place names related to genealogical events!

Whenever a GEDCOM file is uploaded to Genealogie Online, all place names are automatically matched against the geographical database of Geonames. If a place name is recognized, it is linked in the publication to an “About the place name” page.

On this page (see for example the About the place name Volendam page), a map of the place is shown, together with a link to Wikipedia and, if known, a link to the archive (on the Dutch ArchiefWiki) which holds the information about the place.
Also shown on these pages are the most common family names in that place, based on the family trees published on Genealogie Online.

Some statistics

Of the 30 million place names found in genealogical publications on Genealogie Online, about 80% are recognized. Of the other 20%:
  • 1/3 cannot be unambiguously determined, for example due to missing province/country information
  • 2/3 cannot be identified at all (spelling errors, topographical errors, etc.)
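The three outcomes above (recognized, not unambiguously determined, not identified) can be illustrated with a toy matcher. This is a minimal sketch, not the Genealogie Online implementation. It assumes a small local gazetteer of hypothetical `(id, name, province/state, country)` rows, where the IDs are illustrative and not real Geonames IDs. It also assumes that the place text is "name, qualifier, qualifier" with commas between the parts.

```python
from __future__ import annotations

from collections import defaultdict


def build_gazetteer(rows: list[tuple]) -> dict:
    """Index gazetteer rows (id, name, province/state, country) by lowercased name."""
    by_name = defaultdict(list)
    for row in rows:
        by_name[row[1].lower()].append(row)
    return by_name


def classify_place(text: str, gazetteer: dict) -> tuple[str, list]:
    """Return 'recognized', 'ambiguous' or 'unknown' plus the matching rows."""
    parts = [p.strip().lower() for p in text.split(",") if p.strip()]
    if not parts:
        return "unknown", []
    candidates = gazetteer.get(parts[0], [])
    for qualifier in parts[1:]:  # province/state or country after a comma
        candidates = [c for c in candidates if qualifier in (c[2].lower(), c[3].lower())]
    if len(candidates) == 1:
        return "recognized", candidates
    if candidates:
        return "ambiguous", candidates
    return "unknown", []
```

With rows for both Delfts, "Delft" on its own comes out as ambiguous and "Delft, Minnesota" as recognized. A topographical error such as "Woerden, South-Holland" filters away every candidate and ends up as unknown.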

Using the knowledge about places

When the place name is recognized then information like longitude/latitude becomes available. With this information Genealogie Online can make more exact images of the geographical distribution of genealogical events within a publication.

Distribution within the Benelux

Because the place names are now recognized by Genealogie Online, they become more than just plain text in a publication: they get meaning. This means Genealogie Online can now help genealogists to improve the quality of their data.

Place names which are not uniquely recognized by Genealogie Online are presented to the author of the genealogical publication so they can correct or amend them. Of course, it can also be that the place doesn't exist anymore and therefore can't be found in Geonames, but then: just add the place to Geonames; it'll make this wiki-based database even better!