2013/09/08

My wish list for a genealogical search engine for an archive

One can daydream about the ideal genealogical search engine for an archive. After that, you could e-mail your suggestions and wait until the archive (or its software supplier) sees the light and gets the budget, you could complain, or you could just let it rest. Or you take up the challenge yourself. Based on a list of wishes and a complete genealogical dataset from an archive (with over 4 million persons), I started to build such a search engine.

The result is Open Archives: a website which inspires but is also fully functional and ready to use!

I want to Google

When you think about searching the internet, you think about Google: one search field which brings you a ton of information. Though this search field seems simple, it's actually a very powerful instrument if you know how to use it. For example, if you want to search for Coret on the website of the Dutch National Archive (GaHetNa) and want to exclude Bob, you just Google for "coret site:www.gahetna.nl -bob". The search results can be filtered on result type (Web/Images/Maps/Shopping/etc.), creation date and whether you have visited the page before.

(all images are links to Open Archives)

For Open Archives I wanted a search field like that: just like Google, one big input field on the start page. To show off the strength of the search function, examples are shown beneath the field. If you are looking for someone named Oudshoorn who probably married someone named Lagas between 1900 and 1925, you type the query "oudshoorn & lagas 1900-1925" in the search field of Open Archives.

By using filters you can narrow down the search results on source type, place, role and year.


Other search operators include:

  • excluding names (-)
  • wildcards (*)
  • only records with scans ($)
  • phonetic search (~)
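To make the operators a bit more concrete, here is a minimal sketch in Python of how such a query could be split into its parts. This is not the actual Open Archives code. The names `ParsedQuery`, `parse_query` and `name_matches` are made up for this illustration, and the exact syntax is an assumption: terms are separated by spaces, "-" and "~" are written directly in front of a name, "$" stands on its own, and a period is written as "1900-1925".

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")
YEAR = re.compile(r"^\d{4}$")


@dataclass
class ParsedQuery:
    # One list of name terms per person; "&" starts the next person.
    persons: List[List[str]] = field(default_factory=lambda: [[]])
    excluded: List[str] = field(default_factory=list)   # terms given as -name
    phonetic: List[str] = field(default_factory=list)   # terms given as ~name
    scans_only: bool = False                            # True when "$" is present
    years: Optional[Tuple[int, int]] = None             # (first year, last year)


def parse_query(query: str) -> ParsedQuery:
    """Split a search query into names, exclusions, flags and a period."""
    parsed = ParsedQuery()
    for token in query.lower().split():
        year_range = YEAR_RANGE.match(token)
        if year_range:
            parsed.years = (int(year_range.group(1)), int(year_range.group(2)))
        elif YEAR.match(token):
            parsed.years = (int(token), int(token))
        elif token == "&":
            parsed.persons.append([])
        elif token == "$":
            parsed.scans_only = True
        elif token.startswith("-") and len(token) > 1:
            parsed.excluded.append(token[1:])
        elif token.startswith("~") and len(token) > 1:
            # Assumption: a phonetic term still belongs to the current person.
            parsed.phonetic.append(token[1:])
            parsed.persons[-1].append(token[1:])
        else:
            # Plain names and wildcard terms such as "oud*" are kept as typed.
            parsed.persons[-1].append(token)
    return parsed


def name_matches(term: str, name: str) -> bool:
    """Compare a (wildcard) term with a name, ignoring case."""
    return fnmatchcase(name.lower(), term.lower())


example = parse_query("oudshoorn & lagas 1900-1925")
print(example.persons, example.years)
```

Wildcard matching is delegated to `fnmatch`, which also gives "?" and "[...]" a special meaning. A real search engine would more likely translate the wildcard into an index query.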

Je veux utiliser ce site aussi...
Ich möchte diese Website auch nutzen...
Ik wil deze website ook gebruiken...

Many ancestors came from abroad or emigrated to other countries, so genealogical research often becomes international. For me this means a website like Open Archives has to be available in multiple languages. Although the content of the (current) records is in Dutch, the rest of the website is offered in multiple languages.

I want a readable website

The readability of a website is determined in a large part by font, font size, graphical elements (like icons) and the use of colours. Open Archives has chosen a clear font and is using a slightly bigger than normal font size, which is adjustable in the browser (via CTRL +/-).

The screens of tablets and smartphones are a lot smaller than those of a laptop or monitor. The number of users browsing the Internet with these devices is growing rapidly. By taking this fact into account from the start of your design, it's fairly easy to make your user interface look good on different screen sizes.

You can also use Open Archives on a smartphone or tablet. For example, when you view the search results page on a small screen, the table has fewer columns than on bigger screens, which helps keep the rest readable. By making your browser window narrower and narrower, you can see the display of Open Archives adjust automatically (this is called responsive design).


An old and seemingly forgotten browser feature is that visited links can get a different colour than non-visited links. This distinction makes it very easy for a user to navigate. Open Archives uses an orange colour for non-visited links and dark grey for visited links. This way you don't have to remember which records you have already looked at and which you have not.

A good page structure also increases readability. Open Archives made the record pages simpler and clearer. Usually, when archival records are shown, all the data elements are shown separately, one below the other. Some elements can simply be concatenated to form readable 'sentences'.

So, instead of:

....
First name groom: Wilhelmus Josephus
Last name groom: Lugter
Occupation groom: merchant
Place of birth groom: Ridderkerk
....

Open Archives shows:

....
Groom
Wilhelmus Josephus Lugter, merchant, born in Ridderkerk
....
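The concatenation itself is simple. A minimal sketch in Python, assuming the record fields arrive as a dictionary; the key names and the function `describe_person` are invented for this illustration and are not taken from Open Archives:

```python
def describe_person(fields: dict) -> str:
    """Join the separate data elements of one person into a readable line."""
    # Empty or missing elements are skipped, so no stray commas appear.
    name = " ".join(
        part for part in (fields.get("first_name"), fields.get("last_name")) if part
    )
    parts = [name] if name else []
    if fields.get("occupation"):
        parts.append(fields["occupation"])
    if fields.get("birth_place"):
        parts.append(f"born in {fields['birth_place']}")
    return ", ".join(parts)


groom = {
    "first_name": "Wilhelmus Josephus",
    "last_name": "Lugter",
    "occupation": "merchant",
    "birth_place": "Ridderkerk",
}
print("Groom")
print(describe_person(groom))
```

Skipping empty elements matters in practice, because not every record has an occupation or a place of birth.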

Multiple persons play a role in a record, so you can also arrange the information to show the relations between them. By adding graphical elements, you can see the relations in an instant.


The graphical elements which 'connect the couples' have an additional function. Clicking on such an element starts a search for these two persons. Searching for two persons is something many genealogists look for in a search engine, and with these clickable elements it takes only one mouse click!

I want to make a nice print

A lot of genealogists make hard copies of the pages they find on a genealogical website. A website can determine how this print looks. Some parts don't have to be printed, like the website navigation and share buttons. Other parts are in fact only interesting for the printed version, like the website address.

Open Archives makes sure that the printed version looks good.

I want to collect multiple records

If you are on a website of an archive, you usually don't stop after finding 1 record. Most genealogists will find multiple interesting records which have to be processed later on. So you want to collect interesting records. For this Open Archives introduced the data basket.

On every records page there's a button to add the record to the data basket. The data basket shows the titles of all collected records, each linking back to its records page.

There are two ways to output the data basket:

  • First, you can download the records in PDF format. This PDF document, which adheres to the PDF/A standard, can be viewed with a PDF reader or printed.
  • The records in the data basket can also be downloaded in GEDCOM format. This file contains all data about persons, relationships and sources, and adheres to the GEDCOM 5.5.1 standard. This GEDCOM file can easily be imported into a family tree program thus eliminating the manual input (less work, no risk of typing errors).
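To give an idea of what such a GEDCOM download contains, here is a minimal sketch in Python that writes two persons and their family as GEDCOM 5.5.1 lines. It is an illustration, not the Open Archives export code: the function names, the system id `DATA_BASKET_SKETCH` and the submitter name are assumptions, the bride is an invented placeholder, and the output has not been run through a GEDCOM validator.

```python
def gedcom_header(system_id="DATA_BASKET_SKETCH"):
    """Header lines; system_id is a made-up identifier for this sketch."""
    return [
        "0 HEAD",
        f"1 SOUR {system_id}",
        "1 SUBM @SUBM1@",
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "0 @SUBM1@ SUBM",
        "1 NAME Data basket user",
    ]


def individual(xref, given, surname, sex, occupation=None, birth_place=None,
               spouse_family=None):
    """One INDI record; the surname goes between slashes in the NAME line."""
    lines = [f"0 @{xref}@ INDI", f"1 NAME {given} /{surname}/", f"1 SEX {sex}"]
    if occupation:
        lines.append(f"1 OCCU {occupation}")
    if birth_place:
        lines += ["1 BIRT", f"2 PLAC {birth_place}"]
    if spouse_family:
        lines.append(f"1 FAMS @{spouse_family}@")
    return lines


def family(xref, husband_xref, wife_xref):
    """One FAM record that links the two spouses."""
    return [f"0 @{xref}@ FAM", f"1 HUSB @{husband_xref}@", f"1 WIFE @{wife_xref}@"]


def build_gedcom(records):
    """Header, all record lines and the mandatory trailer, one tag per line."""
    lines = gedcom_header()
    for record in records:
        lines += record
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"


gedcom_text = build_gedcom([
    individual("I1", "Wilhelmus Josephus", "Lugter", "M",
               occupation="merchant", birth_place="Ridderkerk", spouse_family="F1"),
    # The bride below is an invented placeholder, not data from a real record.
    individual("I2", "Example", "Bride", "F", spouse_family="F1"),
    family("F1", "I1", "I2"),
])
print(gedcom_text)
```

Because every line starts with a level number and a tag, a family tree program can read the persons and their relationship without any manual input.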

I only want to log in if it is really necessary

Websites tend to place certain functionality behind a login. For certain personal activities this is necessary, but for many actions the required login is superfluous and therefore irritating.

Open Archives provides all the functionality without requiring a login. Searching, viewing the records or scans and even the data basket can all be used without logging in.

I want help with my source citations

A genealogist should include source citations with his/her data or publication, so the genealogist and readers can see where the data came from. This increases verifiability and quality. Although source citations are important, they are often omitted. Usually it's too much work to collect all the necessary data elements (if present at all) to form a source citation.

Open Archives aids the researcher by providing clear and consistent source citations with all records. The archival descriptions are used for this, so the complete titles of sources are made visible.


The sources are linked to the archival descriptions on the archive website, so readers can also read about the background of a source.

Of course the source citations are also included in the PDF document (both a short and a long version are provided) and in the GEDCOM file. The GEDCOM file follows version 5.5.1 of the GEDCOM specification: each piece of information is linked to a source, all information about the source is provided, and the source is linked to the repository (the archive). For the addresses of the archives, data from the ArchiefWiki is used (they provide this data for re-use).
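The chain from a piece of information to its source and on to the repository can be sketched in Python as follows. The function names are invented for this illustration, and the source title and the PAGE text are placeholders, not a real archival description; only the tags (REPO, SOUR, TITL, PAGE) come from the GEDCOM 5.5.1 specification.

```python
def repository_record(xref, name, address_lines=()):
    """A REPO record for the archive; the address is optional."""
    lines = [f"0 @{xref}@ REPO", f"1 NAME {name}"]
    if address_lines:
        lines.append(f"1 ADDR {address_lines[0]}")
        lines += [f"2 CONT {line}" for line in address_lines[1:]]
    return lines


def source_record(xref, title, repository_xref):
    """A SOUR record that points to the repository holding the source."""
    return [f"0 @{xref}@ SOUR", f"1 TITL {title}", f"1 REPO @{repository_xref}@"]


def cited_event(tag, place, source_xref, page):
    """An event (for example BIRT) below an INDI record, with its citation."""
    return [
        f"1 {tag}",
        f"2 PLAC {place}",
        f"2 SOUR @{source_xref}@",
        f"3 PAGE {page}",
    ]


citation_lines = (
    repository_record("R1", "Erfgoed Leiden en omstreken")
    # Placeholder title and page reference, for illustration only.
    + source_record("S1", "Example source title", "R1")
    + ["0 @I1@ INDI", "1 NAME Wilhelmus Josephus /Lugter/"]
    + cited_event("BIRT", "Ridderkerk", "S1", "example record number")
)
print("\n".join(citation_lines))
```

A family tree program that imports these lines can show, for the birth place, which source it came from and in which archive that source is kept.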

I want suggestions to relevant additional data

Based on the information in the record, clever suggestions for additional information can be made, both within the dataset of the archive and outside it.

Let's start with the 'within the dataset' part. With a birth certificate, which shows the name of the child and the names of the parents, a marriage record can be looked up, because it usually contains the same name of the child (then in the role of groom or bride) and of the parents. This also works the other way around, so with a marriage certificate the birth records can be looked up and shown when found. The same goes for death certificates. This way, links to other relevant records can be provided on the records page.
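A minimal sketch in Python of the birth-to-marriage lookup, under a few assumptions of my own: names are compared after simple normalisation (lower case, single spaces), and a marriage only counts when it falls within an age window after the birth. The record classes, the function names, the age window and all names in the sample data are invented for this illustration; the real matching may well be smarter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BirthRecord:
    child: str
    father: str
    mother: str
    year: int


@dataclass(frozen=True)
class MarriageRecord:
    groom: str
    bride: str
    groom_parents: Tuple[str, str]  # (father, mother)
    bride_parents: Tuple[str, str]  # (father, mother)
    year: int


def normalise(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def suggest_marriages(birth: BirthRecord, marriages: List[MarriageRecord],
                      min_age: int = 16, max_age: int = 60) -> List[MarriageRecord]:
    """Return marriages in which the child appears as groom or bride."""
    wanted = (normalise(birth.child), normalise(birth.father), normalise(birth.mother))
    hits = []
    for marriage in marriages:
        # Assumed plausibility check: the spouse is between min_age and max_age.
        if not birth.year + min_age <= marriage.year <= birth.year + max_age:
            continue
        spouses = ((marriage.groom, marriage.groom_parents),
                   (marriage.bride, marriage.bride_parents))
        for spouse, (father, mother) in spouses:
            if (normalise(spouse), normalise(father), normalise(mother)) == wanted:
                hits.append(marriage)
                break
    return hits


# Invented sample data, only to show the matching.
birth = BirthRecord("Jan Example", "Piet Example", "Maria Sample", 1880)
marriages = [
    MarriageRecord("Jan  example", "Anna Test", ("Piet Example", "Maria Sample"),
                   ("Kees Test", "Els Demo"), 1905),
    MarriageRecord("Jan Example", "Anna Test", ("Dirk Example", "Maria Sample"),
                   ("Kees Test", "Els Demo"), 1906),
]
print(suggest_marriages(birth, marriages))
```

The second sample marriage is rejected because the father's name differs, which shows why comparing the parents as well as the child keeps the suggestions relevant.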


If the birth certificate notes that the person is one of twins, then the name of the twin brother or sister links to a smart search query which brings you to the birth record of the twin brother or sister in two clicks.

Outside the walls of archives there's also a lot of interesting information which can be used to provide suggestions on the records pages. If such services provide their data/indexes as open data or they provide a search service (API), then connections can be made.

Open Archives currently has made connections with two of these 'external sources':

  • With death certificates, the person is looked up, based on name and year, in the Graftombe.nl dataset, which has information and photos about graves in cemeteries and churchyards.
  • The 'main persons' in the records are looked up in online family trees. This results in relevant links to the work of genealogists on Genealogie Online.


The search results page also shows hits on other websites. The query is run on Genealogie Online, the Stamboom Forum, the Stamboom Gids and the Historical Newspaper collection of the Dutch Royal Library.


I want to contribute

Genealogists often have specific knowledge and experience and are willing to share this with other researchers and archives. To facilitate this Open Archives has several options to contribute to records.

First, errors can be reported. Indexing is done by people, so mistakes happen. Data is also processed by various systems, which can introduce errors. Through a simple form the user can report errors to Open Archives or the originating archive so they can be fixed.

Some records are part of a story. Each records page has the possibility to post comments or pictures. For this functionality an external service is used, because you do not have to make everything yourself!


If the record is cited in an online family tree, the genealogist can report this on the records page. The page in the online family tree is first checked to see whether the record is indeed cited; after this, the link to the page in the online family tree is shown on the records page to everyone.
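How such a check could work is sketched below in Python. It is an assumption on my part that "cited" means the family tree page contains a link to the record; the class `LinkCollector`, the function `page_cites_record` and the example.org addresses are invented for this illustration. Fetching the page is left out, the function works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_cites_record(html: str, record_url: str) -> bool:
    """True when the page links to the record (a trailing slash is ignored)."""
    collector = LinkCollector()
    collector.feed(html)
    target = record_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)


# Hypothetical addresses, used only to demonstrate the check.
record_url = "https://records.example.org/record/123"
family_tree_page = (
    '<p>Source: <a href="https://records.example.org/record/123/">'
    "marriage record</a></p>"
)
print(page_cites_record(family_tree_page, record_url))
```

Looking only at real links, rather than searching the page text for the address, avoids accepting a page that merely mentions the record in passing.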


I want to know what data is available

The representation of the 'contents' of an archive website is often a textual summary or a complete inventory system. You can also visualize the contents of the searchable data set with interactive graphs.

Open Archives first displays a pie chart of the archives (currently only one). Clicking on a part of the pie chart brings up a pie chart which shows all the places the archive has data about. Clicking again, now on a place, reveals a pie chart with all the source types which are available. Clicking on a source type brings up a bar chart with the numbers per year; colours distinguish between digitized and non-digitized material.
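The numbers behind such a drill-down are plain counts. A minimal sketch in Python, assuming every record is a dictionary with the keys used below; the key names, the function names and the sample records are invented for this illustration:

```python
from collections import Counter

# The order in which the charts drill down.
LEVELS = ("archive", "place", "source_type", "year")


def matches(record, path):
    """True when the record agrees with every choice made so far."""
    return all(record[level] == value for level, value in zip(LEVELS, path))


def drill_down(records, path=()):
    """Counts for the next chart, given the slices clicked so far."""
    if len(path) >= len(LEVELS):
        raise ValueError("path already selects the deepest level")
    next_level = LEVELS[len(path)]
    return Counter(r[next_level] for r in records if matches(r, path))


def scans_per_year(records, path):
    """Per year, the number of digitized and non-digitized records."""
    bars = {}
    for record in records:
        if matches(record, path):
            bar = bars.setdefault(record["year"], {"digitized": 0, "not_digitized": 0})
            bar["digitized" if record["has_scan"] else "not_digitized"] += 1
    return bars


ARCHIVE = "Erfgoed Leiden en omstreken"
# Invented sample records, only to show the counting.
records = [
    {"archive": ARCHIVE, "place": "Leiden", "source_type": "marriage", "year": 1900, "has_scan": True},
    {"archive": ARCHIVE, "place": "Leiden", "source_type": "marriage", "year": 1900, "has_scan": False},
    {"archive": ARCHIVE, "place": "Leiden", "source_type": "birth", "year": 1901, "has_scan": True},
    {"archive": ARCHIVE, "place": "Leiderdorp", "source_type": "birth", "year": 1901, "has_scan": False},
]
print(drill_down(records, (ARCHIVE,)))
print(scans_per_year(records, (ARCHIVE, "Leiden", "marriage")))
```

Each click simply extends the path by one value, so the same function serves all three pie charts.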


Another interactive display, based on the data available in Open Archives, is the surname frequency. By selecting a source type, place and time period, the most prevalent names are shown in a bar chart.
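The surname frequency boils down to filtering and counting. A minimal sketch in Python, with invented key names, an invented function name and made-up sample records (the surnames are borrowed from the search examples earlier in this post):

```python
from collections import Counter


def surname_frequency(records, source_type, place, first_year, last_year, top=10):
    """The most frequent surnames for one source type, place and period."""
    counts = Counter(
        record["surname"]
        for record in records
        if record["source_type"] == source_type
        and record["place"] == place
        and first_year <= record["year"] <= last_year
    )
    return counts.most_common(top)


# Made-up sample records, only to show the filtering and counting.
sample = [
    {"surname": "Oudshoorn", "source_type": "marriage", "place": "Leiden", "year": 1901},
    {"surname": "Oudshoorn", "source_type": "marriage", "place": "Leiden", "year": 1910},
    {"surname": "Lagas", "source_type": "marriage", "place": "Leiden", "year": 1905},
    {"surname": "Coret", "source_type": "birth", "place": "Leiden", "year": 1905},
    {"surname": "Lagas", "source_type": "marriage", "place": "Leiden", "year": 1950},
]
print(surname_frequency(sample, "marriage", "Leiden", 1900, 1925))
```

The resulting list of (surname, count) pairs is exactly what a bar chart needs: one bar per name, ordered from most to least frequent.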


I want to be able to share the records

Social media make it possible to easily share information. Researchers can share the records they found on Open Archives on Facebook, Google+, Twitter, Pinterest and LinkedIn. One click makes the social network site fetch some relevant information (including thumbnail and link) about the records so you do not have to type this.

Sharing usually results in various comments. It's nice to show your genealogical discoveries in the archive to your friends, family and other followers! An example (in Dutch) of a shared record on Facebook:

Facebook screenshot

I want an application programming interface (API)

This wish won't be on everybody's list, but it is important for a website. If you also offer the services of your website to developers (via an API), even more people can use these services via other websites or programs. Platforms like Twitter, Google and Facebook offer APIs, which results in a lot of useful, fun and convenient apps and websites, which in turn help the platform grow.

Open Archives offers, through the Open Archives API, various methods that can be used by other developers in their website or application.

I want more information and scans

Open Archives uses open data made available for reuse by archives. Open Archives doesn't pay the archives for this data; conversely, archives do not have to pay Open Archives to present their data on Open Archives. When an archive wants to provide its data as open data, it usually does have to pay its software supplier. Each archive will therefore make its own choice.

Conclusion

With Open Archives I want to show that, by making data available for reuse, nice and innovative initiatives can flourish. I hope that more archives will follow the example of the Dutch archive Erfgoed Leiden en omstreken (formerly known as the Regional Archives of Leiden): they made all their genealogical data available for reuse. If more archives do this, individuals, companies and associations can do beautiful, convenient and fun things with the archive data!

Finally, what are your wishes for a genealogical search engine for an archive?