NZGDB Newsletter #4, 6th Sept, 2007

New Server Now Operational!

For about a week we’ve been running on our new server.  The new server is not much to look at – just a flat slab about the thickness of a kitchen bench top, second from the top in its rack.  It’s already proving its worth.  You may have noticed that searching for people is significantly faster, and those with “Smith” ancestors can now look them up: previously all they saw was a timeout message.  Something that is very important to me, but that you may not have noticed, is that I can now run large database queries (like looking for duplicate records) without noticeably affecting other users.

 

Having our own server will allow me to provide a number of new features that I could only dream about before.  First however I’ve got to learn how to administer it properly so that I prevent, rather than cause, unexpected outages. My apologies to anybody who was inconvenienced by a server outage.

 

If you use a bookmark to access NZGDB, check that it points to www.nzgdb.co.nz and not the old URL, http://robertb.equote.co.nz/nzgdb.  The old URL was replaced in May.  Until the server change it still pointed to the same site as www.nzgdb.co.nz, but it now points to the old site.  If you’re not sure, look at the first (logon) page: if it has a red warning message about data being lost, then you’re looking at the old site.

Unsolicited Electronic Messages Act 2007

In the site conditions you agreed to accept this newsletter.   Let me know if you want to be taken off the list.

Automatic detection of Duplicate Records

The system now attempts to detect duplicate records, linking them with a Duplicate(score) soft link.   For example, when you open a record you may see something like this in the soft links section: -

 

The logic was designed to locate as many duplicates as economically possible without linking any non-duplicates.  The process is: -

1. Records with the same Family Name and Given Names are compared, and a “Raw Score” of up to 4 is calculated from four fields: Family Name, Given Names, Year of Birth, and Year of Death.  Each field adds 1 if it is the same in both records, subtracts 1 if it is different, and adds 0 if it is absent.

2. Raw scores are also calculated for the parents and grandparents (using zero where there is no corresponding record) and added to the raw score for the starting record.  This yields a combined score with a maximum of 28 (7 records × 4 fields).

3. If the combined score is greater than 10 (of 28), a duplicate link is created.  The link’s score is reported in deciles, i.e., like a percentage but on a scale of 0-10 rather than 0-100.
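
For readers who like to see the logic spelled out, the three steps above can be sketched roughly as follows. This is only an illustration, not the actual NZGDB code: the Person class, the function names, the exact-match name comparison, treating a field as absent when either record lacks it, and rounding to the nearest decile are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# The four fields compared for each pair of records (step 1).
FIELDS = ("family_name", "given_names", "birth_year", "death_year")
MAX_SCORE = 28   # 7 records (self, 2 parents, 4 grandparents) x 4 fields
THRESHOLD = 10   # a link is created only when the combined score exceeds this


@dataclass
class Person:  # hypothetical record layout, for illustration only
    family_name: Optional[str] = None
    given_names: Optional[str] = None
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    father: Optional["Person"] = None
    mother: Optional["Person"] = None


def raw_score(a, b):
    """+1 per matching field, -1 per differing field, 0 if a field is absent."""
    if a is None or b is None:
        return 0  # no corresponding record: contributes zero (step 2)
    score = 0
    for field in FIELDS:
        x, y = getattr(a, field), getattr(b, field)
        if x is None or y is None:
            continue  # assumption: absent in either record counts as absent
        score += 1 if x == y else -1
    return score


def lineage(p):
    """Return [self, father, mother, four grandparents], None where missing."""
    parents = [p.father, p.mother]
    grandparents = []
    for parent in parents:
        grandparents.append(parent.father if parent else None)
        grandparents.append(parent.mother if parent else None)
    return [p] + parents + grandparents


def combined_score(a, b):
    """Sum the raw scores of the two people and their matching ancestors."""
    return sum(raw_score(x, y) for x, y in zip(lineage(a), lineage(b)))


def duplicate_link(a, b):
    """Return the decile score (0-10) if a link would be created, else None."""
    if (a.family_name, a.given_names) != (b.family_name, b.given_names):
        return None  # only records with the same names are compared (step 1)
    total = combined_score(a, b)
    if total <= THRESHOLD:
        return None
    return round(10 * total / MAX_SCORE)  # assumption: round to nearest decile
```

Under these assumptions, two records that agree on all four fields for the person and for both parents, with no grandparents recorded, score 12 of 28 and would be linked with a reported score of 4. A record with no parents at all can score at most 4, which is why duplicates at the extremities of trees are missed.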

 

This logic will miss duplicates at the extremities of trees, where there are insufficient parents and grandparents to raise the score above 10/28, and it misses duplicates where there are minor spelling differences (for example, Hannah Francis BARNES/Hannah Frances BARNES).  However, every reported duplicate that I have checked has been genuine, which is more important than the fact that we’ve missed some.   The missing duplicates can easily be linked through [Synchronise]: -

Synchronising, linking, and merging trees

This feature has been available to testers for about a month, and now that many duplicates are located automatically it has been generally released. I’m still struggling to make this easy while preserving essential principles such as record ownership, so please read the Help first (http://www.nzgdb.co.nz/help/gdb5_help.htm) before using this facility.  Any feedback would be very welcome.

 

To use this facility, click the button [Synchronise, Link, and Merge Trees].  This button is on the Compare page, which you reach from a Duplicate soft link (see above). 

 

When you synchronise records the system will automatically create duplicate links for corresponding ancestors, handling the “extremity issue” noted above.  Ancestors are linked even if they have different names, allowing situations such as John OLD/John OWLD to be recognised as duplicates.  It also provides tools to allow you to record duplicates when names are different.

 

Synchronise also makes it very easy to import facts from the duplicate record into your own record.  The system automatically creates a source record noting the owner and GED that the fact was copied from, and also copies any source records that were in the source GED.

Re-submitting GEDs – use the same name, and “Update”.

Many GEDs have been submitted with names like “Cairns Doole 2007.GED”, implying that each is a snapshot.  Perhaps you intend to submit updated records later as “Cairns Doole 2008.GED”.   But this will have some unintended consequences.

 

The system will keep the two submissions quite separate.  Thus if “Cairns Doole 2008.ged” contains records that are also present in “Cairns Doole 2007.ged”, then both records will be stored in NZGDB.   Many of these duplicates will be detected and linked, but not all.  At the least this will increase confusion as the search returns more records and users have to search through each of these trying to work out which record to believe.

 

You could ask us to remove the earlier GED, but this will lose any links that have been established to/from it.   If others have identified duplication and synchronised their records, or linked their tree to yours, they will have to do this all over again.

 

What you SHOULD do is to keep using the same name unless you actually want both GEDs to be stored.   Also, when uploading the GED you should use the “Update” option, not “Replace”.  “Update” will preserve the original record keys, retaining soft links and unchanged records.   Use “Replace” only for very exceptional situations such as when you completely rebuild your PC database, for example converting it from Legacy to Family Tree Maker. 
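
To picture why this matters, here is a toy model of the difference between the two options. It is purely illustrative and is not how NZGDB is actually implemented: the Store class, its fields, the integer keys, and the use of GEDCOM ids such as @I1@ to match records between uploads are all assumptions made for the example.

```python
from itertools import count


class Store:  # hypothetical in-memory stand-in for the database
    def __init__(self):
        self._next_key = count(1)
        self.records = {}        # record key -> record data
        self.key_of = {}         # (data source name, GEDCOM id) -> record key
        self.soft_links = set()  # pairs of record keys

    def upload(self, source, ged, mode="update"):
        """Load ged (a dict of GEDCOM id -> data) under the given source name."""
        if mode == "replace":
            # Replace: discard this source's records, and with them every
            # soft link that pointed to or from one of those records.
            old = {k for (s, _), k in self.key_of.items() if s == source}
            self.records = {k: v for k, v in self.records.items() if k not in old}
            self.key_of = {sk: k for sk, k in self.key_of.items() if k not in old}
            self.soft_links = {l for l in self.soft_links if not (set(l) & old)}
        for ged_id, data in ged.items():
            # Update: a record already known under this source keeps its key,
            # so links to it survive; only genuinely new records get new keys.
            key = self.key_of.get((source, ged_id))
            if key is None:
                key = next(self._next_key)
                self.key_of[(source, ged_id)] = key
            self.records[key] = data
```

In this model, uploading the same person again under “Cairns Doole.GED” with “update” changes the data but keeps the key, so any soft links others have made remain valid. Uploading with “replace”, or under a different name such as “Cairns Doole 2008.GED”, gives the person a new key, and existing links no longer refer to the new record.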

 

It is better to use a neutral name, like “Cairns Doole.GED”, than a name that implies a particular date.  Contact me or Tony if you’d like your data source renamed.

 

Also, as discussed in Newsletter #3, updating can be very efficient, as you only need to upload the changes.

Tree Views

Another new feature is the Tree View.  Click the [Tree View] button on the individual page, or check “Open in Tree View” on the search page, and you’ll see a display like this: -

 

Click any arrow and the tree view is shifted to the selected person.  Click any link and the relevant individual page is opened.  Colour coding is used to indicate “foreign” records (from another GED) that have been linked into your tree.

 

 

Regards,

Robert Barnes,

NZGDB Developer