Tribute scrapes

Last modified on 5/16/2016 3:53 PM by User.

A tribute scrape is a single zip file containing information for all tributes on a specific site. A scrape contains the name of the deceased, their obituary, photos, condolences left by loved ones, and several other items covered in detail below.

All text files must be encoded in UTF-8, with no BOM. In JSON strings, please use numeric escapes (\uXXXX) for any character with a Unicode code point below 0x20 or above 0x7F (i.e. control characters and anything outside the ‘Basic Latin’ block).

In strings, all code points outside the Basic Multilingual Plane (that is, all characters that take four bytes to represent in UTF-8) must be stripped. This includes most emoji, as well as characters in CJK Unified Ideographs Extension B and later.
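Here is a minimal Python sketch of the two string rules above. It assumes the scraper holds each tribute as a plain dict before serializing it. The helper names strip_non_bmp, clean_strings and encode_tribute are hypothetical.

Python's json.dumps with ensure_ascii=True writes every non-ASCII character as a \uXXXX escape. For the control characters \n, \t, \r, \b and \f it uses JSON's short escapes instead. The example tribute below also uses \n in its obituary, so those escapes are assumed to be acceptable.

```python
import json


def strip_non_bmp(text: str) -> str:
    """Drop every code point above U+FFFF (outside the Basic Multilingual Plane)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def clean_strings(value):
    """Recursively apply strip_non_bmp to every string in a JSON-compatible value."""
    if isinstance(value, str):
        return strip_non_bmp(value)
    if isinstance(value, list):
        return [clean_strings(v) for v in value]
    if isinstance(value, dict):
        return {k: clean_strings(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


def encode_tribute(tribute: dict) -> bytes:
    """Serialize one tribute to UTF-8 bytes with no BOM and only ASCII characters."""
    # ensure_ascii=True escapes every character above 0x7F as \uXXXX and
    # escapes control characters below 0x20 as well.
    text = json.dumps(clean_strings(tribute), ensure_ascii=True, indent=4)
    return text.encode("utf-8")  # the plain "utf-8" codec never writes a BOM
```

Stripping happens before serialization. Otherwise ensure_ascii would emit a non-BMP character as a surrogate-pair escape rather than removing it.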

Do not apply any time-zone adjustments to dates in tributes. All dates and times must use the format shown in this example: 'Jul 5, 1930 12:00:00 AM'.
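The date format can be produced without relying on locale-dependent strftime codes. This sketch assumes the scraper has already parsed each date into a naive datetime holding the values shown on the site. The function name format_tribute_date is hypothetical. The function reads the datetime fields directly and performs no time-zone conversion.

```python
from datetime import datetime

# Fixed English month abbreviations, so the output does not depend on the
# system locale (strftime's %b would).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_tribute_date(dt: datetime) -> str:
    """Format dt like 'Jul 5, 1930 12:00:00 AM', using its fields as-is."""
    # No astimezone() or other conversion: the wall-clock values are kept.
    hour12 = dt.hour % 12 or 12  # hour 0 -> 12 AM, hour 13 -> 1 PM
    meridiem = "AM" if dt.hour < 12 else "PM"
    return (f"{MONTHS[dt.month - 1]} {dt.day}, {dt.year} "
            f"{hour12}:{dt.minute:02d}:{dt.second:02d} {meridiem}")
```

A date with no time of day comes out as 12:00:00 AM. This matches the dateOfBirth and dateOfDeath values in the example tribute below.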

File structure of tribute zips

At the root level of a tribute scrape zip, there is a list of JSON files with numeric filenames. Each JSON file corresponds to a single scraped tribute. The number used in the filename should be equal to the id property inside the JSON file, covered further below.

Optionally, each tribute JSON file may have an accompanying directory with the same name as the file, minus the ‘.json’ extension. This directory contains any images scraped for that tribute.

Example file structure

  • /
    • fileid.txt
    • 5451.json
    • 5451/
      • deceased-01.jpg
      • deceased-02.jpg
    • 5200.json
    • 5201.json
    • 5201/
      • deceased-01.jpg
      • deceased-02.jpg
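The layout above can be produced with Python's standard zipfile module. In this sketch, build_scrape_zip and its input shape are hypothetical. The input is a list of (tribute dict, {filename: bytes}) pairs. The sketch leaves out fileid.txt. The entry names are what matter: each tribute becomes <id>.json at the root of the zip, and its images go under <id>/.

```python
import io
import json
import zipfile


def build_scrape_zip(tributes) -> bytes:
    """Return the bytes of a zip laid out as '<id>.json' plus '<id>/<image>'.

    `tributes` is a list of (tribute, images) pairs: `tribute` is a dict in the
    tribute JSON format, and `images` maps a bare filename such as
    'deceased-01.jpg' to that file's bytes. String values are assumed to have
    been cleaned of non-BMP characters already.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for tribute, images in tributes:
            tribute_id = int(tribute["id"])  # the filename number must equal the id property
            zf.writestr(f"{tribute_id}.json",
                        json.dumps(tribute, ensure_ascii=True, indent=4))
            # The image directory has the JSON file's name minus '.json'.
            for filename, data in images.items():
                zf.writestr(f"{tribute_id}/{filename}", data)
    return buf.getvalue()
```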

fileid.txt

This must contain a single random, unique UUID (e.g. ‘33ac92b7-0b52-4eed-a852-8257b6a03247’). Do not include a line separator.
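Python's uuid.uuid4() generates a random (version 4) UUID, which fits this requirement. Here is a minimal sketch; make_fileid and add_fileid are hypothetical helper names. zipfile's ZipFile.writestr stores the string exactly as given, so no line separator is appended.

```python
import uuid
import zipfile


def make_fileid() -> str:
    """Return a fresh random (version 4) UUID in its canonical 36-character form."""
    return str(uuid.uuid4())


def add_fileid(zf: zipfile.ZipFile) -> str:
    """Write fileid.txt into an open scrape zip and return the UUID it contains."""
    file_id = make_fileid()
    zf.writestr("fileid.txt", file_id)  # exact string, no trailing newline
    return file_id
```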

Tribute JSON files

Each file consists of a single JSON object, fully describing the contents of a tribute.

All dates in this file must use the same format as this example: 'Jul 5, 1930 12:00:00 AM'.

The properties of this object are as follows:

  • url - Mandatory - This is the URL from which the tribute was scraped.
  • id - Mandatory - This is a numeric ID that must uniquely identify this record in the scrape. The name of the JSON file should match it (e.g. 5451.json). If the site being scraped uses numeric IDs for records, please use that ID here; otherwise you may use an arbitrary unique integer.
  • firstName - Mandatory - This is the first name of the deceased.
  • middleName - Optional - This is the middle name of the deceased.
  • lastName - Mandatory - The last name of the deceased.
  • nickName - Optional - The nickname of the deceased.
  • dateOfBirth - Optional - The date of birth of the deceased.
  • dateOfDeath - Optional - The date of death of the deceased. While this is technically optional, please make every possible effort to populate this value.
  • obituary - Mandatory - The obituary of the deceased, or an empty string if there is no obituary.
  • gender - Optional - The sex of the deceased: either ‘male’ or ‘female’. Omit this property if the sex is not known.
  • unpublished - Mandatory - Always false.
  • photoFiles - Optional - A list of strings, each string holding a filename of a photo of the deceased. The corresponding files should be placed in ‘/<id>/<filename>’, in the scrape zip. For photos uploaded by visitors to the tribute, instead use photos (documented below).
  • candles - Optional - A list of candles posted by visitors to the tribute.
    • candles[n].from - Mandatory - The name of the person who posted the candle.
    • candles[n].email - Optional - The email address of the person who posted the candle.
    • candles[n].message - Mandatory - The message left by the poster.
    • candles[n].date - Mandatory - The date the candle was posted.
    • candles[n].privateCandle - Mandatory - Always false.
  • photos - Optional - A list of photos uploaded by visitors to the tribute.
    • photos[n].imagePath - Mandatory - The filename of the photo file in the tribute zip. The corresponding file should be placed in ‘/<id>/<filename>’ in the zip.
    • photos[n].created - Mandatory - The date the photo was uploaded.
    • photos[n].name - Mandatory - The name of the person who uploaded the photo.
    • photos[n].description - Optional - Any text accompanying the photo on the tribute.
    • photos[n].email - Optional - The email address of the visitor who uploaded the photo.
    • photos[n].privatePhoto - Mandatory - Always false.
  • condolences - Optional - A list of condolences posted by visitors to the tribute.
    • condolences[n].from - Mandatory - The name of the visitor who posted the condolence.
    • condolences[n].message - Mandatory - The message posted by the visitor.
    • condolences[n].date - Mandatory - The date the message was posted.
    • condolences[n].privateCondolence - Mandatory - Always false.
  • additionalServiceInfo - Optional - A list of services for the deceased.
    • additionalServiceInfo[n].title - Mandatory - The type of service (e.g. “Visitation”, “Graveside Service”).
    • additionalServiceInfo[n].description - Mandatory - The text describing time & date of the service, along with any other instructions.
  • tributeLinks - Optional - A list of external links to videos or other content for the deceased. Do not include links to content hosted on the site being scraped.
    • tributeLinks[n].title - Optional - A title describing the content.
    • tributeLinks[n].url - Mandatory - The URL of the content.
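Before zipping, it can help to check each tribute against the property list above. The sketch below is hypothetical: validate_tribute and its messages are not part of any existing tool. It checks four things:

- mandatory keys are present;
- id is an integer;
- gender is ‘male’ or ‘female’ when present;
- the flags documented as always false are false.

It does not check date formats or URLs.

```python
# Mandatory properties, transcribed from the property list above.
MANDATORY_TOP = ["url", "id", "firstName", "lastName", "obituary", "unpublished"]
MANDATORY_ITEMS = {
    "candles": ["from", "message", "date", "privateCandle"],
    "photos": ["imagePath", "created", "name", "privatePhoto"],
    "condolences": ["from", "message", "date", "privateCondolence"],
    "additionalServiceInfo": ["title", "description"],
    "tributeLinks": ["url"],
}
# Per-item flags documented as "Always false".
ALWAYS_FALSE = {
    "candles": "privateCandle",
    "photos": "privatePhoto",
    "condolences": "privateCondolence",
}


def validate_tribute(tribute: dict) -> list:
    """Return human-readable problems found in one tribute (empty list if none)."""
    problems = [f"missing {key}" for key in MANDATORY_TOP if key not in tribute]
    tribute_id = tribute.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if "id" in tribute and (isinstance(tribute_id, bool) or not isinstance(tribute_id, int)):
        problems.append("id must be an integer")
    if "unpublished" in tribute and tribute["unpublished"] is not False:
        problems.append("unpublished must be false")
    # gender is optional, but must be 'male' or 'female' when present.
    if "gender" in tribute and tribute["gender"] not in ("male", "female"):
        problems.append("gender must be 'male' or 'female'")
    for list_key, required in MANDATORY_ITEMS.items():
        for i, item in enumerate(tribute.get(list_key, [])):
            if not isinstance(item, dict):
                problems.append(f"{list_key}[{i}] must be an object")
                continue
            problems += [f"missing {list_key}[{i}].{k}" for k in required if k not in item]
            flag = ALWAYS_FALSE.get(list_key)
            if flag and flag in item and item[flag] is not False:
                problems.append(f"{list_key}[{i}].{flag} must be false")
    return problems
```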

Example tribute JSON file

{
    "url" : "http://www.example.com/record/5451/",
    "id" : 5451,
    "firstName" : "John",
    "middleName" : "Q",
    "lastName" : "Smith",
    "nickName" : "Johnny",
    "dateOfBirth" : "Jul 5, 1930 12:00:00 AM",
    "dateOfDeath" : "Nov 19, 2015 12:00:00 AM",
    "obituary" : "Mr. Smith died on November 19th, 2015.\n\nHe was a good guy.",
    "gender": "male",
    "unpublished" : false,
    
    "photoFiles" : [
        "deceased-01.jpg"
    ],
    
    "candles": [
        {
            "from": "Jane P Smith",
            "email": "janeqsmith@example.net",
            "message": "Lighting a candle for you on our anniversary",
            "date": "Nov 19, 2015 12:00:00 AM",
            "privateCandle": false
        }
    ],
    
    "photos": [
        {
            "imagePath": "deceased-02.jpg",
            "created": "Nov 19, 2015 12:00:00 AM",
            "name": "Jane P Smith",
            "approved": true,
            "description": "John at our wedding",
            "email": "janeqsmith@example.net",
            "privatePhoto": false
        }
    ],
    
    "condolences": [
        {
            "from" : "Jane P. Smith",
            "message" : "John was a great husband.",
            "date" : "Feb 19, 2015 6:58:52 AM",
            "privateCondolence" : false
        },
        {
            "from" : "Jacky P. Smith",
            "message" : "John was a great dad.",
            "date" : "Nov 11, 2014 4:52:39 PM",
            "privateCondolence" : false
        }
    ],
    
    "additionalServiceInfo" : [
        {
            "title" : "Visitation",
            "description" : "The visitation will be held on July 22nd. All family members are welcome."
        },
        {
            "title" : "Service Info",
            "description" : "A service will be held on July 23rd."
        }, 
        {
            "title" : "Interment",
            "description" : "Graveside services will be held, TUESDAY, 12:30 PM at EG Cemetery."
        }
    ],
    
    "tributeLinks": [
        {
            "title": "A video of John on his birthday",
            "url": "http://www.example.org/a-video/"
        }
    ]
}

Example scrape

example-scrape.zip is an example scrape file, containing three tributes, two of which have photos.

Other content

For content unlikely to remain accessible after the user switches providers (e.g. streaming videos hosted on the site being scraped), please download the content and include it in a zip file separate from the main tribute zip, along with a text file indicating which record each file corresponds to.
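This page does not specify the format of the text file that maps downloaded files to records. The sketch below therefore assumes one: an index.txt with one tab-separated line per file, giving the filename and then the record id. The function name, the index file name and the input shape are all hypothetical. The sketch also assumes the caller supplies filenames that are unique within the zip.

```python
import io
import zipfile


def build_other_content_zip(files) -> bytes:
    """Bundle downloaded media into its own zip, plus a plain-text index.

    `files` is a list of (filename, record_id, data) tuples. The index
    (hypothetical format) is index.txt with one '<filename><TAB><record id>'
    line per file.
    """
    buf = io.BytesIO()
    index_lines = []
    # Default compression is ZIP_STORED, which suits already-compressed video.
    with zipfile.ZipFile(buf, "w") as zf:
        for filename, record_id, data in files:
            zf.writestr(filename, data)  # stored unchanged
            index_lines.append(f"{filename}\t{record_id}")
        zf.writestr("index.txt", "\n".join(index_lines) + "\n")
    return buf.getvalue()
```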

HTML backups

For every scraped HTML page, please create a backup file containing the source of the page. Place these backups in a zip file separate from the main tribute scrape zip. If scrape discrepancies are noticed after the scraped site is no longer available, these backups will give us access to the original content.
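Here is a minimal sketch of the backup zip, assuming the raw bytes of each page have already been fetched. Storing the bytes unchanged, rather than decoded text, preserves the page's original encoding. The naming scheme is a hypothetical choice not specified on this page. Each entry is named after the percent-encoded page URL, so urllib.parse.unquote can recover the URL from the file name without a separate index.

```python
import io
import zipfile
from urllib.parse import quote


def backup_name(url: str) -> str:
    """Hypothetical naming scheme: the fully percent-encoded URL plus '.html'."""
    # safe="" also encodes '/' and ':', giving a flat file name.
    return quote(url, safe="") + ".html"


def build_html_backup_zip(pages) -> bytes:
    """`pages` is an iterable of (url, raw_bytes) pairs, one per scraped HTML page."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for url, raw in pages:
            # Store the bytes exactly as fetched so the original encoding survives.
            zf.writestr(backup_name(url), raw)
    return buf.getvalue()
```

Very long URLs can produce names that exceed file-system name limits when the zip is extracted. A hashed name plus a separate index file would avoid that, at the cost of an extra lookup.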