Data Analyses Update 7-1-12

Thus far we've explored two data sets. The first was a set of 4,040 OpenMRS registration messages that were *not* maternal child health cases; they came from OpenMRS systems that are currently operational. Others (Liz?) might be able to say more about the specifics of this data. The second data set is the "Ubudehe" database; the subset we analyzed contained 2,206,795 records. Again, others may have more specifics about this data.

I've attached an Excel file, Rwanda_Field_Metrics_Combined.xls, with multiple worksheets characterizing the data. I'd be happy to walk you through that file if it would be helpful.

Based on my review, the two datasets appear to have eight fields in common (see the "Field mapping" worksheet in the attached Excel file):

  1. family_name
  2. given_name
  3. gender
  4. birthdate
  5. Umudugudu (Village?)
  6. Cell
  7. Sector
  8. District

Fields 5-8 above are likely correlated, so we probably can't treat them as completely independent fields. In addition to the above fields, four IDs are under consideration for use in the client registry:

  1. NID - Rwanda national ID
  2. Mutuelle_ID - Rwanda insurance ID
  3. Rama_ID - (??) I don't know what this is.
  4. InternalOpenMRS_ID - generated by the local OpenMRS instance
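
On the earlier point that fields 5-8 are likely correlated: one rough way to check is to measure how often a lower-level location field pins down the one above it. The Python sketch below is only an illustration. The `determination_rate` helper, the one-dictionary-per-record layout, the assumed nesting order (Umudugudu, Cell, Sector, District), and the sample rows are all my assumptions, not something taken from either dataset.

```python
from collections import Counter, defaultdict


def determination_rate(records, child, parent):
    """Fraction of records whose `parent` value is the most common one
    seen for their `child` value. 1.0 means `child` fully determines
    `parent` in this data. Returns None if no record has both values.
    """
    by_child = defaultdict(Counter)
    for rec in records:
        c, p = rec.get(child), rec.get(parent)
        if c and p:  # skip records missing either value
            by_child[c][p] += 1
    total = sum(sum(counts.values()) for counts in by_child.values())
    if total == 0:
        return None
    agree = sum(counts.most_common(1)[0][1] for counts in by_child.values())
    return agree / total


# Made-up rows for illustration only; not from OpenMRS or Ubudehe.
sample_rows = [
    {"Umudugudu": "V1", "Cell": "C1", "Sector": "S1", "District": "D1"},
    {"Umudugudu": "V1", "Cell": "C1", "Sector": "S1", "District": "D1"},
    {"Umudugudu": "V2", "Cell": "C1", "Sector": "S1", "District": "D1"},
    {"Umudugudu": "V3", "Cell": "C2", "Sector": "S2", "District": "D1"},
    {"Umudugudu": "V3", "Cell": "C3", "Sector": "S2", "District": "D1"},
]

# Assumed nesting order of the location fields (lowest level first).
for child, parent in [("Umudugudu", "Cell"), ("Cell", "Sector"), ("Sector", "District")]:
    print(child, "->", parent, determination_rate(sample_rows, child, parent))
```

On the made-up rows this prints 0.8 for Umudugudu -> Cell and 1.0 for the other two pairs. A rate near 1.0 on the real data would suggest the higher-level field adds little independent evidence for matching, although a rate can also be high simply because one value dominates the field.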

I'm not certain about the availability/completeness of the above identifiers (we're analyzing the NID field in the Ubudehe dataset), so I don't know which IDs will meaningfully contribute to matching.
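
A simple first check on availability is the share of records in which each identifier is actually filled in. The sketch below assumes each record is a dictionary keyed by the ID names listed above, which may not be how the columns are named in the real data. The `completeness` helper and the sample rows are hypothetical.

```python
def completeness(records, fields):
    """Return {field: fraction of records with a non-blank value}."""
    total = len(records)
    result = {}
    for field in fields:
        filled = 0
        for rec in records:
            value = rec.get(field)
            # Treat missing keys, None, and whitespace-only strings as blank.
            if value is not None and str(value).strip():
                filled += 1
        result[field] = filled / total if total else 0.0
    return result


# ID names as listed in this update; real column names may differ.
ID_FIELDS = ["NID", "Mutuelle_ID", "Rama_ID", "InternalOpenMRS_ID"]

# Made-up rows for illustration only; not real identifiers.
sample_ids = [
    {"NID": "A1", "Mutuelle_ID": "", "InternalOpenMRS_ID": "100"},
    {"NID": None, "Mutuelle_ID": "M-7", "InternalOpenMRS_ID": "101"},
    {"NID": "  ", "Mutuelle_ID": "M-8", "InternalOpenMRS_ID": "102"},
    {"NID": "A4", "InternalOpenMRS_ID": "103"},
]

for field, share in completeness(sample_ids, ID_FIELDS).items():
    print(f"{field}: {share:.0%}")
```

On the made-up rows this reports NID and Mutuelle_ID at 50%, Rama_ID at 0%, and InternalOpenMRS_ID at 100%. Completeness alone doesn't say whether an ID is usable for matching; uniqueness and format validity would be natural follow-up checks.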