Creating a Database Example - Completer Table

Important Concept

There are more completers than individuals.

Dealing with the unit of measurement

Although we tend to think of rolling up data by person, our data is a bit more complicated. 

For instance, the Completer Table is counting the number of completers, which is equal to or more than the number of individuals. Below is a table of 4 completers comprised of three individuals.

Completers
Person  Year  Type 
Person 1     2009-2010  Teacher
Person 2  2009-2010  Teacher 
Person 3  2009-2010  Teacher 
Person 1  2010-2011  Administrator


There are records on the completer table that are unique to the individual, such as Cert Number and Gender.  In a relational database, this "By Person" information is saved on a separate table (often all "By Person" information would be managed on one table)

Individual Characteristics
Person  Gender WACertNumber
Person 1     123456A
Person 2  123456B
Person 3  123456C


The completer table deals with the issue of "more completers than individuals" by duplicating the "By Person" information where appropriate. Depending on your background, this can be thought of as a left-join or merging new variables using the individual information as the keyed table.  

Joined Table for Reporting/Analyzing
Person  Year  Type  Gender  WACertNumber
Person 1     2009-2010  Teacher 123456A 
Person 2  2009-2010  Teacher  123456B 
Person 3  2009-2010  Teacher  123456C 
Person 1  2010-2011  Administrator 123456A 


Joining Data

It is usually best practice to maintain a linking element in each table so that it is possible to make joins and create reports. In the case outlined above, it probably makes sense to join using your institution's Student ID. 

Student ID is a good choice because this number always stays the same, even if other information about the person changes (change of name, address, phone number, etc).  

Another good choice would be the Social Security Number. However, SSN has quite a few restrictions, so most programs should probably elect to use the Student ID. 

Notice also that the WA Cert Number is carried as an individual characteristic. This is a good linking number for State-Level data, but because it is usually given after completion it is not a good linking field for program level data.  

Below is an example of the completer information linking with By Person information using the University ID.

Completer
UniversityID Year  Type 
U1234     2009-2010  Teacher
U1235  2009-2010  Teacher 
U1236  2009-2010  Teacher 
U1234  2010-2011  Administrator

By Person
UniversityID Gender WACertNumber
U1234     123456A
U1235  123456B
U1236 123456C


Joined Completer and By Person
UniversityID Year  Type  Gender  WACertNumber
U1234     2009-2010  Teacher 123456A 
U1235  2009-2010  Teacher  123456B 
U1236  2009-2010  Teacher  123456C 
U1234  2010-2011  Administrator 123456A 


Dealing with multiple characteristics for one element

Similar to the completer issue, we deal with multiple items in the Race and Endorsement elements. However, we handle each one differently.

Race

The federal guidelines now make Ethnicity and Race a two part question. One is binary (Ethnicity Hispanic = Yes, No) and the other can have multiple selections (Race = Asian and Native American). For the report multiple entries in the Race element need to be deliminated by a comma (no spaces). Having this rule means that the entity receiving the data knows to look for more than one element, which will be separated by a comma (only a comma; no space).  

Person
UniversityID Gender WACertNumber  Ethnicity Race 
U1234     123456A
U1235  123456B A,I 
U1236 123456C


Endorsement 

We could handle endorsements the same as Race (put all of them in a cell, delimited by commas but no spaces), but this strategy has a couple of drawbacks:
  • It is different than the strategy employed by OSPI (joining to their system for batch uploading might be more challenging).
  • It can be more difficult to create visualizations (you lose the ability to pivot).
  • It can be more difficult to aggregate by endorsements (instead of simple GROUP BY, you need to collect multiple queries WHERE the element Endorsements CONTAINS something).
  • It can be more difficult to join additional information that matters at the endorsement level (hard to join data on endorsement code when there are multiple codes per cell).
But there are also positives:
  • People with less experience tend to carry their data by person, so they are more likely to understand this format, which can mean fewer errors and less time.

Here is how joining endorsements to people might work:

By Person Information
UniversityIDGenderWACertNumber EthnicityRace 
U1234    123456A
U1235 123456BA,I 
U1236123456C
 
 Endorsement Information
UniversityID
Endorsement
U1234 Mathematics
U1235 Biology 
U1235Chemistry 
U1236Special Education

Joined By Person and Endorsement Information
UniversityIDGenderWACertNumber EthnicityRace Endorsement
U1234    123456AMathmatics 
U1235 123456BA,I Biology 
U1235 F123456B A,I Chemistry 
U1236123456CSpecial Education 



Remember - Important Concept

There are more completers and endorsements by those completers than individuals.

Oh no, there are more completers than individuals! This makes life a bit more complicated.

What you need is a table of completers (not individuals) joined to endorsements. OSPI thinks of its world as certificate type, then joins individual information and endorsement information to either side of those certificate types. 

This concept requires an additional unique field. Social security number, cert number, and university ID are all unique fields to identify a person, now you need a unique filed that identifies a "completer". The heart of relational database software is its ability to automatically generate those unique ID numbers when needed. But the completer table is still a relatively small task, and the completer status could be done by hand if needed.  Here are a couple of possibilities:

Use the row number on the Completers table to join to the Endorsement Information table

Completers
RowUniversityIDYear Type 
U1234    2009-2010 Teacher
U1235 2009-2010 Teacher 
U1236 2009-2010 Teacher 
U1234 2010-2011 Administrator
 
Endorsement Information
Row 
Endorsement
1Mathematics
2Biology 
2Chemistry 
3Special Education
Principal 

After joining Endorsement to Completers, a join of the By Person Information (using University ID) must be done:

Joined Completers and Endorsements
RowUniversityIDYear Type Endorsement
U1234    2009-2010 TeacherMathematics 
U1235 2009-2010 Teacher Biology 
U1235 2009-2010 Teacher Chemistry
U1236 2009-2010 Teacher Special Education 
U1234 2010-2011 AdministratorPrincipal 
 
 
By Person Information
UniversityID EthnicityRace Gender WaCertNumber
U1234    123456A 
U1235 A,I 123456B
U1236123456C 

 

The final table joining the Completer, their Endorsements, and By Person information looks like the following:

Joined Endorsement, Completer, and Person Information
RowUniversityIDYear Type EndorsementEthnicity Race Gender WACertNumber
U1234    2009-2010 TeacherMathematics 123456A 
U1235 2009-2010 Teacher Biology A,I 123456B 
U1235 2009-2010 Teacher ChemistryA,I 123456B 
U1236 2009-2010 Teacher Special Education 123456C 
U1234 2010-2011 AdministratorPrincipal 123456A 
 

Another Solution - Concatenate Fields if Small Data Sets

Another way to deal with the linking issue by hand is to concatenate several fields. This is a reasonable solution for programs that are dealing with smaller amounts of data, but it can be pretty error-prone when you begin look at larger sets of data. 

For instance, this might create unique IDs combining UniversityId.Year.Type 
LinkingFieldUniversityID
U1234.2009-2010.Teacher U1234 
U1235.2009-2010.Teacher U1235 
U1236.2009-2010.Teacher U1236 
U1234.2010-2011.Administrator U1234 

Again, this is a reasonable solution for small systems, but it can quickly get unwieldy, especially if you have multiple people working with the data.