Although we tend to think of rolling up data by person, our data is a bit more complicated.
For instance, the Completer Table is counting the number of completers, which is equal to or more than the number of individuals. Below is a table of 4 completers comprised of three individuals.
There are records on the completer table that are unique to the individual, such as Cert Number and Gender. In a relational database, this "By Person" information is saved on a separate table (often all "By Person" information would be managed on one table)
The completer table deals with the issue of "more completers than individuals" by duplicating the "By Person" information where appropriate. Depending on your background, this can be thought of as a left-join or merging new variables using the individual information as the keyed table.
Joined Table for Reporting/Analyzing
It is usually best practice to maintain a linking element in each table so that it is possible to make joins and create reports. In the case outlined above, it probably makes sense to join using your institution's Student ID.
Student ID is a good choice because this number always stays the same, even if other information about the person changes (change of name, address, phone number, etc).
Another good choice would be the Social Security Number. However, SSN has quite a few restrictions, so most programs should probably elect to use the Student ID.
Notice also that the WA Cert Number is carried as an individual characteristic. This is a good linking number for State-Level data, but because it is usually given after completion it is not a good linking field for program level data.
Below is an example of the completer information linking with By Person information using the University ID.
Similar to the completer issue, we deal with multiple items in the Race and Endorsement elements. However, we handle each one differently.
The federal guidelines now make Ethnicity and Race a two part question. One is binary (Ethnicity Hispanic = Yes, No) and the other can have multiple selections (Race = Asian and Native American). For the report multiple entries in the Race element need to be deliminated by a comma (no spaces). Having this rule means that the entity receiving the data knows to look for more than one element, which will be separated by a comma (only a comma; no space).
We could handle endorsements the same as Race (put all of them in a cell, delimited by commas but no spaces), but this strategy has a couple of drawbacks:
- It is different than the strategy employed by OSPI (joining to their system for batch uploading might be more challenging).
- It can be more difficult to create visualizations (you lose the ability to pivot).
- It can be more difficult to aggregate by endorsements (instead of simple GROUP BY, you need to collect multiple queries WHERE the element Endorsements CONTAINS something).
- It can be more difficult to join additional information that matters at the endorsement level (hard to join data on endorsement code when there are multiple codes per cell).
But there are also positives:
- People with less experience tend to carry their data by person, so they are more likely to understand this format, which can mean fewer errors and less time.
Here is how joining endorsements to people might work:
|By Person Information|
| Endorsement Information|
Joined By Person and Endorsement Information
Oh no, there are more completers than individuals! This makes life a bit more complicated.
What you need is a table of completers (not individuals) joined to endorsements. OSPI thinks of its world as certificate type, then joins individual information and endorsement information to either side of those certificate types.
This concept requires an additional unique field. Social security number, cert number, and university ID are all unique fields to identify a person, now you need a unique filed that identifies a "completer". The heart of relational database software is its ability to automatically generate those unique ID numbers when needed. But the completer table is still a relatively small task, and the completer status could be done by hand if needed. Here are a couple of possibilities:
After joining Endorsement to Completers, a join of the By Person Information (using University ID) must be done:
Joined Completers and Endorsements
By Person Information
The final table joining the Completer, their Endorsements, and By Person information looks like the following:
Joined Endorsement, Completer, and Person Information
Another way to deal with the linking issue by hand is to concatenate several fields. This is a reasonable solution for programs that are dealing with smaller amounts of data, but it can be pretty error-prone when you begin look at larger sets of data.
For instance, this might create unique IDs combining UniversityId.Year.Type
Again, this is a reasonable solution for small systems, but it can quickly get unwieldy, especially if you have multiple people working with the data.