ehrQL beyond EHR data
ehrQL is OpenSAFELY’s query language, originally designed for querying data from electronic health records.
This blog post describes how we’ve adapted ehrQL to support querying education data from the TED dataset. We have used ehrQL, with these adaptations, to replicate some of the research (using dummy data) in the TIDE paper “Can we estimate teacher impact from school assessment data?”
The TED dataset
The TED dataset contains data about students and their assessment results, called attainments.
Data from schools is provided to us in six related tables:

ehrQL is designed to make it easy to work with data about events: things like observations, diagnoses, prescriptions – and assessment results. As such, we present a simplified schema to researchers:

This is called denormalisation: the Result table now contains repeated information. In this case, we think it makes it easier for researchers to work with the data.
We make one simplification to the data model: a result only links to a teacher if the result is for a class that was taught by a single teacher. This matches how the data was used in the TIDE pilot study, but we may need to revisit this if researchers are interested in classes taught by multiple teachers.
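To make the denormalisation and the single-teacher rule concrete, here is a minimal sketch in plain Python. It is not the code in backends/ted.py: the table names (`classes`, `class_teachers`, `attainments`), the column names, and the `denormalise` helper are all hypothetical, chosen only to illustrate the idea of flattening related tables into one row per result.

```python
from collections import defaultdict

# Hypothetical normalised inputs; names and columns are illustrative only
# and do not reflect the real TED tables.
classes = {10: {"subject": "English"}, 11: {"subject": "Maths"}}
class_teachers = [(10, 100), (11, 101), (11, 102)]  # (class_id, teacher_id)
attainments = [
    {"student_id": 1, "class_id": 10, "score": 62},
    {"student_id": 1, "class_id": 11, "score": 55},
    {"student_id": 2, "class_id": 10, "score": 71},
]


def denormalise(attainments, classes, class_teachers):
    """Flatten attainments into one self-contained row per result."""
    # Collect the set of teachers for each class.
    teachers_by_class = defaultdict(set)
    for class_id, teacher_id in class_teachers:
        teachers_by_class[class_id].add(teacher_id)

    rows = []
    for attainment in attainments:
        teachers = teachers_by_class.get(attainment["class_id"], set())
        # Link a teacher only when the class has exactly one teacher.
        teacher_id = next(iter(teachers)) if len(teachers) == 1 else None
        rows.append(
            {
                "student_id": attainment["student_id"],
                # The subject is repeated on every row: this is the denormalisation.
                "subject": classes[attainment["class_id"]]["subject"],
                "score": attainment["score"],
                "teacher_id": teacher_id,
            }
        )
    return rows


results = denormalise(attainments, classes, class_teachers)
```

In this sketch the Maths class has two teachers, so its result row carries no teacher, while both English rows repeat the same subject and teacher.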
Changes to ehrQL
Thanks to ehrQL’s layered architecture (described in this blog post), the changes we’ve made to support querying the TED data with ehrQL were fairly simple and self-contained. You can see the new code on GitHub in tables/ted.py and backends/ted.py.
We’ve also made a small change to the language itself.
In the version of ehrQL for health research, the language contains lots of references to patients. For instance, here’s an ehrQL query that finds the number of asthma medications for each patient:
num_asthma_medications = medications.where(
    medications.snomed_code.is_in(asthma_codelist)
).count_for_patient()
In the version of ehrQL that we’re using to query the TED data, we’ve decided not to replace _for_patient with _for_student, and instead we’ve completely dropped _for_patient.
So the corresponding query to count the number of English assessments for each student is:
num_english_assessments = results.where(
    results.subject == "English"
).count()
A successful outcome
With the modifications we’ve made to ehrQL and with the simplified schema, we were able to reproduce the analysis from the original TIDE research paper, demonstrating that ehrQL can be adapted to both the education domain generally and the TED data specifically.