Lab Data: The Special Snowflake of Clinical Data

We briefly discussed clinical trial data in the last post and the methods used to collect, clean and analyze the data, or at least where you can go to find that information. Now we finally get to lab data. Lab data may seem straightforward: you get results from labs, you add them to the other data from the trial and you’re all set. That is not the case, however, for many trials. Lab data has some nuances to it that make it a bit of a special snowflake.

The decision of where lab data ends up as part of the clinical trial data has to do with 1) what type of lab data it is, 2) how the laboratory and testing structure is set up for the trial, and 3) what type of endpoints the lab data will be supporting.

Let’s start with types of lab data (’cause, you know, starting with 1 is usually a good idea). In my organization, lab data is divided up between what we call safety lab data and non-safety lab data. Fancy, huh? Safety lab data are the result of testing done on samples to “ensure that patients are not experiencing any untoward toxicities” (Chuang-Stein C, 1998). These tests are usually ones that you would see at a doctor’s office: liver enzymes, white blood cell counts, etc. In the clinical trials that my organization supports, this lab data is entered into the CRF by testing labs connected with each clinical site or group of clinical sites. Entering safety lab data into the CRFs is industry standard, as it keeps all the safety data available to be examined regularly to ensure the safety of the participants. The workflow for safety lab data is: a sample is collected from a participant at a visit to the clinical site, then that sample is processed and sent to a local lab for testing. Results are sent back to the site, which enters them into the corresponding CRF for that participant and visit. The safety lab data is managed by the Clinical Data Managers (CDMs) for a study, and the quality checks and processing procedures are the same as for the other data collected on the CRFs.

Non-safety lab data consists of lab data that is not generated in support of safety considerations. This spans a whole range of data, including immunogenicity data for vaccine trials and pharmacokinetics (PK) for drug trials. The tests for non-safety lab data can be performed at either local labs or central labs, but the key is that the results are not sent back to the site to be entered into the CRF. This is because there are usually no reporting requirements for non-safety lab data. (By contrast, if a participant has a low white blood cell count on a safety test, for example, the site would be required to counsel them and perhaps refer them for additional testing.) Since the non-safety lab data is not reported onto the CRF, it has to be uploaded to the data management center in some way, cleaned, quality checked and have its errors resolved before the data are merged with the other clinical data for analysis.

The distinction between CRF and non-CRF data is a big one. The CRF data is collected and managed in a Software as a Service package (in our case Medidata Rave) that allows for creation of the CRFs, data entry, data cleaning and database creation all in a single, validated and maintained system. Data that comes into a data management center outside of the electronic data capture (EDC) or other CRF system does not have this built-in functionality associated with it, nor the infrastructure around it to make data creation, cleaning and storing relatively easy. Lab data is not the only type of non-CRF data, so these issues span other areas such as questionnaires, SMS or text data, or participant diaries. Since I have absolutely no expertise in those areas, I’ll stick to the lab data. Developing the systems to import, process, store and distribute non-CRF data is a big undertaking and I will discuss some of the ways we do this in upcoming posts.
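To make the "upload, check, then merge" step for non-CRF lab data a little more concrete, here is a minimal Python sketch. It is not how any particular EDC system or data management center works; the field names (participant_id, visit, assay, result), the three checks and the function name check_and_merge are all assumptions made up for illustration. The idea is simply that each lab row is matched to a CRF visit record, and anything that cannot be matched cleanly becomes a query to resolve instead of being merged.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# All field names here are hypothetical; a real study defines its own keys.


def check_and_merge(
    crf_visits: List[Record], lab_rows: List[Record]
) -> Tuple[List[Record], List[str]]:
    """Merge lab rows onto CRF visit records; return merged rows and queries."""
    # Index CRF visit records by (participant, visit) for lookup.
    visits = {(v["participant_id"], v["visit"]): v for v in crf_visits}
    merged: List[Record] = []
    queries: List[str] = []
    seen = set()
    for row in lab_rows:
        key = (row["participant_id"], row["visit"])
        result_key = key + (row["assay"],)
        if key not in visits:
            queries.append(f"no CRF visit for {key[0]} at {key[1]}")
        elif result_key in seen:
            queries.append(f"duplicate {row['assay']} result for {key[0]} at {key[1]}")
        elif not row["result"].strip():
            queries.append(f"blank {row['assay']} result for {key[0]} at {key[1]}")
        else:
            seen.add(result_key)
            # Combine the visit fields and the lab fields into one record.
            merged.append({**visits[key], **row})
    return merged, queries


# Made-up example data: one clean row, one duplicate, one unknown participant.
crf_visits = [{"participant_id": "P001", "visit": "V2", "visit_date": "2020-01-15"}]
lab_rows = [
    {"participant_id": "P001", "visit": "V2", "assay": "ELISA", "result": "1250"},
    {"participant_id": "P001", "visit": "V2", "assay": "ELISA", "result": "1250"},
    {"participant_id": "P002", "visit": "V2", "assay": "ELISA", "result": "300"},
]
merged, queries = check_and_merge(crf_visits, lab_rows)
```

In this toy example only the first lab row is merged; the other two end up in the query list, which is roughly the kind of list that would go back to the lab for resolution.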

Lab data will be used within the context of a clinical trial, so many organizations opt to embed the lab data management within clinical data management. My organization has opted not to do this, though our processes are aligned with the clinical data management team. Part of the reason why we have split out the lab data management from the clinical data management has to do with the two other features of lab data management that determine where the lab data ends up in the overall data of a clinical trial.

The workflow of the lab samples and testing may not, on the surface, seem like it would influence what happens with the data downstream, but it can have a big impact. As I mentioned above, there are a few different set-ups for laboratory testing that I’ve seen with clinical trials, and probably endless combinations from there. One scenario is to have the samples drawn at the clinic and sent for processing to a local lab. That same local lab would then perform the safety lab tests and diagnostic testing, and store additional aliquots of each sample. Those additional aliquots would then be sent to central or specialty labs for more advanced testing (e.g. immunogenicity or PK testing). In this scenario, the diagnostic test results would be sent back to the clinic along with the safety lab results and reported on the CRF. In order to ensure quality and consistency across multiple labs, a selection of samples could be sent to a central lab to verify diagnostic status.

In another scenario, the samples are collected at the clinic, then sent to a local lab for processing and safety lab testing, and then aliquots are sent to a central repository. The aliquots of the samples would then be sent out to central or specialty labs for immunogenicity, PK or other specialized testing. Additionally, diagnostic testing can be done by central labs as opposed to local labs.

So what are the implications of these different workflows? In the first workflow, all the safety results and diagnostic results would be reported on the CRF. Any specialized testing would have to be reported through a mechanism other than the CRF, but in a way that makes the results data compatible with the other data from the trial. This is where my team comes in. We receive specialized testing data, process it, resolve errors and create datasets for analysis. The same is true for the second workflow: specialized lab results have to be sent to the data management center via a secure and consistent pipeline apart from the clinical data stream, and my team receives and processes the data. A centralized diagnostic lab would have to report the data back to the sites to enter into the CRF in order to be able to give those results to participants. However, in the case of the diagnostic data that we handle from a centralized diagnostic lab, that data comes through my team first, where we perform quality checks and ensure that the correct testing has been done on the correct samples. So where the lab data is coming from influences how it becomes part of the overall data for a trial and who handles it along the way.
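As a rough illustration of what "the correct testing has been done on the correct samples" can mean in practice, here is a small Python sketch that compares what a lab reported against what was requested. This is not my team's actual pipeline; the manifest structure, the specimen IDs, the assay names and the reconcile function are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (specimen_id, assay)


def reconcile(
    manifest: Dict[str, Set[str]], results: List[Pair]
) -> Dict[str, List[Pair]]:
    """Compare requested (specimen, assay) pairs with the pairs a lab reported."""
    requested = {
        (specimen_id, assay)
        for specimen_id, assays in manifest.items()
        for assay in assays
    }
    reported = set(results)
    return {
        # Requested but never reported: follow up with the lab.
        "missing": sorted(requested - reported),
        # Reported but never requested: wrong specimen or wrong assay.
        "unexpected": sorted(reported - requested),
    }


# Made-up manifest: which assays were requested for which specimen.
manifest = {"S-100": {"ELISA", "PK"}, "S-101": {"ELISA"}}
# Made-up lab file contents.
results = [("S-100", "ELISA"), ("S-101", "PK"), ("S-102", "ELISA")]

report = reconcile(manifest, results)
```

Here the report would flag the PK result for S-100 and the ELISA result for S-101 as missing, and the PK result for S-101 and everything for S-102 as unexpected. Using plain set differences keeps the check easy to read, at the cost of ignoring details like repeat testing, which a real check would need to handle.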

Up until now, the reasons why lab data can be unique have to do with the type of lab data being processed and the route by which the data came to the data management center. Taking these two characteristics together, you could still make the case that the lab data could all be reported on the CRF and handled by CDMs, which I stated earlier is how many organizations operate. The final consideration in this argument is what analyses the lab data is supporting (i.e. what type of endpoints will use lab data in the analysis). An endpoint for a clinical trial is defined as “a measurement determined by a trial objective that is evaluated in each study subject” (Self, SG, 2004). Essentially, it’s what you are measuring your intervention against. Most endpoints for clinical trials are safety and efficacy focused and are called “clinical endpoints”: essentially, is the intervention safe and does it work in stopping or preventing disease? The key word there is “disease”. Aside from the safety measures we discussed above, the goal of a clinical trial is to ensure that an intervention works, and in the world that I am in, “works” means “prevents HIV”. So the endpoint of a clinical trial would be: does this intervention prevent HIV? That is oversimplifying by quite a bit. There are different phases of clinical trials that have different purposes, the first of which (Phase I) is just to ensure that a product is safe in humans and, if it’s a vaccine, that it elicits an immune response. But for now, “does it prevent HIV” is good. From the lab data perspective, traditional clinical endpoints are relatively easy. Safety data and diagnostic data are reported on the CRF, so there is little to do that is different from any other data in the trial.

But what do you do if you’re researching a disease like HIV, or cancer, where the clinical endpoint can take some time to appear? Trials are long enough as it is, and waiting a longer time until onset of disease can mean more time until a product is available. What if you are trying to improve on an already existing intervention? The existence of an already-licensed vaccine, for example, may mean that the incidence of that disease in the general population has been reduced such that a huge trial would be needed to get enough infected individuals to have a robust statistical analysis. These considerations, and others, have led researchers to adopt what are called “surrogate endpoints”. A surrogate endpoint is a “biomarker that can substitute for a clinically meaningful endpoint for the purpose of comparing specific interventions” (Self, SG, 2004). In the vaccine field, these can be correlates of protective immunity or “biomarkers associated with the level of protection from infection or disease due to vaccination” (Self, SG, 2004). The laboratory data that would support a surrogate endpoint or correlate of protective immunity would be the immunogenicity data that I referred to above, which potentially is not part of the CRF. Why does this matter? Data used to support primary or secondary endpoints in clinical trials is the data that is under the most scrutiny from a regulatory perspective. The prime objectives of the study are the ones that regulators are interested in, and then there are always additional analyses done by researchers for more scientific reasons.

Ideally, you would want all the laboratories involved in a clinical trial to report results in such a way that the data is entered into the CRF. However, the logistics of this can be challenging, especially when surrogate endpoints are not already defined and there is a large amount of research going into new methodologies and laboratory tests to define those endpoints, which means lots of labs reporting data. This is where I would argue that splitting out the lab data management into its own team is important. While it would seem that having the CDMs handle all the lab data would be advantageous, since they are familiar with data handling and having one team handle all the data is good from a consistency standpoint, I think there are more advantages with the split team set-up, and not just because I manage such a team. Having lab data separated out into its own team allows the individuals on the team to become highly specialized in handling a type of data that will not have the standardization or harmonization of the clinical data. For clinical data, there is the CDISC system, which provides a framework to harmonize data structures from data collection through dataset creation and into analysis. This same system does not yet exist for specialized laboratory data. There are lab data components within certain portions of the CDISC system, but it lacks the same infrastructure to assure standardization from data collection to analysis. Therefore, lab data arrives at the data management center in every sort of shape and format, and we are responsible for putting it into a format that will fit the statisticians’ needs for analysis and fit into the CDISC structure used by the other clinical data. This is not a cookie-cutter type of activity, and having individuals who are trained on laboratory assays, in addition to data management, provides a higher-quality output, at least in my opinion.

Also, having a team that is trained in the laboratory assays being used means that communication with the laboratories is smoother. My team can speak the same “language” as the labs and can help with data issues since they understand how the data was generated. Data management involves a lot of communication and cooperation to resolve issues with the data, and having a specialized team helps. It also allows me to elevate the visibility of lab data within the organization. With surrogate endpoints becoming more and more frequent in the clinical trials arena, having lab data occupy the same strategic importance as clinical data within an organization is advantageous from an operations and business perspective.
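For a flavor of what it means for lab data to arrive in every sort of shape and format, here is a small Python sketch that takes two made-up lab file layouts and puts them into one common layout. The column names, the target layout (participant_id, visit, test, result) and the helper functions are assumptions for illustration only; this is not a CDISC mapping, and a real conversion would also need to handle units, dates, specimen identifiers and controlled terminology.

```python
from typing import Dict, List

Record = Dict[str, str]


def from_wide(rows: List[Record], id_col: str, visit_col: str) -> List[Record]:
    """Turn one-row-per-visit data (one column per test) into one row per result."""
    out: List[Record] = []
    for row in rows:
        for col, value in row.items():
            if col in (id_col, visit_col):
                continue  # identifier columns are not test results
            out.append(
                {
                    "participant_id": row[id_col],
                    "visit": row[visit_col],
                    "test": col,
                    "result": value,
                }
            )
    return out


def from_long(
    rows: List[Record], id_col: str, visit_col: str, test_col: str, result_col: str
) -> List[Record]:
    """Rename the columns of one-row-per-result data to the common layout."""
    return [
        {
            "participant_id": row[id_col],
            "visit": row[visit_col],
            "test": row[test_col],
            "result": row[result_col],
        }
        for row in rows
    ]


# Hypothetical Lab A sends one row per visit with a column per analyte.
lab_a = [{"ptid": "P001", "visit": "V2", "IgG": "1250", "IgA": "80"}]
# Hypothetical Lab B sends one row per result, with its own column names.
lab_b = [{"Subject": "P001", "Timepoint": "V3", "Analyte": "IgG", "Value": "1300"}]

combined = from_wide(lab_a, "ptid", "visit") + from_long(
    lab_b, "Subject", "Timepoint", "Analyte", "Value"
)
```

Once both files are in the same layout, the same checks and the same dataset-building code can be applied to all of them, which is a large part of why it is worth doing this step up front.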

Whether lab data management is done by a separate team or by the same team as the clinical data management, there are considerations that make lab data a bit of a special snowflake. The lack of one system to manage the data all the way through the trial (at least in some cases), the variability of the data and the lack of standards for non-safety lab data make this a dynamic and challenging field to work in. In upcoming posts, I will go into how my team manages the challenges.
