Back to Basics

One of the interesting side-effects of having done a PhD, aside from some unwanted flashbacks whenever I come within view of a flow cytometer and the uncanny ability to know where the mouse facility is in any research building based on smell alone, is my iterative approach to pretty much everything. Having spent a good portion of my career in research science, I naturally approach everything as an experiment: gather the data, analyze the data, re-adjust my hypothesis and approach, and run another experiment. It’s become so ingrained in my problem-solving methodology that I don’t realize I’m doing it most of the time. How that translates into the managerial work I’m doing now is that I appear to be in a constant state of process improvement. This is basically true; I’m constantly running little experiments, trying to solve a bigger issue or achieve a larger goal. I’m sure this drives some people on my team nuts, as they are constantly hearing, “can we try this another way?”, and I’m also sure that there’s little chance I will fundamentally change this behavior anytime soon. My task is to make sure that the end goal is always in sight and that the team knows why we are iterating, what the results of the iterations are, and so on. But I think that’s a topic for another post.

My reason for leading off with an explanation of my general thought process is that it is how I came to a new initiative with my team, one that is very relevant to what I’ve been discussing here. I recently obtained the DAMA Body of Knowledge (DMBOK) and am casually studying it with an eye toward certification. The Data Management Association International (DAMA-I) has been around since the 1980s and is composed of data management professionals from around the world. They offer certification as well as annual conferences. Anyway, the excessively large book begins with a discussion of the definitions of basic terms. At first I thought this was a little too basic, until I delved deeper into the reading. It wasn’t that the definitions of basic concepts like “data” and “metadata” were so profound that I thought they encompassed all possible scenarios for the use of those words, or anything transcendental like that. While I was reading through the rather dry text, it occurred to me that my team hadn’t set these definitions for ourselves. We hadn’t laid the foundation for our data management practice by collectively agreeing on what we meant by “data”, “metadata”, “data management” and “data management principles”.

I decided to start an Intro to Data Management information sharing series with the team. It’s a once-a-month session in our team meeting that will go through fundamental data management components, starting with the basics. I call it information sharing rather than “training” or “education” because I truly want it to be a discourse on best practices and a re-imagining of what these dry, textbook definitions and theories mean for us in our practice of lab data management. This is probably iteration #3 of our team meeting, and I know we’re getting closer to a framework that meets the team’s needs.

Sidebar: This brings me to another iteration of practice that I introduced recently that is gaining some traction. I came across an article about silent meetings. (See resources here: https://hbr.org/2019/06/the-case-for-more-silence-in-meetings and here: https://www.businessinsider.com/better-idea-silent-meetings-work-2019-1, just to name a few.) The concept struck me at once because of the mix of participation and non-participation that happens in most of the meetings I facilitate. OK, let’s be honest, mostly non-participation. Having come from an organization where participation in meetings was considered an essential skill and the CEO valued “a good dust-up”, this culture of passively listening and head nodding was totally foreign to me. I knew people had good insights and ideas; they just weren’t coming out in the meetings. When I saw an article about silent meetings in a daily email digest I get, it was definitely a light bulb moment. You can read the links for details, but the benefits of this type of meeting are that those who tend to be quiet in meetings can offer their ideas and you avoid the bias of everyone echoing the opinion or thought of the first person who speaks. So far the results have been good in the meetings, and the feedback from the team has been overwhelmingly positive.

Now, after that rather large squirrel, let’s get back to data. Defining the term data may seem like the most obvious thing in the world and not worth the time it takes to write the definition down. Let me offer this anecdote to illustrate why the time might be worth it. Where I currently work, we get a large number of data requests. We have formal procedures in place now, but not too far in the past, email requests would come in from a researcher asking for “the data”. We would email back and say, “OK, here is the raw data you asked for.” Nope, that was not what the requester wanted. Attempt #2: “Here is the processed, standardized data set.” Nope, that wasn’t it either. Attempt #3: “Here is the exact analysis data set used, with the metadata attached.” Nope, that wasn’t it either. Attempt #4: “Here are the results (interpreted data).” That was it!! Finally!! Now of course there are a number of issues with this anecdote, including communication, intake processes, triaging of requests, etc. What I would like you to take away from it, though, is the importance of having a shared vocabulary and how much rework it saves.
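
To make that shared vocabulary concrete, here is a minimal sketch of one way to pin down the levels of “the data” from the anecdote. The taxonomy and names are my own illustration, not an official one:

```python
from enum import Enum

class DataLevel(Enum):
    """Hypothetical shared vocabulary for 'the data' (names are illustrative)."""
    RAW = "raw data, straight from the source"
    PROCESSED = "processed, standardized data set"
    ANALYSIS = "exact analysis data set, metadata attached"
    RESULTS = "interpreted results"

def describe_request(level: DataLevel) -> str:
    # An intake form that forces requesters to pick a level would have
    # short-circuited attempts #1 through #3 in the anecdote.
    return f"Requesting {level.name}: {level.value}"

print(describe_request(DataLevel.RESULTS))
```

Even a four-line enum like this, agreed on by everyone, turns “send me the data” into a one-round-trip request.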

Defining terms such as data, metadata, data management, etc., is extremely important no matter what kind of work your organization does. Obviously, for a data management and analysis organization, this is absolutely critical. But I would argue that this is critical for all organizations. If you do not define what “data” is for your organization, all sorts of assumptions can be made about how data moves through the organization, how it is secured, how it is managed and how it is retained. The DMBOK defines data as an asset, “an economic resource that can be owned or controlled and that holds or produces value.” As such, it should be defined and managed. The organization I work for has done an immense amount of work doing just this: defining data, metadata and data management. That work had not trickled down into my team yet. We hadn’t taken those definitions and translated them into ones that were specific to the lab data we manage. Using the silent meeting structure, we defined what data meant to us in the context of the lab data we manage, what metadata was, how data management was defined and what data management principles aligned with how we (and the whole organization) work. I got feedback after the meeting that both the format and content were well-received.

Now is where this post all comes together. Getting back to basics means:

  1. Defining what “data” and “metadata” mean to your organization. What do you include in your definition of data and metadata?
    1. Be comprehensive here. Assuming that data has to be numerical or electronic can be misleading. There is a lot of “data” that we would not have included in the definition not so long ago: names, addresses, the number of times someone visits a website, personal preferences for clothing, etc.
  2. Define what data management means to your organization and decide on your data management principles.
    1. What are the key components of how the organization should manage data?
  3. Outline your data management principles
    1. These should guide the data management practice
    2. Reflect on how your data assets are used or can be used to reach your organization’s goals
    3. You don’t have to do this from scratch. There are lots of good resources out there to use as a starting point. (See the sketch after this list for one way to write down what you decide.)
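
As a purely illustrative example of what the output of this exercise might look like, here is a minimal sketch of a team glossary captured as code (it could just as easily be a shared document). Every definition and principle below is a placeholder of my own wording, not DMBOK text:

```python
# A minimal, versionable record of what the team agreed on.
# All wording here is illustrative, not an official definition.
glossary = {
    "data": "facts we collect and manage as an asset, whether numeric, "
            "text, or physical records",
    "metadata": "data that describes other data: who, what, when, where, how",
    "data management": "the practices that keep our data secure, findable, "
                       "and fit for analysis",
}

principles = [
    "Data is an organizational asset and is managed as one.",
    "Every data element has a single authoritative source.",
    "Definitions are shared, written down, and revisited regularly.",
]

for term, definition in glossary.items():
    print(f"{term}: {definition}")
```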

You can use a number of different types of meetings, asynchronous communication, etc., to get at these definitions (hey, why not throw a silent meeting in there?). Laying out this foundation for data in your organization, and making sure the whole organization (here I mean group, team, whole company, whatever you have influence over) is clear about it, ensures solid ground on which to build any data project.

So, care to run a little experiment with me? Can you define data for your organization? Do you have clear and consistent data management principles? If not, join me in enacting a bit of change, one little iterative experiment at a time, starting with the basics.

What to do with a vial of blood?

You may have thought from the title of this post that I was going to post some vampire fan fiction. While this wouldn’t be the first time someone thought I was a vampire (that happened years ago collecting blood at night in Haiti for a lymphatic filariasis survey), that’s not really my thing. Last time I talked a bit about the differences between clinical data collected on the Case Report Form (CRF) and non-CRF laboratory data. For today’s post I’m going to walk you through the life-cycle of a specimen and how my team ensures that every specimen possible can be used for testing and subsequent analysis.

The life of a specimen starts at the clinic when the study protocol indicates that a sample is needed for particular testing at that specific visit. The vast majority of this is decided ahead of time, when the protocol is being finalized. There are specific tests that need to be run at specific time points, either before and/or after treatment or vaccination. For example, at the peak immunogenicity time point post-vaccination, there are specific immunological assays that have to be run to determine if the vaccine has elicited an immune response. For the sake of brevity, I’ll defer discussion of which immunological assays are run to another post. Try not to be too overcome with anticipation.

The tube, or tubes, of blood collected at the clinic are sent along to a local lab to be processed and to have some safety labs run. You’ll remember from a previous post that the type of lab data I will be opining/educating about is the non-safety lab data for clinical trials. Accompanying the vial(s) of blood is often a written form that includes an inventory of the vials in that shipment and some metadata surrounding the vial, including participant ID, visit number, visit date, specimen type, etc. Now, I want you to pay particular attention to this seemingly minute detail. Because now we have metadata for that specimen entered in the CRF (the lab tech had to check off in the CRF that the specimen was collected, and that check produces metadata around the participant ID, specimen type, visit number, and date and time collected for the specimen, all of which is recorded and retained in the clinical database). We also have that metadata on the physical sheet that goes along with the specimen to the processing/local lab. One of the tenets of data management is that if the same information is entered in multiple places, there will likely be errors.

Right now our specimen (i.e. vial of blood) is at the local lab or processing lab to be processed into plasma or serum or cell pellets. Those blood products are aliquoted out and stored either at the local lab or often, at a repository. Now don’t think that all those little tubes are sitting in freezer boxes all nameless. All that metadata that was entered into the CRF and transferred to the lab form is now entered into a Laboratory Information Management System (LIMS). LIMS systems are used to manage all the information around specimens and assay results. If you’re keeping track of our specimen metadata, we now have metadata for the specimens in the CRF, on a physical form and in the LIMS. And every little aliquot (tube) that was derived from the single specimen has that same metadata associated with it.

Now a testing lab is ready to perform testing on a designated aliquot, as outlined in the protocol. The specimens are shipped to the lab with a shipping manifest that contains an inventory of the specimens in the shipment. The specimens’ bar codes are scanned into the receiving lab’s LIMS and now the fun can begin. For those of you keeping score, the metadata around the specimen now resides in: 1) the CRF, 2) the lab form, 3) the LIMS installation at the processing lab, 4) the LIMS installation at the repository (if one is being used), 5) the LIMS installation at the central or endpoint lab…and a partridge in a pear tree. As you can imagine, having the specimen metadata replicated in all these different places can lead to errors occurring as a consequence of data transfers and being perpetuated through all the downstream locations. This is where my team comes in. We programmatically compare the specimen metadata in the CRF to the metadata in the LIMS. The goal is to identify and correct all errors before the specimens are shipped out to the labs performing the testing. In order to accomplish this daring feat of data management, we have a crack team of programmers supporting us, creating and maintaining the code that does the comparison and spits out reports of the errors.
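
To give a flavor of what that comparison looks like, here is a toy sketch in Python/pandas. The column names and values are invented, and the real pipeline is of course far more involved:

```python
import pandas as pd

# Toy versions of the two metadata streams; column names are invented.
crf = pd.DataFrame({
    "specimen_id": ["S001", "S002", "S003"],
    "participant_id": ["P10", "P11", "P12"],
    "visit": [4, 4, 2],
    "collection_date": ["2020-01-05", "2020-01-05", "2020-01-06"],
})
lims = pd.DataFrame({
    "specimen_id": ["S001", "S002", "S003"],
    "participant_id": ["P10", "P11", "P12"],
    "visit": [4, 2, 2],                      # S002 disagrees with the CRF
    "collection_date": ["2020-01-05", "2020-01-05", "2020-01-06"],
})

# Join on specimen ID and flag any field where the two streams disagree.
merged = crf.merge(lims, on="specimen_id", suffixes=("_crf", "_lims"))
fields = ["participant_id", "visit", "collection_date"]
mismatches = [
    {"specimen_id": row["specimen_id"], "field": f,
     "crf": row[f + "_crf"], "lims": row[f + "_lims"]}
    for _, row in merged.iterrows()
    for f in fields
    if row[f + "_crf"] != row[f + "_lims"]
]
# The resulting frame is the 'error report' handed to the lab data managers.
print(pd.DataFrame(mismatches))
```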

Of course, nothing is ever as simple as “generate a report and be done”. The lab data managers on my team work very closely with clinical sites and labs to determine the source of the error and what the definitive source of any given metadata is and to ensure that changes are made in all places where the metadata may be incorrect.

So why all this effort to ensure that a visit date for a specimen is correct? Does that really make a difference in the grand scheme of a whole trial? Channeling our inner consultants, let’s unpack that assumption. Due to the complexities of participants being on PrEP, and the fact that HIV vaccines elicit anti-HIV antibodies, HIV diagnosis for clinical trials follows a testing algorithm where specific tests are dictated by the results of previous tests (confirmatory testing) or by visit type in the study (i.e. before or after vaccination). This is actually done for HIV testing outside of clinical trials as well. There is a required confirmatory test if you test positive by a rapid test, the same way a woman would go to the doctor for a confirmatory pregnancy test (see https://www.cdc.gov/hiv/testing/laboratorytests.html). But I digress. As I mentioned, the HIV diagnostic testing algorithms can differ by visit. If the wrong algorithm is run on a specimen because the visit number was incorrect in the metadata, it could lead to the wrong result for the participant. That’s obviously not something anyone wants to happen.
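
A minimal sketch of why a single wrong metadata field is so dangerous: the testing algorithm is keyed on the metadata, so a corrupted value silently selects the wrong algorithm. Protocol numbers, visit phases and algorithm names below are all invented for illustration:

```python
# Illustrative only: keys and algorithm names are made up.
TESTING_ALGORITHMS = {
    ("001", "pre-vaccination"): "standard antibody testing algorithm",
    ("001", "post-vaccination"): "algorithm distinguishing vaccine-elicited responses",
    ("002", "any"): "prevention-trial algorithm with confirmatory testing",
}

def select_algorithm(protocol: str, visit_phase: str) -> str:
    # Fall back to the protocol-wide algorithm, then to manual review.
    return (TESTING_ALGORITHMS.get((protocol, visit_phase))
            or TESTING_ALGORITHMS.get((protocol, "any"))
            or "no match: hold specimen for manual review")

print(select_algorithm("001", "post-vaccination"))  # correct metadata
# If the protocol got garbled to "002" in transit, a different algorithm
# runs with no error raised anywhere:
print(select_algorithm("002", "post-vaccination"))
```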

While that example is on the extreme end of the spectrum of what-ifs, metadata errors for other values can lead to the incorrect testing being performed, which would lead to incorrect data ending up in the dataset for analysis. If the lab data are being used to evaluate study endpoints, the quality of the lab data is paramount. One of the main goals of my group is to make sure that the lab data used for analysis is as clean as possible and that each data point is a valid data point.

From an ethical standpoint, ensuring that each specimen collected from a participant can be used is critical. Clinical trial participants are a special breed of people who are willing to be part of these studies, sometimes not for immediate benefit to themselves but for the advancement of the science toward a cure. The whole study team is dedicated to guaranteeing that a participant’s involvement in a trial isn’t for naught. Our small contribution to that guarantee is to make sure that any specimen they give as part of the trial is tested, that the resulting data are used for analysis, and that participants aren’t brought back for additional specimens unnecessarily because no one can find their initial specimen.

I hope that I have convinced you that specimen management is a vital part of the clinical trial process. Please add a comment if you have any questions about the process or why we’ve invested so much time and energy into it.

Up next time…I get back to my “how to run a team” posts with an update of a team retreat we just had.

Lab Data: The Special Snowflake of Clinical Data

We briefly discussed clinical trial data in the last post and the methods used to collect, clean and analyze the data, or at least where you can go to find that information. Now we finally get to lab data. Lab data may seem straightforward: you get results from labs, you add them to the other data from the trial and you’re all set. That is not the case, however, for many trials. Lab data has some nuances to it that make it a bit of a special snowflake.

The decision of where lab data ends up as part of the clinical trial data has to do with 1) what type of lab data it is, 2) how the laboratory and testing structure is set up for the trial, and 3) what type of endpoints the lab data will be supporting.

Let’s start with types of lab data (’cause, you know, starting with 1 is usually a good idea). In my organization, lab data is divided up between what we call safety lab data and non-safety lab data; fancy, huh? Safety lab data are the result of testing done on samples to “ensure that patients are not experiencing any untoward toxicities”. (Chuang-Stein C, 1998) These tests are usually ones that you would see at a doctor’s office: liver enzymes, white blood cell counts, etc. In the clinical trials that my organization supports, this lab data is entered into the CRF by testing labs connected with each clinical site or group of clinical sites. Entering safety lab data into the CRFs is industry standard, as it keeps all the safety data available to be examined regularly to ensure the safety of the participants. The workflow for safety lab data is: a sample is collected from a participant at a visit to the clinical site, and that sample is processed and sent to a local lab for testing. Results are sent back to the site, which enters them into the corresponding CRF for that participant and visit. The safety lab data is managed by the Clinical Data Managers (CDMs) for a study, and the quality checks and processing procedures are the same as for the other data collected on the CRFs.

Non-safety lab data consists of lab data that is not generated in support of safety considerations. This spans a whole range of data, including immunogenicity data for vaccine trials and pharmacokinetics (PK) data for drug trials. The tests for non-safety lab data can be performed at either local labs or central labs, but the key is that the results are not sent back to the site to be entered into the CRF. This is because there are usually no reporting requirements for non-safety lab data. (If a participant has a low white blood cell count, for example, the site would be required to counsel them and perhaps refer them for additional testing.) Since the non-safety lab data is not reported onto the CRF, it has to be uploaded to the data management center in some way, cleaned, quality checked with errors resolved, and then merged with the other clinical data for analysis. The distinction between CRF and non-CRF data is a big one. The CRF data is collected and managed in a Software as a Service package (in our case Medidata Rave) that allows for creation of the CRFs, data entry, data cleaning and database creation, all in a single, validated and maintained system. Data that comes into a data management center outside of the electronic data capture (EDC) or other CRF system does not have this built-in functionality associated with it, nor the infrastructure around it to make data creation, cleaning and storing relatively easy. Lab data is not the only type of non-CRF data, so these issues span other areas such as questionnaires, SMS or text data, and participant diaries. Since I have absolutely no expertise in those areas, I’ll stick to the lab data. Developing the systems to import, process, store and distribute non-CRF data is a big undertaking, and I will discuss some of the ways we do this in upcoming posts.
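
As a small illustration of what “no built-in functionality” means in practice, here is a hypothetical first-line intake check for a non-CRF lab file. With an EDC system, structure is enforced at entry; outside it, even confirming the columns exist is your job. The expected column names are invented:

```python
import csv
from pathlib import Path

# Invented column names for an incoming non-CRF lab results file.
EXPECTED_COLUMNS = {"specimen_id", "assay", "result", "units"}

def check_incoming_file(path: Path) -> list[str]:
    """Return a list of structural problems found in an arriving file."""
    problems = []
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
        for i, row in enumerate(reader, start=2):  # row 1 is the header
            if not row.get("specimen_id"):
                problems.append(f"row {i}: blank specimen_id")
    return problems
```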

Lab data will be used within the context of a clinical trial, so many organizations opt to embed the lab data management within clinical data management. My organization has opted not to do this, though our processes are aligned with the clinical data management team. Part of the reason why we have split out the lab data management from the clinical data management has to do with the two other features of lab data management that determine where the lab data ends up in the overall data of a clinical trial.

Workflow of the lab samples and testing may not, on the surface, seem like it would influence what happens with the data downstream, but it can have a big impact. As I mentioned above, there are a few different set-ups for laboratory testing that I’ve seen with clinical trials, and probably endless combinations from there. One scenario is to have the samples drawn at the clinic and sent for processing to a local lab. That same local lab would then perform the safety lab tests and diagnostic testing, and store additional aliquots of each sample. Those additional aliquots would then be sent to central or specialty labs for more advanced testing (i.e. immunogenicity or PK testing). In this scenario, the diagnostic test results would be sent back to the clinic along with the safety lab results and reported on the CRF. In order to ensure quality and consistency across multiple labs, a selection of samples could be sent to a central lab to verify diagnostic status.

In another scenario, the samples are collected at the clinic, then sent to a local lab for processing and safety lab testing and then aliquots sent to a central repository. The aliquots of the samples would then be sent out to central or specialty labs for immunogenicity, PK or other specialized testing. Additionally, diagnostic testing can also be done by central labs as opposed to local labs.

So what are the implications of these different workflows? In the first workflow, all the safety results and diagnostic results would be reported on the CRF. Any specialized testing would have to be reported through a mechanism other than the CRF, but done in a way that makes the results data compatible with the other data from the trial. This is where my team comes in. We receive specialized testing data, process it, resolve errors and create datasets for analysis. The same is true for the second workflow, with specialized lab results having to be sent to the data management center via a secure and consistent pipeline apart from the clinical data stream, and my team receiving and processing the data. A centralized diagnostic lab would have to report the data back to the sites to enter into the CRF in order to be able to give those results to participants. However, in the case of the diagnostic data that we handle from a centralized diagnostic lab, that data comes through my team first, where we perform quality checks and ensure that the correct testing has been done on the correct samples. So where the lab data is coming from influences how it becomes part of the overall data for a trial and who handles it along the way.

Up until now, the reasons why lab data can be unique have had to do with the type of lab data being processed and the route by which the data came to the data management center. Taking these two characteristics together, you could still make the case that the lab data could all be reported on the CRF and handled by CDMs, which, as I stated earlier, is how many organizations operate. The final consideration in this argument is what analyses the lab data is supporting (i.e. what type of endpoints will use lab data in the analysis). An endpoint for a clinical trial is defined as “a measurement determined by a trial objective that is evaluated in each study subject”. (Self, SG, 2004) Essentially, it’s what you are measuring your intervention against. Most endpoints for clinical trials are safety and efficacy focused and are called “clinical endpoints”: essentially, is the intervention safe and does it work in stopping or preventing disease. The key word there is “disease”. Aside from the safety measures we discussed above, the goal of a clinical trial is to ensure that an intervention works, and in the world that I am in, “works” equals “prevents HIV”. So the endpoint of a clinical trial would be: does this intervention prevent HIV? That is over-simplifying to a rather large extent. There are different phases of clinical trials that have different purposes, the first of which (Phase I) is just to ensure that a product is safe in humans and, if it’s a vaccine, that it elicits an immune response. But for now, “does it prevent HIV” is good. From the lab data perspective, traditional clinical endpoints are relatively easy. Safety data and diagnostic data are reported on the CRF, so there is little to do that is different from any other data in the trial.

But what do you do if you’re researching a disease like HIV, or cancer, where the clinical endpoint can take some time to appear? Trials are long enough as it is, and waiting a longer time until onset of disease can mean more time until a product is available. What if you are trying to improve on an already existing intervention? The existence of an already-licensed vaccine, for example, may mean that the incidence of that disease in the general population has been reduced such that a huge trial would be needed to get enough infected individuals for a robust statistical analysis. These considerations, and others, have led researchers to adopt what are called “surrogate endpoints”. A surrogate endpoint is a “biomarker that can substitute for a clinically meaningful endpoint for the purpose of comparing specific interventions”. (Self, SG, 2004) In the vaccine field, these can be correlates of protective immunity, or “biomarkers associated with the level of protection from infection or disease due to vaccination.” (Self, SG, 2004) The laboratory data that would support a surrogate endpoint or correlate of protective immunity would be the immunogenicity data that I referred to above, which potentially is not part of the CRF. Why does this matter? Data used to support primary or secondary endpoints in clinical trials is the data that is under the most scrutiny from a regulatory perspective. The primary objectives of the study are the ones that regulators are interested in, and then there are always additional analyses done by researchers for more scientific reasons.

Ideally, you would want all the laboratories involved in a clinical trial to report results in such a way that the data is entered into the CRF. However, the logistics of this can be challenging, especially when surrogate endpoints are not already defined and there is a large amount of research going into new methodologies and laboratory tests to define those endpoints, which means lots of labs reporting data. This is where I would argue that splitting out the lab data management into its own team is important. While it would seem that having the CDMs handle all the lab data would be advantageous, since they are familiar with data handling and having one team handle all the data is good from a consistency standpoint, I think there are more advantages to the split-team set-up, and not just because I manage such a team. Having lab data separated out into its own team allows the individuals on the team to become highly specialized in handling a type of data that will not have the standardization or harmonization of the clinical data. For clinical data, there is the CDISC system, which provides a framework to harmonize data structures from data collection through dataset creation and into analysis. This same system does not yet exist for specialized laboratory data. There are lab data components within certain portions of the CDISC system, but it lacks the same infrastructure to assure standardization from data collection to analysis. Therefore, lab data arrives at the data management center in every sort of shape and format, and we are responsible for putting it into a format that will fit the statisticians’ needs for analysis and fit into the CDISC structure used by the other clinical data. This is not a cookie-cutter type of activity, and having individuals who are trained on laboratory assays, in addition to data management, produces higher-quality output, at least in my opinion. Also, having a team that is trained in the laboratory assays being used means that communication with the laboratories is smoother. My team can speak the same “language” as the labs and can help with data issues since they understand how the data was generated. Data management involves a lot of communication and cooperation to resolve issues with the data, and having a specialized team helps. It also allows me to elevate the visibility of lab data within the organization. With surrogate endpoints becoming more and more frequent in the clinical trials arena, having lab data occupy the same strategic importance within an organization is advantageous from an operations and business perspective.
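
To make the “every sort of shape and format” point concrete, here is a toy sketch of the kind of renaming and restructuring involved. The incoming column names and assay code are invented; USUBJID, LBTESTCD, LBORRES, LBORRESU and VISITNUM are standard variables in CDISC’s SDTM laboratory (LB) domain, but a genuine mapping involves controlled terminology, sequence numbers and far more than a rename:

```python
import pandas as pd

# A hypothetical lab file as it might arrive (column names invented).
incoming = pd.DataFrame({
    "ptid": ["P10", "P11"],
    "test_code": ["NAB50", "NAB50"],   # made-up assay code
    "value": [120.5, 89.0],
    "unit": ["titer", "titer"],
    "visit_no": [4, 4],
})

# Map arbitrary incoming names onto SDTM LB-style variable names.
COLUMN_MAP = {
    "ptid": "USUBJID",
    "test_code": "LBTESTCD",
    "value": "LBORRES",
    "unit": "LBORRESU",
    "visit_no": "VISITNUM",
}

lb = incoming.rename(columns=COLUMN_MAP)
lb.insert(0, "DOMAIN", "LB")  # every SDTM record carries its domain code
print(lb)
```

Every lab sends a different “incoming” shape, which is why this mapping work never becomes cookie-cutter.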

Whether lab data management is done by a separate team or by the same team as the clinical data management, there are considerations that make lab data a bit of a special snowflake. The lack of one system to manage the data all the way through the trial (at least in some cases), the variability of the data and the lack of standards for non-safety lab data make this a dynamic and challenging field to work in. In upcoming posts, I will go into how my team manages these challenges.

What is Lab Data Management Anyway?

I thought that for this post, I would introduce the new subject on the blog, lab data management. The idea is that in addition to providing witty reflection on how I got to where I am in my career, I would talk a little more about what that career looks like.

Before I can get to my career and what I actually do (still trying to figure that one out), I should provide some background. Lab data management is a subset of clinical data management, so I’ll start there. I am going to use the Wikipedia definition, since I got rid of my encyclopedia set decades ago. Clinical data management is a set of processes and procedures that “ensure collection, integration and availability of data at appropriate quality and cost”. The goal of clinical data management is to generate high-quality, reliable and statistically sound data, ensuring that conclusions drawn from research are well-supported by the data. So, no pressure…right?

In many clinical trial settings, both in-house and contracted out (CROs), lab data management is conducted by clinical data managers along with the management of all the other clinical data. There are only a few institutions that I’m aware of that separate the laboratory data. I should clarify that when I’m talking about lab data, I’m not talking about the safety labs done to monitor the participants during the course of the trial (white blood cell counts, liver enzyme tests, etc). Those are monitored along with the other clinical data, at least in our organization. Lab data for my team consists of the endpoint data (HIV diagnostic data), pharmacokinetic (PK) data for drug trials and a whole host of immunology assays that are being done to assess the immune response to vaccines.

So what do we do with the lab data? I’m so glad you asked. Lab data management for us can be grouped into two broad categories: specimen monitoring/specimen data quality control and assay data processing. Specimen monitoring and specimen data quality control are essentially the same thing; for the purposes of this post, I’ll call it specimen monitoring. In all clinical trials, participants have specimens taken. It’s usually blood draws, but it can also include tissue biopsies, etc. The metadata around these specimens can end up being entered in two different data streams: the clinical data stream (i.e. the Case Report Form filled out when a participant comes in for a visit) and a Lab Information Management System (LIMS), which is filled out when the specimen is processed in the lab. In order for the specimen to be used for HIV diagnostic testing or immunological testing, the metadata has to match in both places. Let’s take the example of HIV diagnostic testing. There are algorithms for testing in HIV to determine not only if someone is infected, but whether it is an acute or chronic infection. HIV testing algorithms are not the same for every study. If you are performing an HIV vaccine trial, where the whole point is to elicit antibodies against HIV, you will have to have a series of tests to determine if the antibody responses that show up positive on a diagnostic test are vaccine-elicited or from actual HIV infection. If you are testing an HIV prevention intervention, the testing algorithm will be different. So if the metadata for a specimen at the time of draw says that this blood tube is from visit 4 of protocol 001, then the diagnostic lab knows what testing algorithm to run. If, somewhere in the process of sending the tube to the lab and the transfer of information from the clinical database, to a specimen label or lab requisition form, to the LIMS, the metadata got changed to visit 4 of protocol 002, then the testing algorithm will be different. This would render any data from that testing invalid.

One whole scope of work for my team is to ensure that the metadata for a specimen remains correct throughout the course of the study, no matter what data stream that specimen appears in. We accomplish this by programmatically comparing the different data streams each day and issuing QCs when the data doesn’t match. We then work with the labs and clinics to find the reason for the data discrepancy, consult the source documentation to determine the real value, and correct the QC. This ensures that as many specimens as possible can then be used for testing. Participants trust that when they donate blood or tissue, it will be put to good use, and we help to ensure that it will be.
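
For illustration, a QC in this sense is essentially a tracked discrepancy record. Here is a hypothetical, minimal version; the field names and workflow states are my own invention, not taken from any particular QC system:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class QCItem:
    """One tracked discrepancy between data streams (fields are illustrative)."""
    specimen_id: str
    field_name: str
    crf_value: str
    lims_value: str
    opened: date
    status: str = "open"                 # open -> with lab/clinic -> resolved
    resolution: Optional[str] = None     # which source document settled it

    def resolve(self, correct_value: str, source: str) -> None:
        self.status = "resolved"
        self.resolution = f"{source} confirms {self.field_name} = {correct_value}"

qc = QCItem("S002", "visit", crf_value="4", lims_value="2", opened=date.today())
qc.resolve("4", "clinic chart")
print(qc.status, "|", qc.resolution)
```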

The second large scope of work for the team is assay processing. After clinical specimens have been processed and sent to labs for testing, we receive the assay data back into our group. We again check to make sure the specimen metadata is clean, and we also do additional quality checks to evaluate the data for format consistency and logic (if there is supposed to be a numeric value, we check to make sure the values are numeric), along with some range checks and other assay-specific checks. This part of our work is important because not only do we want all specimens to be able to be used for testing, we want all the lab testing data to be usable in the statistical analyses. We provide consistently formatted and clean datasets to the statisticians for their analysis.
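
Here is a toy version of the kinds of checks described above; the assay name, expected range and values are all invented for illustration:

```python
# Hypothetical mapping of assay -> plausible result range (values invented).
RANGE_CHECKS = {"NAB50": (0.0, 10_000.0)}

def check_result(assay: str, raw_value: str) -> list[str]:
    """Return any problems found with a single reported result."""
    try:
        value = float(raw_value)           # logic check: should be numeric
    except ValueError:
        return [f"{assay}: non-numeric result {raw_value!r}"]
    lo, hi = RANGE_CHECKS.get(assay, (float("-inf"), float("inf")))
    if not lo <= value <= hi:              # range check
        return [f"{assay}: {value} outside expected range [{lo}, {hi}]"]
    return []

print(check_result("NAB50", "abc"))     # flags a format problem
print(check_result("NAB50", "120.5"))   # clean -> []
```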

In short, lab data management at SCHARP is a group dedicated to preserving high-quality laboratory data for analysis in clinical trials by safeguarding the metadata around clinical specimens and providing consistent and clean laboratory datasets for analysis. If you’re interested, I can go into more detail about how we do this in subsequent posts. I will definitely be doing more posts about why it’s important to think about data management, even in a research setting, and discussing some methods and best practices for how to start implementing lab data management, regardless of the setting.