Is Excel Holding Back Biopharmaceutical Research?
Contributed Commentary by Jesse Harris, ACD/Labs
January 3, 2022 | Data aggregation and consolidation are essential in biopharmaceutical development. Research teams use a wide range of analytical and sensor data from various instruments, which are often spread across multiple sites. These data streams need a common platform to allow scientists to review the data and make decisions.
Excel is used to perform this consolidation in many companies. This is natural given that Excel is ubiquitous, has been used for many years, and virtually every scientist is familiar with the software. Excel is also versatile and can be configured to address various gaps in data management when the data is numerical.
While access to Excel is convenient, this software is the root cause of many data management problems. Excel has several systematic weaknesses when it is used to aggregate and process research data. These issues lead to lower efficiency, wasted resources, and frustration. It is worth considering whether Excel is an obstacle to innovation.
Excel was not designed for use in a pharmaceutical or biopharmaceutical context. Excel does not "understand" chemistry or biochemistry. Simple tasks such as representing chemicals as structures or searching by molecular structure are difficult or impossible. You also cannot define relationships between molecules, such as reactant and product. Enterprising researchers have developed workarounds to overcome Excel's chemical illiteracy, but these are often unreliable or difficult to use.
Beyond chemical information, Excel was also not designed to work with live analytical data. "Live" analytical data is data that can be reprocessed, such as a chromatogram or NMR spectrum. "Dead" analytical data cannot be reprocessed, such as an image of a chromatogram or a peak table. Live data lets researchers reanalyze results to learn more and ask new questions as research progresses.
For analytical data to be imported into Excel, it must be "flattened" into a dead table of numbers. These results usually cannot be reinterpreted within Excel. To revisit them, you must find the original analytical file, reprocess the data in separate software, re-export it in an Excel-friendly format, and re-import it into your spreadsheet. This error-prone, time-intensive, and tedious operation must then be repeated for every relevant data stream whenever a new question arises.
This inability to handle live analytical data also feeds into Excel's versioning problem. Over the course of a project, multiple versions of a spreadsheet are created. The number of files grows over time, making it increasingly difficult to tell which one is up to date. The challenge compounds in collaborative projects, where every contributor creates files that can be mislabeled, confused, or lost. Modern biopharmaceutical research often involves multiple teams spread across several sites, making the versioning problem virtually unmanageable.
To avoid versioning issues, many teams use shared Excel sheets that synchronize over the internet. This only partially solves the problem, as multiple file versions still inevitably appear. Project leaders also end up acting as data police, micromanaging permissions and tracking down missing data.
Strict data policing is necessary partly because Excel lacks audit trails. Regulators require a chain of custody for data so they can verify the accuracy of results. An ideal system would let users instantly access the analytical files connected to each result. Excel can link to external files, but these links are limited and break easily, for example when a linked file is moved or renamed. Without a complete audit trail, experiments may have to be repeated, wasting time and resources.
Excel is obviously not the only software for managing data. There is a wide variety of data management tools, including electronic lab notebooks (ELNs), chromatography data systems (CDSs), and laboratory information management systems (LIMSs). These systems offer many advantages over Excel, such as native handling of analytical data and environments built for collaboration.
With all these tools in place, why does Excel remain so common? The answer lies in the gaps between them. Excel files are often used to transfer data from an instrument into a data management tool, or between incompatible pieces of software. This reintroduces all of Excel's problems: flat analytical data, versioning issues, and broken audit trails.
Researchers have come to accept Excel's shortcomings because they believe there are no alternatives, but this is not the case. For example, a chemistry, manufacturing, and controls (CMC) decision support tool allows users to access all their analytical data in one interface. Scientists can also resist change because it is uncomfortable, and there is often no pressure pushing them to make the switch. Over time, companies that implement systems that do not rely on Excel will improve regulatory compliance, accelerate research, and save money.
Despite all this, Excel is still an impressive application. It is so flexible that it feels like the software can be stretched to do almost anything with enough plugins and integrations. The problem is that we are asking too much from Excel. It is no longer able to keep up with the needs of modern biopharmaceutical research. It is time to take Excel’s limitations seriously and adopt solutions that help rather than hinder innovation.
Jesse Harris is a Marketing Communications Specialist at ACD/Labs. He received an MSc in chemistry and an MASc in chemical engineering from Queen's University in Kingston, Ontario. He is passionate about data management in the pharmaceutical industry and scientific communication. Reach him on LinkedIn: https://www.linkedin.com/in/jesse-ji-harris/