Data Entry Processes

Abstract

Established procedures for data receipt and entry are necessary for a study to successfully produce a clinical database of sufficient quality to support or refute study hypotheses. This chapter discusses considerations needed to reduce the likelihood of errors occurring during data entry processes and ensure consistency in a clinical database. These considerations cover topics including workflow components, data receipt and tracking, data entry, data review, data cleaning, and change control for case report forms, databases, and processes.

Introduction

The purpose of data entry processes is to ensure data are reliable, complete, accurate, of high quality, and suitable for statistical analyses. Data entry processes encompass the efficient receipt, tracking, entering, cleaning, coding, reconciling and transferring of data. A number of factors should be considered when choosing a data entry process, such as the skill level and training of personnel, and the amount of time allocated for data entry. Clinical studies vary in design and operational plans; the specific design and plan should therefore address the unique requirements of a given study. Throughout a study, an effective plan will ensure each component or step of data entry processes provides an appropriate level of data quality. The International Conference on Harmonisation’s Guidance for Industry: E6 Good Clinical Practice states, “Quality control should be applied to each stage of data handling to ensure that all data are reliable and have been processed correctly.”1

With electronic data capture (EDC) systems, traditional data management roles may change. In most cases, site personnel conduct data entry and may have the capability to run edit checks and make data updates to resolve discrepancies. When data managers are not able to make data edits, they may need to remotely guide site personnel through data cleaning processes. These processes may take the form of automated checks built into the computer system, or may operate through queries entered into the clinical data management system (CDMS). Whether a study is EDC- or paper-based, the functionality of the tools, the design of the study and the skill sets of staff should be carefully considered.

Scope

This chapter focuses on data management functions of data entry processes, including data receipt, data tracking, data entry, change control, data review, data cleaning, and discrepancy identification, resolution, and reconciliation. The chapter is not intended to discuss audit or inspection processes in detail.

Although some of the specific topics addressed by this chapter may not be the direct responsibility of data management personnel, data managers must have an ongoing awareness of requirements and ensure these tasks have been completed in accordance with the principles and standards of their organization, regulatory bodies, and good clinical practice.

Minimum Standards

Utilize written procedures describing data flow, data entry, data processing, and required quality level. Ensure enough specificity to reproduce the analysis database from source documentation.
Ensure employees are appropriately trained (including ICH-specified documentation of having been trained) on systems, procedures, guidelines, working practices, and appropriate references (e.g., materials such as medical dictionaries, medical abbreviations, etc.) and that these documents are current and available to employees throughout the course of the study.2
Ensure all personnel involved with data entry or data management have the proper levels of access, grants and privileges.
Maintain a list of individuals who are authorized to make data changes.3
Apply quality control to each stage of data entry processes to ensure data are reliable and processed correctly.

Best Practices

Address the purpose, characteristics and complexity of each study in data entry training sessions, including, but not limited to, a brief review of the protocol, scope of work, and identification of critical variables (usually privacy-controlled subject identifiers, primary and secondary efficacy variables, and safety information).
Verify in a test environment (before the data entry system is placed into active use) that entry fields function as planned (e.g., date fields only accept dates, drop-down lists contain appropriate values, skip patterns function properly). In some organizations, true test data pages may be entered for an entire case report form (CRF) packet, while other organizations may perform more focused testing. This is not to be considered a substitute for software validation or edit check testing.
Provide comprehensive user training on CRF completion guidelines and data entry instructions.
Provide sites, sponsors, vendors and study team members with timeline expectations for data receipt, data tracking, data entry, and turnaround times for data queries, file transfers and database deliverables.
Establish thorough tracking mechanisms for the receipt of CRFs and other forms containing data to be entered. Tracking ensures control of the received records, identifies missing records and facilitates the archival of records at the end of the study.
Establish database quality criteria, including a quality control plan that appropriately addresses primary efficacy and safety data.
Monitor data entry functions while in active use to identify trends and ensure stable and desirable quality levels are consistent with study needs.
Create and maintain comprehensive processes for change control.

Workflow

Although specific processes and steps may vary between studies and organizations, the flow of data should follow a logically prescribed path.

When data are received, they should first be tracked or logged, then entered, cleaned, and subjected to rigorous audit/inspection or quality control.

The general workflow of data entry processes for studies using paper CRFs is presented in Figure 1, as well as the choices available at each step. To determine which choices are made at each stage in the data workflow, every organization should have standard operating procedures (SOPs) and data processing conventions.

Figure 1. Paper CRF Data Processing Workflow

Workflow processes for EDC studies may vary according to the CDMS software used. For general principles of EDC workflow processes, see the GCDMP chapters entitled “Electronic Data Capture—Concepts and Study Start-up,” “Electronic Data Capture—Study Conduct,” and “Electronic Data Capture - Study Closeout.”

Data Receipt

Data receipt processes vary across the clinical research industry. Data may be received through fax transmissions, regular mail, express delivery companies with tracking ability, private couriers, hand delivery by monitors, Web entry, or transferred through other electronic means. Regardless of the data acquisition mechanism, the processes by which data are received, confirmed as received, and made available for data entry should be documented in the data management plan (DMP) in sufficient detail to ensure the origin of data is clear.

Standard operating procedures should be in place to ensure blinding of subject identifying information (e.g., name, address, or subject initials) submitted to the data center, unless collection of these data is authorized in the informed consent, protocol, and local regulations. Ensure a process is in place to quickly identify and report incidences of violations of data privacy conventions and laws. Missing CRF reports should be prepared for both paper-based and EDC studies to facilitate identifying forms that have not been received.
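As an illustration of a missing CRF report, the sketch below compares the forms expected for each subject with those logged as received. The function name `missing_crf_report`, the form names, and the subject identifiers are hypothetical; an actual report would draw on the study's own tracking system and visit schedule.

```python
def missing_crf_report(expected, received):
    """Return {subject ID: sorted list of expected forms not yet received}.

    expected: dict mapping subject ID -> iterable of expected form names
    received: iterable of (subject ID, form name) pairs logged as received
    """
    received_set = set(received)
    report = {}
    for subject_id, forms in expected.items():
        missing = sorted(f for f in forms if (subject_id, f) not in received_set)
        if missing:
            report[subject_id] = missing
    return report


# Hypothetical example data
expected = {
    "1001": ["DEMOG", "VITALS_V1", "AE"],
    "1002": ["DEMOG", "VITALS_V1", "AE"],
}
received = [
    ("1001", "DEMOG"),
    ("1001", "VITALS_V1"),
    ("1001", "AE"),
    ("1002", "DEMOG"),
]

report = missing_crf_report(expected, received)
```

Subjects with nothing outstanding are omitted, so the report lists only forms that need follow-up.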

Electronic data tracking—Computer-aided page checking can have higher integrity and efficiency than manual processes. Regardless of how data are received, procedures should facilitate timely, high-quality data processing. Expected visit date reports can be programmed into most reporting and tracking systems to follow a subject’s progression through a study and predict the last subject’s final visit dates.
Paper CRF tracking—Tracking may occur on an individual CRF basis or per module. Ideally, all CRFs should be tracked, including mandatory, optional, and in some cases ancillary data. Receipt of data recorded on paper forms is logged in one of the following two ways, although details may vary between organizations. Some organizations may use a combination of independent or dependent logging with CRF imaging and indexing.

- Independent logging—This approach involves personnel manually registering that study data (not limited to CRFs) have been received. Data receipt may be recorded in the CDMS, although other tracking systems may be used as well.
- Dependent logging—This approach automatically records that a CRF has been received when data from the CRF are entered. This approach can eliminate an extensive and expensive manual process, replacing it with an electronic process in which tracking is a cost-free result of data entry. The trade-off is that any steps between receipt and entry may result in receipt dates that are not accurate. For reliable receipt dates, data should be entered when received, with little or no backlog of data to be entered.

Tracking third-party data—Third-party data, such as laboratory data, may be received electronically or on paper forms. Documented procedures should be in place to track data from each external data provider within a study. For more information about processing third-party data, see the GCDMP chapter entitled “External Data Transfers.”
Imaging and Indexing CRFs—To provide added security and flexibility for paper-based studies, CRFs may be imaged and stored electronically in addition to storing the paper forms. CRFs should be scanned using well-established formats, such as PDF (portable document format). The electronic files must be secured so they are only accessible to authorized and trained personnel. File-naming conventions should be strictly followed, and the repository of CRF image files should be indexed to allow specific files to be located quickly and accurately.
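The expected visit date reports mentioned under electronic data tracking can be sketched as follows. This is a minimal illustration, assuming a hypothetical visit schedule expressed as day offsets from each subject's baseline visit; the schedule, function names, and dates are not taken from any particular system.

```python
from datetime import date, timedelta

# Hypothetical visit schedule: visit name -> days after the baseline visit
VISIT_SCHEDULE = {"Baseline": 0, "Week 4": 28, "Week 12": 84}


def expected_visit_dates(baseline):
    """Return {visit name: expected date} for one subject."""
    return {
        visit: baseline + timedelta(days=offset)
        for visit, offset in VISIT_SCHEDULE.items()
    }


def projected_last_visit(baselines):
    """Return the latest expected final-visit date across all subjects."""
    final_offset = max(VISIT_SCHEDULE.values())
    return max(b + timedelta(days=final_offset) for b in baselines.values())


# Hypothetical baseline dates by subject ID
baselines = {"1001": date(2024, 1, 10), "1002": date(2024, 2, 1)}

schedule_1001 = expected_visit_dates(baselines["1001"])
last_visit = projected_last_visit(baselines)
```

A real report would also need to account for visit windows and early terminations, which this sketch does not cover.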

Data Entry

Data entry processes should address data quality needs of the study. The following are some commonly used data entry strategies for studies using paper CRFs.

Methodologies

Double data entry (third-person adjudication)—Two people independently enter the same data and a third person independently resolves any discrepancies between first and second entry.
Double data entry (blind verification)—Two people independently enter the same data, but remain unaware of what values the other entered. If the second entry operator enters a value that differs from the first value entered, the operator is warned that there is a discrepancy. After this warning, the second entry operator (who is responsible for verification) must carefully examine the form and determine the appropriate entry before saving. With this data entry strategy, the second entry will overwrite the prior value if it differs.
Double data entry (interactive verification)—Two people independently enter the same data and the second entry operator resolves discrepancies between first and second entry while being aware of the values entered by the first entry operator.
Single data entry with a review—One person enters the data and a second person reviews the data entered against the source data.
Single data entry with no review—Although not recommended, situations may occur where one person enters data and the data are not subsequently reviewed.
Optical character recognition (OCR)—Software packages are used to recognize characters from paper forms or faxed images and these data are placed directly into the database. Data obtained through OCR should always be reviewed for accuracy.
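The comparison step that underlies the double data entry strategies above can be sketched as follows, using third-person adjudication as the example. The field names and the functions `compare_entries` and `adjudicate` are hypothetical, and the sketch assumes each entry pass is available as a simple mapping of field name to keyed value.

```python
def compare_entries(first, second):
    """Return (field, first value, second value) for every field that differs.

    A field keyed in only one pass is reported with None for the other pass.
    """
    discrepancies = []
    for field in sorted(set(first) | set(second)):
        v1, v2 = first.get(field), second.get(field)
        if v1 != v2:
            discrepancies.append((field, v1, v2))
    return discrepancies


def adjudicate(first, second, resolutions):
    """Build the final record from the adjudicator's resolutions.

    resolutions: dict mapping each discrepant field to the value chosen by
    the third person after checking the form. Unresolved fields raise.
    """
    final = dict(first)
    for field, _v1, _v2 in compare_entries(first, second):
        if field not in resolutions:
            raise ValueError(f"Unresolved discrepancy in field {field!r}")
        final[field] = resolutions[field]
    return final


# Hypothetical first and second entry of the same CRF page
first = {"SUBJID": "1001", "WEIGHT_KG": "72.5", "VISIT_DATE": "2024-03-01"}
second = {"SUBJID": "1001", "WEIGHT_KG": "75.2", "VISIT_DATE": "2024-03-01"}

discrepancies = compare_entries(first, second)
final = adjudicate(first, second, {"WEIGHT_KG": "72.5"})
```

Blind and interactive verification use the same comparison but resolve discrepancies during second entry rather than in a separate adjudication step.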

General Considerations

Although specific data entry processes are not mandated by regulatory bodies or suggested by FDA and ICH guidance documents, a data handling document would most likely be requested in an audit or inspection. A set of standard data entry conventions is encouraged to ensure data are entered consistently throughout the study. Data entry processes should be adapted according to the needed quality level for each data field.

Double data entry is typically used when frequent random keystroke errors may occur or if random errors would be likely to significantly impact analyses. However, a single-entry process with good manual review may be optimal in some circumstances, such as with free text fields.

Sites should have clear guidelines regarding timing expectations between a subject’s visit and data being entered into an EDC system or recorded onto a paper CRF and forwarded to data management. The data management team is often responsible for producing reports that monitor compliance with established data entry timelines.
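A compliance report of this kind can be sketched as a simple lag calculation between visit date and entry date. The record layout, the function name `late_entries`, and the five-day limit are assumptions for illustration; the actual limit would come from the study's own timeline expectations.

```python
from datetime import date


def late_entries(records, max_days):
    """Return (site, subject, lag in days) for entries later than max_days.

    records: list of dicts with 'site', 'subject', 'visit_date', 'entry_date'
    """
    late = []
    for r in records:
        lag = (r["entry_date"] - r["visit_date"]).days
        if lag > max_days:
            late.append((r["site"], r["subject"], lag))
    return late


# Hypothetical records and a hypothetical five-day entry expectation
records = [
    {"site": "01", "subject": "1001",
     "visit_date": date(2024, 3, 1), "entry_date": date(2024, 3, 4)},
    {"site": "02", "subject": "2001",
     "visit_date": date(2024, 3, 1), "entry_date": date(2024, 3, 15)},
]

late = late_entries(records, max_days=5)
```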

Although some clinical data management systems are capable of storing automatic default values, which are those written to the database with no action required by the entry operator (most frequently, but not limited to, subject identifiers, site numbers, and visit identifiers), this type of functionality should be used sparingly to reduce the likelihood of unexpected values being overlooked by data entry personnel.4 In contrast, values that are derived, converted, calculated, or hard-coded based on the value of an entered field do not constitute automatic default values and are acceptable processes. Some organizations may perform these calculations outside the database, typically by those performing statistical analyses.

When applicable, system parameters should be set to allow an entry operator to exit the entry screen without saving the data that have been entered, as opposed to the system automatically saving entered data upon exiting. In this type of system, there should always be a prompt reminding the operator that data have not been saved. This approach enables data entry personnel to correct, upon discovery, situations where data may have been erroneously entered. Requiring a conscious decision to save data can also contribute to a higher level of data integrity. If the system does not allow for this data correction technique, a documented method to correct erroneously keyed information should exist.

Entry screens should be designed to minimize data entry errors. For paper studies, data entry screens should follow the pages of the CRFs, and may even be designed to appear identical to the paper CRFs. Some strategies for minimizing entry errors include displaying coded values and providing entry conventions (on entry screens or as a separate paper document), labeling entry fields clearly, and ensuring entry screens provide sufficient space to enter and view expected data.

Considerations for EDC

For studies using EDC, sites should be contacted if they are falling behind in data entry. Although sites are typically entering and cleaning data, data management actions are still needed to help ensure data are entered and processed properly. These data management actions can include training site personnel on EDC system use, measuring site progress on data entry and cleaning, working through forms and data discrepancies with sites, data review, assessing aggregate data to identify subjects with outlying data, identifying data trends, verifying any and all coding, conducting data transfers and performing reconciliation.

Regardless of where data are entered, data entry personnel should be trained on the specific EDC system utilized in a study, as well as being taught the protocol and key data issues they might encounter. After data are entered, monitors verify data using source documents. In some systems, check boxes or particular fields on the entry screen are used by monitors to indicate which fields and visits were verified. In other systems, electronic forms may “graduate” through stages of, for example, data entry, monitored (or source document verified), and locked. In many systems, source document verification is negated if data are changed on the page. In such a case, source document verification must be repeated.
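The behavior in which a data change negates source document verification can be sketched as follows. The class `FormPage` and its attributes are hypothetical and do not represent any particular EDC system.

```python
class FormPage:
    """One electronic form page with a source document verification flag."""

    def __init__(self, values):
        self.values = dict(values)
        self.sdv_complete = False

    def mark_verified(self):
        """Monitor confirms the page against source documents."""
        self.sdv_complete = True

    def update(self, field, new_value):
        """Change a value; a real change resets the verification flag."""
        if self.values.get(field) != new_value:
            self.values[field] = new_value
            self.sdv_complete = False


# Hypothetical page: verified, then changed by the site
page = FormPage({"SBP": 120})
page.mark_verified()
page.update("SBP", 120)   # same value, verification is kept
kept = page.sdv_complete
page.update("SBP", 130)   # changed value, verification must be repeated
reset = page.sdv_complete
```

Whether verification is reset for the whole page or only for the changed field is a system design choice; this sketch resets the whole page, matching the behavior described above.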

EDC systems may include user interface elements such as radio buttons and pick lists, and may allow fields to only accept specific variable types, such as only allowing numeric variables where appropriate. These systems may also be designed to allow numeric values to be checked against predetermined ranges upon entry. EDC systems can be designed to have dependencies for fields that should only have data when other criteria are met. An example of this design would be asking if a subject is of childbearing potential only if female gender had been selected.
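The entry-time checks described above (variable type, range, and field dependency) can be sketched as a single validation function. The field names, the coded value "F", and the heart rate limits are hypothetical and chosen only for illustration.

```python
# Hypothetical limits and field names, for illustration only
HEART_RATE_RANGE = (30, 200)  # beats per minute


def check_entry(record):
    """Return a list of discrepancy messages for one entered record."""
    messages = []

    # Variable-type check followed by a range check on the numeric field
    try:
        heart_rate = float(record.get("HEART_RATE"))
    except (TypeError, ValueError):
        messages.append("HEART_RATE must be numeric")
    else:
        low, high = HEART_RATE_RANGE
        if not (low <= heart_rate <= high):
            messages.append(f"HEART_RATE {heart_rate:g} is outside {low}-{high}")

    # Dependency check: the follow-up field applies only when SEX is "F"
    answered = record.get("CHILDBEARING_POTENTIAL") not in (None, "")
    if record.get("SEX") == "F" and not answered:
        messages.append("CHILDBEARING_POTENTIAL is required when SEX is F")
    elif record.get("SEX") != "F" and answered:
        messages.append("CHILDBEARING_POTENTIAL should be blank unless SEX is F")

    return messages
```

In an actual EDC system these rules would normally be configured through the system's own edit check facility rather than written as free-standing code.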

The growing use of EDC systems has also had an impact on the training and desired skills for data entry personnel. In a traditional data entry method such as double data entry of paper CRFs, the skill emphasis is on the number of keystrokes made and the training emphasis is on the specific data entry system utilized. With EDC systems utilizing single entry, an overall understanding of the study becomes much more important in avoiding data entry errors. While performing data entry in an EDC system, site personnel may need to check for online queries and recognize discrepancies as they enter data.

Data Entry Guidelines

Whether using paper CRFs or an EDC system, detailed data entry guidelines should be provided to all data entry personnel. All data entry personnel should also provide written documentation that they have received and understood these guidelines. Data entry guidelines may be part of a broader user manual, particularly for studies using EDC systems. Both data entry guidelines and user manuals may take the form of paper documents or an online manual.

The following topics should be considered for inclusion in data entry guidelines or user manuals.

Contact information of individuals available to troubleshoot computer problems and the hours such help is available
Instructions or conventions describing how to enter data, delete data, and respond to queries
Instructions or conventions describing how to enter data for single and multiple record panels if there is a difference in the system
Reminders to users that a date/time stamp and a user name are recorded as part of the audit trail for every record. The audit trail may or may not be visible, depending on the computer system. Even if it is not visible during data entry, the audit trail must be readable by inspectors and auditors.
Information on computer system security
Instructions for proper computer shutdown procedures to prevent loss of data
Instructions for data entry personnel explaining appropriate actions when edit checks trigger or reconciliation windows for double data entry systems appear

Data Review

Data Cleaning

Data cleaning refers to a collection of activities used to assure the completeness, validity and accuracy of data. Data cleaning activities may include manual reviews of data; computer checks that identify inaccurate or invalid data using ranges, missing data, protocol violations and consistency checks; or aggregate descriptive statistics that reveal unusual patterns in data. Early in a study, data from several subjects at each site should be reviewed to help detect data entry screens that are not functioning as expected, or a site's lack of compliance with or understanding of the protocol.

The following list describes activities that may be included in data cleaning.4

Verify raw data were accurately entered into a computer-readable file.
Confirm code lists contain only valid values.
Confirm numeric values are within predetermined ranges.
Identify and eliminate duplicate data entries.
Determine if there are missing values where complete data are required.
Check the uniqueness of certain values, such as subject identification numbers or codes.
Search for invalid date values and invalid date sequences.
Verify that complex multifile (or cross-panel) rules have been followed. For example, if an adverse event of a particular type occurs, other data might be expected, such as concomitant medications or procedures.
Check for any investigator comments entered on the CRF that could explain data anomalies.
Reconcile all expected CRFs received with those that have been entered.
Confirm inclusion of guidelines detailing reconciliation of adverse events, serious adverse events, lab data, or any additional third-party data.
Check consistency of data across CRFs.
Confirm that data are logical, even when outside expected parameters.
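Several of the activities listed above lend themselves to programmatic checks. The sketch below illustrates three of them: duplicate detection, missing required values, and invalid date sequences. The field names, the list of required fields, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical list of fields that must always be completed
REQUIRED_FIELDS = ("SUBJID", "VISIT", "VISIT_DATE")


def find_duplicates(records):
    """Return (SUBJID, VISIT) keys that occur more than once."""
    counts = Counter((r.get("SUBJID"), r.get("VISIT")) for r in records)
    return [key for key, n in counts.items() if n > 1]


def find_missing_required(records):
    """Return (record index, field) for each blank required field."""
    return [
        (i, f)
        for i, r in enumerate(records)
        for f in REQUIRED_FIELDS
        if r.get(f) in (None, "")
    ]


def find_date_sequence_errors(events):
    """Return indexes of events whose end date precedes the start date."""
    return [
        i for i, e in enumerate(events)
        if e["START"] is not None and e["END"] is not None and e["END"] < e["START"]
    ]


# Hypothetical visit records and adverse event records
records = [
    {"SUBJID": "1001", "VISIT": "V1", "VISIT_DATE": date(2024, 1, 10)},
    {"SUBJID": "1001", "VISIT": "V1", "VISIT_DATE": date(2024, 1, 10)},
    {"SUBJID": "1002", "VISIT": "V1", "VISIT_DATE": None},
]
events = [
    {"SUBJID": "1001", "START": date(2024, 2, 5), "END": date(2024, 2, 1)},
    {"SUBJID": "1002", "START": date(2024, 2, 5), "END": None},
]

duplicates = find_duplicates(records)
missing = find_missing_required(records)
date_errors = find_date_sequence_errors(events)
```

Each function only identifies candidate discrepancies; deciding whether a duplicate should be eliminated or a value queried remains a data management decision.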

Range checks should be designed to identify statistical outliers, which are values that are physiologically impossible or outside normal variations of the population under study. Consistency checks should be designed to identify potential data errors (e.g., checking the sequential order of dates, corresponding events, and missing data noted to exist elsewhere). Checks designed to identify protocol violations should be closely monitored to allow timely action to be taken. A site should be monitored and investigated when aggregate statistics or other checks indicate substantial differences from other sites. Although manual review for data cleaning and validation is sufficient in some cases, programmatic validation generally provides higher consistency and lower error rates.
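The two kinds of range violation described above, values that are physiologically impossible and values outside the normal variation of the study population, can be sketched as a two-tier check. The limits below are hypothetical placeholders for systolic blood pressure and are not clinical recommendations; actual limits would be defined in the study's data validation specifications.

```python
# Hypothetical limits for systolic blood pressure (mmHg), illustration only
HARD_LIMITS = (40, 300)      # outside these: treated as impossible
EXPECTED_RANGE = (90, 180)   # outside these: unusual for the study population


def classify_value(value, hard=HARD_LIMITS, expected=EXPECTED_RANGE):
    """Return 'impossible', 'outlier', or 'ok' for one numeric value."""
    if not (hard[0] <= value <= hard[1]):
        return "impossible"
    if not (expected[0] <= value <= expected[1]):
        return "outlier"
    return "ok"


results = [classify_value(v) for v in (120, 200, 500)]
```

Separating the two tiers allows an impossible value to be queried as a likely entry error, while an outlier may be accepted once the site confirms it against source documents.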

Primary and other endpoints, key safety parameters and fields that uniquely identify subject data within the clinical database should be validated sufficiently to assure data are possible, complete, and reasonable. Data cleaning and validation procedures should not suggest bias or lead responses, because leading questions or forced responses can bias study results.

Data Cleaning Considerations for EDC

Many of the data cleaning activities in the preceding list can be automated within a well-designed EDC system and may not require any post-entry effort. In an EDC environment, the site typically performs much of the data cleaning at the point of entry. The site is in control of the data and must either make the data edit or clarify the reason the data are acceptable. For a comparison of data cleaning processes between studies using paper CRFs or EDC systems, see Table 1.

The number of characters an EDC system will allow in a query is important for the data manager to know. Some data managers may be accustomed to paper queries with unlimited space, but most EDC systems require a scroll bar to view lengthy queries. This increases the importance of writing succinct queries to instruct the site to correct data or explain the reason for a discrepant or “abnormal” data value.

Good Clinical Data Management Practices

In an EDC system, built-in checks may initiate either at the time of data entry or when edit checks are run on batches of data. Additional edit checks may also be run and reviewed prior to issuing queries.

Table 1. Data Cleaning Distinctions Between Paper and EDC

Data Cleaning Activity	Paper-based	EDC
Discrepancies, Flags or Notes	After entry and review are complete, flags or notes may be generated outside the database and submitted on individual data clarification forms (DCFs). Flags or notes may be compiled during entry and review, and subsequently addressed after data entry is completed. In some instances, items may also be flagged or noted during monitoring.	For data entry systems with no additional functions, flags or notes are identified by monitors and treated similarly to paper. Some systems show flags or notes on the screen in real time, allowing sites to address flags or notes sooner. Some systems close flags or notes automatically as values are updated, while others may require manual closing by monitors or data management personnel.
Listings	Cleaning listings differs from cleaning discrepancies, flags and notes in that cleaning may not occur as often due to a higher level of review by monitors, coders, statisticians, lab reconcilers, or safety managers. Listing reports may be sent to sites periodically to point out missing pages or overdue visits.	Some systems allow cross-page checks, but these may be limited in scope as many programmatic text checks must be manually reviewed. When posting responses or feedback, some systems may update or populate items right away, but others may be delayed due to system uploads occurring at predetermined intervals.

To ease review and possible correction by site personnel, data managers should understand the EDC system and how data checks are attached to data fields. When checks are not issued against the correct panel, sites may be confused and not take appropriate actions. If a data check is to initiate automatically, it should check each data field only once. To prevent duplication of effort, data management personnel should review previously issued data checks. Because sites must respond to data queries prior to any in-house review, it is critical that checks be properly tested prior to deployment. Deploying inadequately tested checks may result in unnecessary work for the sites and data management team.

Because sites may change data for various reasons, some users of EDC systems may not realize that data that are clean today may not be clean tomorrow. These data changes may not be the result of data queries but rather of a review of source data. Some systems are capable of locking data once they are clean; however, a mechanism should allow the lock to be reversed for data changes if the site finds discrepancies that must be corrected.

Documenting Data Changes

Data may be changed as a result of data cleaning procedures, in which case the site and data center or sponsor must retain a record of all such data changes. Data changes should be recorded and documented by a fax or original site signature acknowledging the new data. This documentation is usually accomplished using a query or data clarification form (DCF). In these cases, the site is expected to keep a record of the change within their study records.

In an EDC environment, site personnel usually make any necessary changes to the data. If nonsite personnel make data changes, a clearly defined SOP should document circumstances in which data can be changed, and a record of any data change should be provided to the site. All documentation of data changes is considered to be essential study documentation and is subject to audit or inspection. For comparison of differences in data-change documentation between paper-based studies and studies using EDC, see Table 2.

Data cleaning conventions may, under some circumstances, specify data that can be modified without a site’s acknowledgement. These are known as self-evident corrections (SEC), and examples include appropriately qualified personnel correcting obvious spelling errors, converting values when units are provided, or providing missing identifiers when the true values are obvious. Because the site must have a record of all data changes, the site should receive and maintain a copy of each version of such data conventions.
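One of the self-evident corrections named above, converting a value when its units are provided, can be sketched as follows. The function name, the accepted unit spellings, the rounding to one decimal place, and the wording of the note are all assumptions; the permitted conversions would be defined in the study's documented data conventions.

```python
LB_TO_KG = 0.45359237  # exact, by definition of the international pound


def convert_weight_to_kg(value, unit):
    """Return (weight in kg, SEC note or None).

    Only conversions covered by the (hypothetical) study convention are
    applied; anything else is not self-evident and should be queried.
    """
    unit = unit.strip().lower()
    if unit == "kg":
        return value, None
    if unit in ("lb", "lbs"):
        converted = round(value * LB_TO_KG, 1)
        note = f"SEC: converted {value} {unit} to {converted} kg per study data conventions"
        return converted, note
    raise ValueError(f"Unit {unit!r} is not covered by the convention; query the site")


weight_kg, sec_note = convert_weight_to_kg(150, "lb")
```

Returning a note alongside the converted value keeps a record that the change was made under a documented convention rather than in response to a site query.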

Although strongly discouraged, situations do occasionally arise where telephone conversations with the site are utilized to authorize data changes. If this does occur, these changes should be clearly documented both by the site representative authorizing the change and by the data center representative talking with the site. In this way, a record of the conversation and authorization exists at both locations. In any case, any data change authorizations must be documented in writing and included in the study’s documentation for audit or inspection purposes.

Table 2. Data-change Documentation Distinctions Between Paper and EDC

Data Change Type	Paper-based	EDC
Entry changes or errors	System or process changes should be reflected in data entry work instructions. Database changes must be reflected in the audit trail. When authorized changes are submitted by e-mail or phone, a hard copy should be created for patient folders both at the site and with data management.	When changing data, the EDC system should prompt the user to enter a reason for the data change. The reason provided will then be recorded in the database’s electronic audit trail. For non-site personnel, data entry work instructions or conventions will be used for documentation.
Data Clarification Form (DCF) updates	A hard copy of system-generated DCF submittals, once approved by authorized site personnel, should be kept with corresponding CRF pages at site(s) and with data management.	Rather than using paper DCFs, queries are generated and answered through the EDC system, which should include a comprehensive history of all queries recorded in the electronic audit trail.
Self-evident corrections	Self-evident corrections should be documented in study-specific conventions and data entry work instructions.	Self-evident corrections should be noted in the electronic audit trail.
Site-initiated changes	Site-initiated changes should be documented through manual DCF submitted with or without an updated CRF page.	Site-initiated changes should be noted in the electronic audit trail as new information provided by the site.

An audit trail is triggered by the initial data entry, and any changes to the entry are captured and should include the user name, the date and time of the change, the reason for the change, and the previous and current value. Recorded changes must not obscure previously recorded information.5 To obtain consistent, accurate reasons for changes, some EDC systems offer a list of reasons for data changes as well as an option for free text. Since these reasons may vary, there should not be a default entry. Once a change has been committed and recorded in the audit trail, the reason cannot be edited.
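The audit trail properties described above can be sketched as an append-only list of immutable entries. The class and attribute names are hypothetical, and the sketch is an in-memory illustration only; it is not a substitute for a validated system's audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a committed entry, including its reason, cannot be edited
class AuditEntry:
    field_name: str
    user: str
    timestamp: datetime
    reason: str
    previous_value: object
    current_value: object


class AuditedRecord:
    """Holds current values plus an append-only trail of every change."""

    def __init__(self):
        self._values = {}
        self._trail = []

    def set_value(self, field_name, value, user, reason):
        # No default reason is supplied; the user must state one explicitly
        if not reason or not reason.strip():
            raise ValueError("A reason for the change is required")
        entry = AuditEntry(
            field_name=field_name,
            user=user,
            timestamp=datetime.now(timezone.utc),
            reason=reason.strip(),
            previous_value=self._values.get(field_name),
            current_value=value,
        )
        self._trail.append(entry)  # earlier entries are never overwritten
        self._values[field_name] = value

    @property
    def trail(self):
        """Read-only view of the trail for review by auditors or inspectors."""
        return tuple(self._trail)


# Hypothetical initial entry followed by a correction
rec = AuditedRecord()
rec.set_value("WEIGHT_KG", 72.5, "site_user", "Initial entry")
rec.set_value("WEIGHT_KG", 75.2, "site_user", "Transcription error")
```

Because each entry stores both the previous and the current value, the full history of a field can be reconstructed without the later change obscuring the earlier one.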

A site’s principal investigator should approve and sign off on data collected from that site prior to the data being finalized. This sign-off by the principal investigator must occur in both paper- and EDC-based studies. Any data changes that occur after the investigator signs must be re-signed by the investigator prior to study closeout.

Change Control

Protocol amendments are a fact of life in clinical studies. Changes to the protocol may be made when new information becomes available or when changes are requested by the sponsor or regulatory agencies. While not all protocol amendments require CRF changes, procedures should be in place to handle these situations. IRB approval of protocol amendments must be received prior to deployment of new or changed CRFs. With paper-based studies, CRF changes may take a few weeks to be received by sites, by which time the sites may have received IRB approval for the protocol changes. However, with EDC systems, CRF changes can be made remotely and implemented immediately upon IRB approval of the protocol changes.

Process Change Control

All process changes initiated from a protocol amendment must be requested, reviewed, validated, approved and incorporated by following the organization’s SOPs. If a process change involves modification of the clinical database, strict change control processes should be used to ensure preservation of clinical database integrity. Documentation of all process changes should always be stored and available for the entire project team.

At minimum, a process change should include identification and acknowledgement of the change, communication of the change to all stakeholders, and a detailed request outlining any necessary modifications. The process change should be reviewed and approved by key stakeholders prior to implementation of the change.

Any process changes that involve investigative sites should come with clear communication and associated training (if training is deemed necessary). The clinical monitoring team should also be involved in process changes involving investigative sites. Process changes not involving site or monitoring staff should also include proper documentation, communication and training. Process changes should not be implemented until approval is received from all stakeholders and the change is thoroughly tested and validated. In some cases, changes may also require IRB approval.

Database or CRF Change Control

If an approved change is made to an existing CRF or a new CRF is created as a result of a protocol amendment, data management is responsible for checking the consequences on the CRF completion rules and data entry guidelines, and if necessary, is responsible for modifying any existing database tables and creating any new database tables. Documentation of all changes and necessary validation testing is also the responsibility of appropriate data management personnel. As with process change controls, any changes should be communicated to the investigative sites in a timely manner. If CRF completion guidelines or data entry guidelines change as a result, ensure all changes are reflected and disseminated to appropriate personnel.

Change Control for External Data

External data can originate from different sources, and are usually provided by previously selected vendors. External data include any data that are received as an electronic file rather than through paper- or EDC-based data entry. Any changes to external data should be corrected or updated at the source if possible. The vendor and data management should establish specifications and procedures at study start-up to describe how data changes will be communicated throughout the study.

Changes may be communicated between the site and vendor, site and data management, or vendor and data management. Changes communicated to sites are typically managed through DCFs or queries. Changes communicated between the vendor and data management or the site and the vendor are usually communicated through a standardized process outlined in the DMP. Regardless of the methodology employed, any requested data changes must be tracked and documented.

Recommended Standard Operating Procedures

Data Management Plan
Data Validation Design and Testing
Data Receipt
Data Security and Storage
Data Entry
Data Review
External Data Transfers
Discrepancy Management
Quality Control
Database Lock Procedures
CRF Archival