The COLO‐COHORT (Colorectal Cancer Cohort) study: Protocol for a multi‐centre, observational research study and development of a consent‐for‐contact research platform

Abstract

Aim
The COLO‐COHORT study aims to produce a multi‐factorial risk prediction model for colorectal neoplasia that can be used to target colonoscopy to those at greatest risk of colorectal neoplasia, ensuring that people are not investigated unnecessarily and maximizing the use of limited endoscopy resources. The study will also explore the link between neoplasia and the human gut microbiome. Additionally, the study aims to generate a cohort of colonoscopy patients who are ‘research ready’ through the development of a consent‐for‐contact (C4C) platform, to facilitate a range of colorectal cancer prevention studies to be conducted at scale and speed.

Methods and analysis
This is a multi‐centre observational study involving sites across the UK. Recruitment is over a 6‐year period (2019–2025). Patients recruited to the study are those attending for colonoscopy. Patients are recruited into two groups, namely observational group A (10 000 patients) and C4C group B (10 000 patients), known as COLO‐SPEED (Colorectal Cancer Screening Prevention Endoscopy and Early Diagnosis; https://colospeed.uk). Patients complete a health questionnaire, provide anthropometric measurements and submit biosamples (blood and stool, depending on the part of the study they are recruited into). Patients' colonoscopy and histology findings are also recorded. Models of factors associated with the presence of neoplasia at colonoscopy will be developed using logistic or multinomial regression. For internal validation, model discrimination and calibration will be assessed and bootstrapping and cross‐validation approaches used. To enable long‐term follow‐up for outcomes related to colorectal cancer and polyps, patients are asked to consent to follow‐up through data linkage with national databases.

Dissemination
In keeping with good research practice, following analysis by the study team the study investigators will make the anonymized dataset available to other researchers. The C4C platform will also be accessible to other researchers. The study findings will be submitted for publication in peer‐reviewed journals and lay summaries will be disseminated to participants and the wider public.


INTRODUCTION
Colorectal cancer (CRC) is the second most common cause of cancer death worldwide, accounting for 935 000 deaths per year [1].
In the UK around 42 000 people are diagnosed with CRC annually with 16 000 dying from it [2]. The majority of CRCs develop through well-established pathways with pre-cancerous colorectal lesions (colorectal adenomas and serrated polyps) progressing to CRC [3,4].
The process can take 10-15 years and there is therefore a window of opportunity for these lesions to be detected and removed during colonoscopy [5].
Advances in prevention, diagnosis and management have resulted in an improvement in the mortality from CRC over the past few decades. In the UK, the introduction of the Bowel Cancer Screening Programme (BCSP) has led to a stage shift in CRC diagnosis; however, the majority of CRCs in the UK are still diagnosed through symptomatic services rather than screening [6].
In the UK more than 675 000 colonoscopies are performed annually and the demand is rising [7]. Endoscopy services were already struggling to provide capacity to meet this demand and the COVID-19 pandemic has significantly reduced endoscopy capacity [8][9][10][11]. For patients, colonoscopy may provoke anxiety, has some risk associated and requires pre-procedural bowel preparation, which patients find unpleasant [12]. To best utilize this limited resource and ensure that people are not investigated unnecessarily, we need to better identify those at greatest risk of CRC and target colonoscopy at those individuals.
At present, the BCSP relies on one factor, age, as the only criterion for eligibility for screening. In England, Wales and Northern Ireland, individuals between 56 and 74 years are invited to participate in each nation's respective bowel cancer screening programme, with plans to reduce the age threshold to 50 years. In Scotland, 50 years of age is already used as the threshold for invitation to their programme. For symptomatic patients, referral for colonoscopy is largely guided by symptoms or clinical suspicion but symptoms do not correlate well with the presence of colorectal neoplasia [13]. A wide range of risk factors for CRC have been identified including increasing age, male sex, obesity, alcohol intake, smoking, ingestion of red meat, family history and reduced physical activity [14][15][16][17]. Currently, these risk factors are not taken into account when assessing whether or not an individual may require a colonoscopy.
Biomarkers could also be of value for patient stratification for investigation. Non-invasive tests such as the faecal immunochemical test (FIT) have been used successfully in the screening setting and have recently been introduced into the English BCSP, with use in the symptomatic setting evolving [18]. Two large, published UK cohorts report good diagnostic performance of the FIT in low-risk patients in primary care [19,20]. As a consequence of the reduction in availability of lower gastrointestinal services due to the COVID-19 pandemic, in some areas the FIT has been used to prioritize patients for definitive lower gastrointestinal investigation; however, the use of the FIT varies hugely nationally.
Additionally, the relationship between gut microbiota and health and disease has been increasingly studied. There is evidence that certain micro-organisms, in particular the Fusobacterium species, are associated with colorectal neoplasia; however, most existing studies are small and knowledge as to whether the gut microbiota could help identify or alter the natural history of patients who harbour the potential for colorectal neoplasia remains somewhat limited [21][22][23][24].
Risk stratification is a technique for systematically categorizing patients based on their risk of a particular condition. Managing patients based on their risk level may make better use of limited health service resources whilst also benefitting patients by avoiding the need for unnecessary investigations in those at low risk [25]. This approach has been utilized in other areas of healthcare, for example in cardiovascular disease, using the QRISK score to guide the need for therapeutic prevention of vascular events (such as myocardial infarction and stroke) and the FIB-4 score within hepatology to non-invasively stage an individual's risk of fibrosis [26,27].
The potential for using risk prediction models to identify patients with colorectal neoplasia has been increasingly studied. Various risk prediction models have been developed in both the screening and symptomatic settings but model performance varies [25,28,29].
Further work needs to be undertaken to achieve a risk model with sufficiently good performance for prediction of colorectal neoplasia to justify use in clinical practice. We hypothesize that it is possible to develop a risk prediction model using clinical factors and readily available laboratory biomarkers (including the FIT) that will enable us to predict patients at highest risk of colorectal neoplasia. We also hypothesize that the stool microbiome in patients may be helpful in identifying those at greatest neoplasia risk [30].
Most current research strategies are based on answering a single question, with the study ending with the recruitment of the final patient; however, current good research practice supports making datasets discoverable [31]. Furthermore, most patients for CRC research are recruited on a study-by-study basis, despite it being advantageous to be able to deliver multiple studies simultaneously.

KEYWORDS
colorectal cancer, colorectal adenoma, cancer risk, colonoscopy, faecal immunochemical test (FIT)

An alternative approach is to develop a pool of research-ready patients who can be contacted when studies relevant to them, relating to an aspect of screening, prevention and early diagnosis research, become available. This research-ready consent-for-contact (C4C) population would enable a range of CRC-related studies to be conducted at scale and speed and would facilitate rapid engagement with patients and the public in the development and design of research studies.
The objectives of the COLO-COHORT study are as follows: • to develop a multi-factorial risk prediction model for prevalent colorectal neoplasia; • to develop a cohort of patients who will be followed up long term through medical records and national databases for outcomes related to colorectal neoplasia, in order to test the long-term value of the risk prediction model; • to compare the structure and diversity of the faecal microbiome in patients with and without colorectal neoplasia; • to develop a C4C platform of colonoscopy patients who have consented to be contacted for current and future research opportunities; • to build a digital platform to support patient involvement, recruitment and data collection.

Study design
COLO-COHORT is a multi-centre observational study involving sites across the UK. Patients are recruited into two groups, namely group A and C4C group B, also called COLO-SPEED (Colorectal Cancer Screening Prevention Endoscopy and Early Diagnosis; https://colospeed.uk). Recruitment will take place over a 6-year period (2019-2025).
For group A, 10 000 patients will be recruited into the main observational element of the study. This group is subdivided into groups A1 and A2. Patients in group A1 submit blood and stool samples, whereas these samples are not required for patients in group A2; instead results from previous blood/stool tests are obtained from patient records.
For group B, 10 000 patients will be recruited into the C4C arm of the study (COLO-SPEED). These patients may also be recruited into group A and/or into other endoscopy studies.
The study includes patients attending colonoscopy as part of the English BCSP and those referred through standard National Health Service (NHS) care for indications including, but not limited to, symptoms, family history or as part of surveillance programmes.
• Known colonic stricture which would limit complete colonoscopy; • Attending for planned therapeutic procedure other than polypectomy, such as insertion of colonic stent; • Attending for assessment of known inflammatory bowel disease activity or for inflammatory bowel disease surveillance; • Patients currently recruited into an interventional clinical trial of a medicinal product for CRC prevention.

Group B
The COLO-SPEED funding infrastructure is currently only available to sites that are part of the Northern Region Endoscopy Group (NREG.org.uk) and thus only patients from the northeast of England can be recruited into group B.

Inclusion criteria
• Any patient attending for colonoscopy and able to give informed consent; • ≥18 years old; • Patient attending for colonoscopy in a site supported by COLO-SPEED infrastructure.

Exclusion criteria
• Unable to give informed consent.
COLO-SPEED (group B) aims to establish a C4C database; therefore participation in parallel research studies is encouraged and not an exclusion criterion.

Withdrawal criteria
Patients from either group A or group B are withdrawn from the study if they withdraw consent for study participation or withdraw consent to undergo colonoscopy.

Recruitment process
All eligible and consenting patients are recruited following referral for colonoscopy. COLO-COHORT commenced just before the COVID-19 pandemic and, in response to the pandemic, the recruitment process was adapted to minimize the time patients spend in the hospital for research purposes. Patients are contacted via telephone prior to their colonoscopy appointment to assess interest in study participation. If patients are interested in the study, the research team send out a patient information sheet and a FIT (as applicable) and arrange to contact the patient via telephone again at a later date. A pre-bowel preparation FIT sample is required for the study and therefore those patients who are required to submit a new FIT sample need to be contacted in a timeframe that allows this. Patients undergoing a colonoscopy as part of the BCSP are not sent a FIT as this has already been undertaken as part of the screening programme. The same FIT used within the English BCSP (OC Sensor, Mast Diagnostics) is provided to symptomatic patients for uniformity of quantitative FIT results, as results vary between FIT systems from different manufacturers [32]. Where patients have already undertaken a FIT as part of a symptomatic pathway, the FIT is repeated in this study.
The FIT results generated within the study are not directly used to inform patient management through the study; however, local principal investigators are informed of abnormal results and have discretion to act upon these where there is clinical concern.
On the second contact, a member of the research team discusses the study with the patient, answers any questions, assesses eligibility, obtains verbal consent (if the patient is eligible and willing to take part) and gathers other information required for the study.
On the day of the colonoscopy appointment, written informed consent is obtained, anthropometric measurements and blood tests (as applicable per study group) are taken, and patients are recruited into the study.
The adaptations to minimize face-to-face contact in response to the COVID-19 pandemic will be reviewed on an ongoing basis as the pandemic changes and face-to-face contact will be reinstated if and where it is considered appropriate.

Data collection
Group A
Six thousand participants in group A (group A1) will submit a pre-bowel preparation FIT sample. This will be a combination of new FIT collections and samples from patients who have already submitted a FIT sample as part of the BCSP (Figure 1).
As approved by the ethics committee, submission of the FIT sample represents initial consent for the study with further information provided in the patient information sheet and full written consent given on the day of colonoscopy. All FIT samples are returned to the northeast BCSP hub at Queen Elizabeth Hospital, Gateshead, for analysis. For patients recruited who are attending colonoscopy because of a positive FIT taken in the BCSP, the quantitative result of that FIT is transferred to the study database.
In addition, a health questionnaire detailing personal characteristics (level of education, employment status), lifestyle behaviour (smoking status and alcohol intake history), medical and medication history, and family history of CRC is completed, as well as a validated physical activity questionnaire and validated food frequency questionnaire for those undergoing microbiome analysis [33,34]. In the light of the COVID-19 pandemic, questionnaires may be completed remotely from the endoscopy unit. Postage paid envelopes are provided by the research team to facilitate the return of completed questionnaires.
At the colonoscopy appointment, height, weight and waist circumference are measured. In addition, blood samples for full blood count, liver function tests, aspartate aminotransferase, C-reactive protein, lipid profile, fasting glucose, HbA1c, and whole blood for DNA extraction are taken, via a cannula inserted as part of standard care for colonoscopy. Where sampling is not possible from a cannula, a separate venepuncture is undertaken. These tests are not repeated if results from within the last 8 weeks are available. All blood tests other than the blood sample for DNA extraction are labelled and processed in line with local Trust policy in local laboratories.
The sample for DNA is pseudonymized using a unique study ID and transferred to the Central Biobank Facility at Newcastle University.
Samples are stored at −80°C until thawed in preparation for DNA extraction; following extraction genomic DNA is stored at −80°C.
Patients' colonoscopy findings are recorded along with histopathology results from any lesion removed or sampled. Diagnosis of neoplasia is based upon histological findings.
A further 4000 patients will be recruited into group A (group A2). Although the dataset recorded is similar to group A1, patients are not required to collect stool for the FIT nor have blood samples taken. This allows patients who are unable or unwilling to provide biosamples to participate in this part of the study. These patients will provide the research team with a significant amount of useful data to enhance or potentially validate data from the A1 group. Any recent blood tests of interest in the past 8 weeks prior to their colonoscopy (as a minimum, full blood count and liver function tests) as well as FIT results if done as part of routine care are recorded.
All patients from group A are asked if they also consent to the following: • long-term follow-up for future outcome related to colorectal polyps or CRC through linkage to routine national databases such as National Cancer Registration Database, Hospital Episode Statistics data, Office for National Statistics mortality data, National Endoscopy Database and the COloRECTal cancer data Repository (CORECT-R) [35]; • access to previous endoscopy results, histological samples from colonoscopy or other relevant laboratory results; • use of anonymized information or samples collected from this study in future studies; • consent for future contact for collection of additional information related to this study.
Patients may opt into or out of each of these.

FIGURE 1 Overview of COLO-COHORT. *10 000 patients in COLO-SPEED (group B) will be from the north of England and can include patients from group A. **FFQ, food frequency questionnaire [33]. § Patients will be offered the option of limited or more extensive data collection if only in group B.

Stool microbiome analysis is a rapidly evolving field [21,22]. In the first instance, we propose to measure microbiome diversity and individual pathobionts by 16S rRNA sequencing but will remain flexible in our approach, for example moving to shotgun metagenomic sequencing, depending on subsequent developments in the field, DNA yields from FIT stool extractions and our funding envelope [36,37] (Table 1).
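As an illustration of what a diversity measurement involves, the sketch below computes the Shannon index from per-taxon read counts. The choice of the Shannon index, the function name and the example counts are assumptions made for illustration only; the protocol does not specify which diversity metric will be used.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one non-zero count is required")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical read counts per taxon for two stool samples.
even_sample = [25, 25, 25, 25]   # reads spread evenly across taxa
skewed_sample = [97, 1, 1, 1]    # one taxon dominates

print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

The evenly distributed sample yields the larger index, which is the sense in which a community is described as more diverse.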

Group B (C4C; COLO-SPEED)
Patients from group A can also be recruited into group B.
Patients are recruited to COLO-SPEED from northeastern recruitment sites, with the longer-term vision of expanding to other sites nationally, subject to securing funding.
All patients consenting to group B consent to being contacted about future research involvement opportunities, including relevant research studies, patient and public involvement (PPI) opportunities and public engagement activities. They will be sent a link to register online for the COLO-SPEED Research Network (CSRN).
In addition to consent for future contact, patients are asked if they wish to consent to the following:

Nested studies
COLO-COHORT will recruit a large population, and appropriate nested studies, for example collecting data on patient experience of colonoscopy or impact of COVID-19 on colonoscopy uptake and experience, may be incorporated into the study (subject to additional ethical approval where required).

Adverse events
This is an observational study and therefore no adverse events resulting from participation are anticipated. Any adverse events related to colonoscopy will be managed and recorded in line with standard care.

Assessment and follow-up
No additional study visits are required as part of the study. However, patients may receive a maximum of two reminders at monthly intervals to complete the patient questionnaires (outlined above), if required.
Patients who consent to long-term follow-up may have their endoscopy records and medical records interrogated for outcomes related to colorectal neoplasia as described above.

Statistical analysis and sample size
Data from patients recruited to group A will be used to develop the risk prediction model. Primary analysis will be based on the outcome of the presence of colorectal neoplasia at the recruitment colonoscopy. This will be defined as the presence of advanced adenoma (AA)/CRC. AA will be defined as an adenoma of at least 10 mm in size or containing high-grade dysplasia [38]. Secondary analyses will focus on other relevant outcomes, including CRC (only), AA (only), any adenoma, any polyp, serrated polyps, numbers of adenomas and number of polyps. The TRIPOD statement will be followed for reporting of the risk prediction model [39].
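The primary outcome definition above can be expressed as a simple rule applied to the lesions found at colonoscopy. The following is a minimal sketch; the `Lesion` record, its field names and the histology labels are hypothetical and do not reflect the study database schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Lesion:
    histology: str                     # hypothetical labels: "adenoma", "crc", "serrated", "other"
    size_mm: float
    high_grade_dysplasia: bool = False


def is_advanced_adenoma(lesion: Lesion) -> bool:
    """Adenoma of at least 10 mm in size or containing high-grade dysplasia."""
    return lesion.histology == "adenoma" and (
        lesion.size_mm >= 10 or lesion.high_grade_dysplasia
    )


def primary_outcome(lesions: Iterable[Lesion]) -> bool:
    """True if any lesion is CRC or an advanced adenoma (AA/CRC)."""
    return any(
        lesion.histology == "crc" or is_advanced_adenoma(lesion)
        for lesion in lesions
    )


# A 4 mm adenoma with low-grade dysplasia alone does not meet the primary outcome.
print(primary_outcome([Lesion("adenoma", 4.0)]))
# A 12 mm adenoma does.
print(primary_outcome([Lesion("adenoma", 12.0)]))
```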
Initially we will develop models separately for BCSP and other (mainly symptomatic) subjects; further analysis will explore whether an overall model can be created. Firstly (in the non-BCSP subjects), sensitivity, specificity, positive predictive value and negative predictive value of FIT alone for the detection of the presence of AA/CRC will be assessed. Different FIT cut-offs will be explored. Subsequently, the relationship between patient characteristics and lifestyle, phenotypic information, FIT results, blood markers and presence of AA/CRC will be investigated using logistic regression. Backwards elimination of candidate predictors will be used to identify variables which best predict neoplasia [40].
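To make the first analysis step concrete, the sketch below computes sensitivity, specificity, positive predictive value and negative predictive value of FIT alone at a given cut-off. The data, the cut-off values and the function name are invented for illustration; FIT values are assumed to be expressed in µg haemoglobin per g faeces, and a result at or above the cut-off is assumed to count as positive.

```python
def fit_accuracy(fit_values, has_neoplasia, cutoff):
    """Diagnostic accuracy of FIT for AA/CRC, treating value >= cutoff as positive."""
    tp = fp = tn = fn = 0
    for value, diseased in zip(fit_values, has_neoplasia):
        positive = value >= cutoff
        if positive and diseased:
            tp += 1
        elif positive and not diseased:
            fp += 1
        elif not positive and diseased:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero (e.g. no positive tests).
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example data: FIT result and AA/CRC status for eight patients.
fit = [2, 5, 12, 40, 150, 8, 0, 95]
neoplasia = [False, False, False, True, True, True, False, True]

# Exploring two hypothetical cut-offs.
for cutoff in (10, 100):
    print(cutoff, fit_accuracy(fit, neoplasia, cutoff))
```

Raising the cut-off in this toy dataset trades sensitivity for specificity, which is the trade-off that exploring different cut-offs is intended to quantify.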
Receiver operating characteristic (ROC) curve plots, area under the curve analyses of sensitivity and specificity, and Harrell's C-index will be used to characterize the discriminative ability of the models [41,42].
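For a binary outcome, the C-index coincides with the area under the ROC curve: it is the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as one half. A minimal sketch of that calculation follows; the function name and the example risks are illustrative assumptions, and a real analysis would more likely use an established statistics package.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome, equal to the ROC AUC."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not non_cases:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(cases) * len(non_cases))


# Invented predicted risks and observed AA/CRC status for four patients.
predicted_risk = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
print(c_statistic(predicted_risk, observed))  # 0.75
```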
Calibration will be assessed using Hosmer and Lemeshow tests, the percentage of people reclassified by different models and the net reclassification index [43]. Internal validation will be undertaken using bootstrapping to quantify the model's potential for overfitting and optimism in estimated model performance [40].
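The bootstrap approach to internal validation can be sketched as follows: the model-building procedure is repeated in each bootstrap resample, and the average difference between performance in the resample and performance in the original data estimates the optimism, which is then subtracted from the apparent performance. The code below is a simplified illustration under stated assumptions: the "model" is a deliberately naive toy (it selects the single predictor with the highest apparent C-statistic) standing in for the regression modelling described above, and all names and data are hypothetical.

```python
import random


def c_statistic(scores, outcomes):
    """Concordance statistic for a binary outcome (ties count as one half)."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    pairs = [(c, n) for c in cases for n in non_cases]
    return sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs) / len(pairs)


def fit_toy_model(rows, outcomes):
    """Toy stand-in for model fitting: index of the best single predictor."""
    return max(
        range(len(rows[0])),
        key=lambda j: c_statistic([r[j] for r in rows], outcomes),
    )


def performance(model, rows, outcomes):
    return c_statistic([r[model] for r in rows], outcomes)


def optimism_corrected(rows, outcomes, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    apparent = performance(fit_toy_model(rows, outcomes), rows, outcomes)
    differences = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot_rows = [rows[i] for i in idx]
        boot_outcomes = [outcomes[i] for i in idx]
        if len(set(boot_outcomes)) < 2:
            continue  # resample lacks cases or non-cases; skip it
        model = fit_toy_model(boot_rows, boot_outcomes)
        differences.append(
            performance(model, boot_rows, boot_outcomes)
            - performance(model, rows, outcomes)
        )
    if not differences:
        raise ValueError("no usable bootstrap resamples")
    optimism = sum(differences) / len(differences)
    return {"apparent": apparent, "optimism": optimism, "corrected": apparent - optimism}


# Invented data: 40 patients, 3 candidate predictors, only the first is informative.
data_rng = random.Random(1)
outcomes = [i % 2 for i in range(40)]
rows = [[y + data_rng.gauss(0, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for y in outcomes]
print(optimism_corrected(rows, outcomes, n_boot=100))
```

The key design point is that every data-driven step, here the predictor selection, is repeated inside each resample, so that the optimism estimate reflects the whole model-building process rather than only the final fit.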
Cross-validation approaches will also be applied, systematically excluding groups of subjects (e.g., by site) in turn [44]. In addition, multinomial, or ordinal, logistic regression will be used to investigate predictors of the secondary outcomes. Multiple adenomas may occur in an individual. It is anticipated that the presence of multiple adenomas will be zero-inflated, with some patients having zero and others many. The extent to which zero-inflated models (e.g., allowing for aggregation) improve the prediction of the presence of adenomas will be assessed.
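Cross-validation that excludes groups of subjects by site in turn can be pictured as a leave-one-site-out split: each site is held out once while the model is developed on the remaining sites. The sketch below generates such splits; the function name and the site labels are hypothetical.

```python
from collections import defaultdict


def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices) for each site in turn."""
    by_site = defaultdict(list)
    for index, site in enumerate(site_labels):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = sorted(
            i for site, indices in by_site.items() if site != held_out for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical recruiting site for each of five patients.
sites = ["A", "B", "A", "C", "B"]
for held_out, train_idx, test_idx in leave_one_site_out(sites):
    print(held_out, train_idx, test_idx)
```

Holding out whole sites, rather than random patients, gives some indication of how a model might perform in a centre that did not contribute to its development.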
The microbiome data will be subjected to canonical correspondence analysis to identify (i) major trends in variation across the patient cohort and (ii) putative drivers of the variation in each patient subgroup (defined on the basis of colonoscopy result: CRC, advanced adenomas, non-advanced adenomas, sessile serrated polyps and clear colon). Canonical correspondence analysis seeks to explain the pattern of variation in complex (multivariate) datasets using covariates hypothesized to be of significance in causing variation. We hypothesize that there will be differences in microbiome composition that will be dependent on the adenoma/disease status of patients and that these may act as markers for disease. We will use similar approaches to determine whether dietary data, when combined with the microbiome, reveal further differences between groups. Correspondence analysis will be used to provide summaries of the variation in the microbiome to allow for investigation of the contribution of microbiota profile to the risk prediction models.
The study is exploratory in nature, particularly in relation to whether microbiome data make an important contribution to risk prediction models. Therefore the sample size calculation must be governed by principles of adequate population size.
The sample size of patients providing biosamples (6000)  Group B sample size is based upon generating a significant but manageable C4C population.
A full statistical analysis plan will be developed prior to data analysis.

Study oversight
To provide robust governance and monitoring, a three-tiered governance structure has been devised. The COLO-COHORT Central

Patient and public involvement (PPI)
Patients and the public are extensively involved in this study, including in the development of procedures and review of patient facing materials. A patient representative attends regular COLO-COHORT research group meetings and is a co-author of this paper. Several workshops and PPI days have been organized to discuss and improve the study.
Sessions will be organized to feed back and discuss study results.

Data storage and management
Data collected are recorded onto REDCap, a secure online database [47,48]. Data are pseudonymized with each patient's unique study ID.
The personal identifiable data required for the C4C patient group (COLO-SPEED, group B) are held at each local recruiting site and will subsequently be sent, via secure encrypted email, to the Newcastle University research team, who will send participants the link for registration and profile creation on the CSRN. After this, all personal identifiable data will be securely destroyed.
In keeping with good research practice, the study co-investigators will make the anonymized dataset available to other researchers.
Details of how to apply for access to the dataset will be made available on the study website in due course (https://colospeed.uk). The platform is also available to researchers wishing to utilize this research-ready population to increase recruitment to and engagement with new studies. Access to the platform will be granted following review by the study co-investigators and study management group.

Dissemination
For academic and clinical dissemination, the results will be submitted for publication in high-impact international peer-reviewed journals and presented to scientific meetings. Additionally, to support and promote PPI, lay summaries will be prepared and posted on the study website. The study team will also work with the PPI representative to identify other routes for lay dissemination.
To maximize clinical impact, the research team will actively seek to work with other datasets/researchers to undertake external validation of the risk prediction model.

DISCUSSION
Development of risk adapted triage for colonoscopy has been identified as a research priority [49]. COLO-COHORT will produce As far as we are aware, COLO-SPEED will be the first ever C4C research platform for patients being investigated for potential CRC.
This platform will facilitate future research to be conducted at speed and scale. This will allow more rapid delivery of research and thus more rapid translation of results into patient benefit.

ETHICS STATEMENT
The study was reviewed and approved by the West Midlands

ACKNOWLEDGEMENTS
We wish to acknowledge the research and innovation team at South Tyneside and Sunderland NHS Foundation Trust, in helping to deliver the study as the sponsor site. We also wish to acknowledge the assistance and laboratory expertise from the Newcastle Central Biobank Facility.

DATA AVAILABILITY STATEMENT
In keeping with good research practice, following analysis by the study team the study investigators will make the anonymized dataset available to other researchers.