Automatic sleep stage classification with deep residual networks in a mixed-cohort setting

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Automatic sleep stage classification with deep residual networks in a mixed-cohort setting. / Olesen, Alexander Neergaard; Jørgen Jennum, Poul; Mignot, Emmanuel; Sorensen, Helge Bjarup Dissing.

In: Sleep, Vol. 44, No. 1, zsaa161, 01.01.2021.

Harvard

Olesen, AN, Jørgen Jennum, P, Mignot, E & Sorensen, HBD 2021, 'Automatic sleep stage classification with deep residual networks in a mixed-cohort setting', Sleep, vol. 44, no. 1, zsaa161. https://doi.org/10.1093/sleep/zsaa161

APA

Olesen, A. N., Jørgen Jennum, P., Mignot, E., & Sorensen, H. B. D. (2021). Automatic sleep stage classification with deep residual networks in a mixed-cohort setting. Sleep, 44(1), [zsaa161]. https://doi.org/10.1093/sleep/zsaa161

Vancouver

Olesen AN, Jørgen Jennum P, Mignot E, Sorensen HBD. Automatic sleep stage classification with deep residual networks in a mixed-cohort setting. Sleep. 2021 Jan 1;44(1). zsaa161. https://doi.org/10.1093/sleep/zsaa161

Author

Olesen, Alexander Neergaard ; Jørgen Jennum, Poul ; Mignot, Emmanuel ; Sorensen, Helge Bjarup Dissing. / Automatic sleep stage classification with deep residual networks in a mixed-cohort setting. In: Sleep. 2021 ; Vol. 44, No. 1.

Bibtex

@article{1cee1addf17348d794a3d7f2683f432d,
title = "Automatic sleep stage classification with deep residual networks in a mixed-cohort setting",
abstract = "Study Objectives: Sleep stage scoring is performed manually by sleep experts and is prone to subjective interpretation of scoring rules with low intra- and interscorer reliability. Many automatic systems rely on few small-scale databases for developing models, and generalizability to new datasets is thus unknown. We investigated a novel deep neural network to assess the generalizability of several large-scale cohorts. Methods: A deep neural network model was developed using 15,684 polysomnography studies from five different cohorts. We applied four different scenarios: (1) impact of varying timescales in the model; (2) performance of a single cohort on other cohorts of smaller, greater, or equal size relative to the performance of other cohorts on a single cohort; (3) varying the fraction of mixed-cohort training data compared with using single-origin data; and (4) comparing models trained on combinations of data from 2, 3, and 4 cohorts. Results: Overall classification accuracy improved with increasing fractions of training data (0.25%: 0.782 ± 0.097, 95% CI [0.777-0.787]; 100%: 0.869 ± 0.064, 95% CI [0.864-0.872]), and with increasing number of data sources (2: 0.788 ± 0.102, 95% CI [0.787-0.790]; 3: 0.808 ± 0.092, 95% CI [0.807-0.810]; 4: 0.821 ± 0.085, 95% CI [0.819-0.823]). Different cohorts show varying levels of generalization to other cohorts. Conclusions: Automatic sleep stage scoring systems based on deep learning algorithms should consider as much data as possible from as many sources available to ensure proper generalization. Public datasets for benchmarking should be made available for future research. ",
keywords = "automatic sleep stage classification, computational sleep science, deep learning, machine learning",
author = "Olesen, {Alexander Neergaard} and {J{\o}rgen Jennum}, Poul and Mignot, Emmanuel and Sorensen, {Helge Bjarup Dissing}",
note = "Publisher Copyright: {\textcopyright} 2020 Sleep Research Society 2020. Published by Oxford University Press on behalf of the Sleep Research Society. All rights reserved. For permissions, please e-mail journals.permissions@oup.com.",
year = "2021",
month = jan,
day = "1",
doi = "10.1093/sleep/zsaa161",
language = "English",
volume = "44",
journal = "Sleep (Online)",
issn = "0161-8105",
publisher = "Oxford University Press",
number = "1",
}

RIS

TY - JOUR

T1 - Automatic sleep stage classification with deep residual networks in a mixed-cohort setting

AU - Olesen, Alexander Neergaard

AU - Jørgen Jennum, Poul

AU - Mignot, Emmanuel

AU - Sorensen, Helge Bjarup Dissing

N1 - Publisher Copyright: © 2020 Sleep Research Society 2020. Published by Oxford University Press on behalf of the Sleep Research Society. All rights reserved. For permissions, please e-mail journals.permissions@oup.com.

PY - 2021/1/1

Y1 - 2021/1/1

AB - Study Objectives: Sleep stage scoring is performed manually by sleep experts and is prone to subjective interpretation of scoring rules with low intra- and interscorer reliability. Many automatic systems rely on few small-scale databases for developing models, and generalizability to new datasets is thus unknown. We investigated a novel deep neural network to assess the generalizability of several large-scale cohorts. Methods: A deep neural network model was developed using 15,684 polysomnography studies from five different cohorts. We applied four different scenarios: (1) impact of varying timescales in the model; (2) performance of a single cohort on other cohorts of smaller, greater, or equal size relative to the performance of other cohorts on a single cohort; (3) varying the fraction of mixed-cohort training data compared with using single-origin data; and (4) comparing models trained on combinations of data from 2, 3, and 4 cohorts. Results: Overall classification accuracy improved with increasing fractions of training data (0.25%: 0.782 ± 0.097, 95% CI [0.777-0.787]; 100%: 0.869 ± 0.064, 95% CI [0.864-0.872]), and with increasing number of data sources (2: 0.788 ± 0.102, 95% CI [0.787-0.790]; 3: 0.808 ± 0.092, 95% CI [0.807-0.810]; 4: 0.821 ± 0.085, 95% CI [0.819-0.823]). Different cohorts show varying levels of generalization to other cohorts. Conclusions: Automatic sleep stage scoring systems based on deep learning algorithms should consider as much data as possible from as many sources available to ensure proper generalization. Public datasets for benchmarking should be made available for future research.

KW - automatic sleep stage classification

KW - computational sleep science

KW - deep learning

KW - machine learning

U2 - 10.1093/sleep/zsaa161

DO - 10.1093/sleep/zsaa161

M3 - Journal article

C2 - 32844179

AN - SCOPUS:85100280414

VL - 44

JO - Sleep (Online)

JF - Sleep (Online)

SN - 0161-8105

IS - 1

M1 - zsaa161

ER -

ID: 305025311