Computational sleep scoring from multimodal neurophysiological time-series (polysomnography, PSG) has achieved impressive clinical success. Models that use only a single electroencephalographic (EEG) channel from PSG have not yet received the same clinical recognition, since they fall short in Rapid Eye Movement (REM) scoring quality. Whether this shortcoming can be remedied at all remains an important open question. We conjecture that the predominant Long Short-Term Memory (LSTM) models do not adequately represent distant REM EEG segments (termed epochs), since LSTMs compress them into a fixed-size vector from separate past and future sequences. To address this, we introduce the EEG representation model ENGELBERT (electroEncephaloGraphic Epoch Local Bidirectional Encoder Representations from Transformer). It jointly attends to multiple EEG epochs from both past and future. Compared to the typical token sequences in language, for which attention models were originally conceived, overnight EEG sequences easily span more than 1000 epochs of 30 s each. Local attention on overlapping windows reduces the critical quadratic computational complexity to linear, enabling versatile sub-one-hour to all-day scoring. ENGELBERT is at least one order of magnitude smaller than established LSTM models and is easy to train from scratch in a single phase. It surpassed state-of-the-art macro F1-scores in three single-EEG sleep scoring experiments. REM F1-scores were pushed to at least 86%. ENGELBERT virtually closed the gap to PSG-based methods, from 4-5 percentage points (pp) to less than 1 pp in F1-score.
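The efficiency argument above, namely that restricting attention to a local window of epochs makes the cost grow linearly rather than quadratically with sequence length, can be illustrated with a minimal sketch. The sketch below shows a per-position sliding-window variant of local self-attention over toy epoch embeddings; the window radius, embedding dimension, and function name are illustrative assumptions and not ENGELBERT's actual implementation, which may instead chunk the sequence into overlapping blocks.

```python
import numpy as np

def local_attention(x, radius=32):
    """Toy single-head local self-attention over a sequence of epoch embeddings.

    x:      (n_epochs, d) array of per-epoch EEG embeddings (illustrative).
    radius: each epoch attends only to epochs within +/- `radius` positions,
            so the cost is O(n_epochs * radius * d) instead of O(n_epochs^2 * d).
    """
    n, d = x.shape
    out = np.empty_like(x)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = x[lo:hi]                       # local keys/values around epoch i
        scores = window @ x[i] / np.sqrt(d)     # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over the local window only
        out[i] = weights @ window               # weighted sum of neighbouring epochs
    return out

# An overnight recording of ~8 h corresponds to ~960 epochs of 30 s each.
epochs = np.random.randn(960, 128).astype(np.float32)
context = local_attention(epochs, radius=32)
print(context.shape)  # (960, 128)
```

Because each epoch only ever interacts with a bounded neighbourhood, the same routine scales from sub-one-hour recordings to all-day sequences without the quadratic blow-up of full self-attention.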