The transformer replaced recurrent networks by processing a whole sequence in parallel and letting each token attend to all others through self-attention. That design scales efficiently on modern hardware and underlies essentially every current large language model.
Open-source implementations of the architecture — and the model weights built on it — let teams study, run, and adapt transformers directly rather than only through a hosted API.