Are Transformers Here to Stay for a While?

A Transformer Theory

Part of my pet theory on transformers is that they might stick around for a while. I know ML changes fast, but I have an example from history that illustrates a design with staying power.

Radio started out with many schemes for detecting signals, progressing roughly along these lines: coherers, galena crystals, diodes, tuned radio frequency, regeneration and super-regeneration, all of which detect the signal directly at the incoming frequency. Finally came the superheterodyne, which mixes the incoming frequency down or up to an intermediate frequency before detecting and demodulating the signal. The first methods each lasted but a few years, and all underperformed and had flaws. Once the superheterodyne was invented, there were just a few different flavors of the same idea, the so-called single, double and triple conversions, which are really just more stages to reject out-of-band signals more efficiently.

The superheterodyne, like the transformer, has staying power. After 100 years it is the ‘way’ to handle a radio signal. That tech is in your Wi-Fi, phone, TV, stereo, modem, two-way radios, GPS and so on, and it is unlikely to be replaced by anything other than a digital implementation of the same idea. So my theory is that the transformer is in the same ballpark: it is the superheterodyne of ML, or at least close, a step or two away at most. The earlier methods of radio reception each had their day in the sun, just as RNNs, LSTMs and GRUs were each the cutting-edge, go-to ML architecture for a while.


