deep learning - eXclusive ORange

The best way to predict a future is to look for it. We live in the moment where the world is a vast machine for predicting the future.

This summer, I spent a significant amount of time contemplating large language models and delving deeper into their research. My first encounter with GPT-2 was back in 2019, where I explored its code and experimented with it. During this period, I became curious about transfer learning and its applications. Additionally, I had some prior knowledge about transformers, but it wasn’t as comprehensive as my understanding of LSTMs and RNNs. I couldn’t confidently explain what they did, for example.

While researching transfer learning with smaller models like GPT-2, I stumbled upon Gwern Branwen’s website (https://gwern.net/) and, in particular, his TWDNE Project (https://gwern.net/twdne). I found it clever because it combined a generative model for both images and text. I decided to focus on the text side of the project, as the image aspect was already well-addressed by applications like Stable Diffusion….

I might revisit the image style transfer aspect in the future, as I had previously explored it to some extent. You can find more about this in my “How to Generate Art Demo Followup.”

Before this, I had predominantly explored machine learning with code from the ground up using Python (PMLC). I have used ML practically in the form of genetic algorithms for tuning parameters on investing models for years, non-differentiable, so no chain rule! An offshoot was a project called gen-gen-algo, a generic genetic algorithm. Now, finally after all these side quests, I was ready to tackle something more complex and cutting-edge using GPT.

I found excellent resources on GitHub and in video format from Andrej Karpathy (https://github.com/karpathy). The following repositories were particularly helpful in my learning journey. The first one, “nn-zero-to-hero,” features a series of videos that provided a solid foundation in understanding transformers.

The second repository, “makemore,” served as my warm-up exercise to get back into working with transformers and Large Language Models (LLMs) after a period of dormancy in the field. You can access these repositories here:

1. “nn-zero-to-hero”: https://github.com/karpathy/nn-zero-to-hero
2. “makemore”: https://github.com/karpathy/makemore

Fork of makemore

My experience with “makemore” went beyond the basic examples provided in the original repository, which generated new names based on a dataset of names. Initially, my goal was to apply “makemore” to various datasets other than “names.txt.” I experimented with larger and smaller datasets, including those with extensive collections of English words, numbers for addition, square roots, and a substantial dataset of quotes containing nearly 10 million entries, some of which had lines as long as 505 characters. By using scripts and modifications to “makemore.py,” I conducted a grid search to optimize hyperparameters, including constraints on model size. Output from “makemore.py” was saved to a CSV file, along with hexadecimal hash values for easy tracking and analysis during the tuning process.

To further enhance the code, I introduced a grid search optimization method using a Bash script. This allowed for exploring the hyperparameter space while maintaining a ceiling on the model size. Without such constraints, optimization typically led to increasingly larger models that resulted in the lowest loss.

I also introduced the concept of assigning a random hexadecimal tag to the output of “makemore.py.” This tagging system facilitated the easy identification of the best loss and the associated set of hyperparameters that produced it. Additionally, I incorporated an early stopping mechanism into the “makemore.py” code.

If you’re interested in exploring my fork of Andrej Karpathy’s “makemore” code, you can find it here:

https://github.com/erickclasen/makemore

For a more detailed understanding, I’ve created a comprehensive “verbose-readme.pdf” that provides additional information:

Version on this site, opens in browser:

verbose-readme

GitHub Version requires downloading:

https://github.com/erickclasen/makemore/blob/master/verbose-readme.pdf

Fun With Machine Learning

I have been sharing the following images around. They were created with machine learning code that I came across and modified a bit while examining how it operates.

The code basically takes an image and imposes a style from another image upon it. It is rather computational expensive as it takes around 12 Gigabytes of RAM to “work” on a 1024×1024 pixel image and about 2 hours of compute time to run the 20 iterations required to complete an image. Machine learning is a fascinating new frontier in technology that I have been spending some time since the spring of 2018 getting to understand on a deeper level. I’ve seen a lot of technologies come and go but, this field has stunned me, is moving forward very fast and is here to stay.

Prior to 2018 I had some exposure to machine learning in the sense of using adaptive control systems in industry. I also worked on a research project that involved a type of fuzzy logic and cellular automata for a learning system that would be used in a control loop. I also developed code that used a tractrix curve as the main element of a control system. But, that is kind of simple machine learning as compared to what is going on today.

This link is to the original paper on this topic, there are some more images in it as well,

eXclusive ORange

A working title for now

Tag Archives: deep learning

Makemore

Fork of makemore

Modifying a image with a style using machine learning

Fun With Machine Learning