Daily Archives: November 2, 2023

Makemore

The best way to predict a future is to look for it. We live in the moment where the world is a vast machine for predicting the future.

This summer, I spent a significant amount of time contemplating large language models and delving deeper into their research. My first encounter with GPT-2 was back in 2019, when I explored its code and experimented with it. During that period, I became curious about transfer learning and its applications. I also had some prior knowledge of transformers, but it wasn’t as comprehensive as my understanding of LSTMs and RNNs. I couldn’t confidently explain what they did, for example.

While researching transfer learning with smaller models like GPT-2, I stumbled upon Gwern Branwen’s website (https://gwern.net/) and, in particular, his TWDNE Project (https://gwern.net/twdne). I found it clever because it combined generative models for both images and text. I decided to focus on the text side of the project, as the image aspect was already well-addressed by applications like Stable Diffusion….

Misato Katsuragi as a Math Teacher

I might revisit the image style transfer aspect in the future, as I had previously explored it to some extent. You can find more about this in my “How to Generate Art Demo Followup.”

Before this, I had predominantly explored machine learning by writing code from the ground up in Python (PMLC). I have also used ML in practice for years, in the form of genetic algorithms that tune parameters for investing models. Those problems are non-differentiable, so no chain rule! An offshoot was a project called gen-gen-algo, a generic genetic algorithm. Now, finally, after all these side quests, I was ready to tackle something more complex and cutting-edge using GPT.

I found excellent resources on GitHub and in video format from Andrej Karpathy (https://github.com/karpathy). The following repositories were particularly helpful in my learning journey. The first one, “nn-zero-to-hero,” features a series of videos that provided a solid foundation in understanding transformers.

The second repository, “makemore,” served as my warm-up exercise to get back into working with transformers and Large Language Models (LLMs) after a period of dormancy in the field. You can access these repositories here:

1. “nn-zero-to-hero”: https://github.com/karpathy/nn-zero-to-hero
2. “makemore”: https://github.com/karpathy/makemore

Fork of makemore

My experience with “makemore” went beyond the basic examples provided in the original repository, which generated new names based on a dataset of names. Initially, my goal was to apply “makemore” to various datasets other than “names.txt.” I experimented with larger and smaller datasets, including those with extensive collections of English words, numbers for addition, square roots, and a substantial dataset of quotes containing nearly 10 million entries, some of which had lines as long as 505 characters. By using scripts and modifications to “makemore.py,” I conducted a grid search to optimize hyperparameters, including constraints on model size. Output from “makemore.py” was saved to a CSV file, along with hexadecimal hash values for easy tracking and analysis during the tuning process.

To further enhance the code, I introduced a grid search using a Bash script. This allowed me to explore the hyperparameter space while keeping a ceiling on the model size. Without such a constraint, the search tends to drift toward ever larger models, since those reach the lowest loss.
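The idea of a size-capped grid search can be sketched in a few lines. The fork does this with a Bash script; what follows is a Python illustration of the same idea, not that script. The grid values, the `MAX_PARAMS` ceiling, and the `approx_params` estimate (roughly 12 × n_embd² weights per transformer block) are all assumptions made for the example.

```python
import itertools

# Hypothetical search space; these values are illustrative only.
GRID = {
    "n_layer": [2, 4, 8],
    "n_embd": [32, 64, 128],
    "learning_rate": [1e-3, 5e-4],
}
MAX_PARAMS = 500_000  # assumed ceiling on model size


def approx_params(n_layer, n_embd):
    # Rough transformer estimate: about 12 * n_embd^2 weights per block,
    # ignoring embeddings and biases. An assumption, not makemore's own count.
    return 12 * n_layer * n_embd ** 2


def candidate_configs(grid, max_params):
    """Yield every grid combination whose estimated size fits under the ceiling."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        if approx_params(config["n_layer"], config["n_embd"]) <= max_params:
            yield config


configs = list(candidate_configs(GRID, MAX_PARAMS))
print(f"{len(configs)} combinations fit under {MAX_PARAMS:,} parameters")
```

Filtering before training means no time is spent on configurations that would be rejected anyway for being too large.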

I also introduced the concept of assigning a random hexadecimal tag to the output of “makemore.py.” This tagging system facilitated the easy identification of the best loss and the associated set of hyperparameters that produced it. Additionally, I incorporated an early stopping mechanism into the “makemore.py” code.
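As a rough illustration of the tagging and early stopping ideas, here is a minimal sketch. It is not the code from the fork: the tag length, the `patience` rule, the CSV column names, and the loss values are all made up for the example.

```python
import csv
import io
import secrets


def run_tag():
    # 8 random hex characters; the length is an arbitrary choice.
    return secrets.token_hex(4)


def early_stop_step(losses, patience=3):
    """Return the index at which training would stop, or None if it never does.

    Stops once `patience` consecutive evaluations fail to beat the best loss so far.
    """
    best = float("inf")
    stale = 0
    for step, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


# Made-up validation losses, purely to exercise the logic.
fake_losses = [2.9, 2.5, 2.3, 2.31, 2.35, 2.32, 2.2]
stop_at = early_stop_step(fake_losses, patience=3)
best_loss = min(fake_losses[: stop_at + 1])

# One CSV row per run, so the tag links a loss back to its hyperparameters.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["tag", "n_layer", "n_embd", "best_loss", "stopped_at"])
writer.writerow([run_tag(), 4, 64, best_loss, stop_at])
print(buffer.getvalue())
```

Sorting such a CSV by the loss column then gives the tag of the best run, and the tag identifies the hyperparameters and output that go with it.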

If you’re interested in exploring my fork of Andrej Karpathy’s “makemore” code, you can find it here:

https://github.com/erickclasen/makemore

For a more detailed understanding, I’ve created a comprehensive “verbose-readme.pdf” that provides additional information:

Version on this site, opens in browser:

verbose-readme

GitHub Version requires downloading:

https://github.com/erickclasen/makemore/blob/master/verbose-readme.pdf


Calendar Tricks

Time is like dancing.

There are a few patterns in the modern calendar that can help you figure out days of the week through the year. I just happened to notice this first pattern once when looking at the calendar for the year closely.

2023 Calendar

Nine months of the year “line up” in groups that start on the same day of the week. This means that their dates fall on the same days of the week too.

So if you know for example that September 25th is on a Monday, Christmas will be on Monday as well. It is a quick and easy calendar cheat.

In non-leap years:
– January and October start on the same day of the week.
– February, March, and November start on the same day of the week.
– April and July start on the same day of the week.
– September and December start on the same day of the week.

In leap years:
– January, April, and July start on the same day of the week.
– February and August start on the same day of the week.
– March and November start on the same day of the week.
– September and December start on the same day of the week.
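The groupings above can be checked with Python's standard `calendar` module. This is a small verification sketch; the function name `months_sharing_start_day` is just a label chosen for the example, and months are reported by number (1 = January) rather than by name.

```python
import calendar
from collections import defaultdict


def months_sharing_start_day(year):
    """Group month numbers by the weekday of their 1st; keep groups of two or more."""
    groups = defaultdict(list)
    for month in range(1, 13):
        weekday = calendar.weekday(year, month, 1)  # 0 = Monday ... 6 = Sunday
        groups[weekday].append(month)
    return sorted(g for g in groups.values() if len(g) > 1)


print(months_sharing_start_day(2023))  # non-leap: [[1, 10], [2, 3, 11], [4, 7], [9, 12]]
print(months_sharing_start_day(2024))  # leap:     [[1, 4, 7], [2, 8], [3, 11], [9, 12]]
```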

This type of pattern can be useful for various purposes, such as planning and organizing events that need to occur on specific days of the week or identifying months with similar day-of-the-week characteristics.

Mentally Calculating the Day of the Week

The calendar also has a so-called anchor day: a day of the week shared by a set of easy-to-remember dates, which allows quick mental calculation of the day of the week for any date. The anchor day moves through the week from year to year, one day forward going into a non-leap year and two days forward going into a leap year. For example, the anchor day is Tuesday in 2023, so all of the dates listed below fall on a Tuesday that year. In 2024, a leap year, they all fall on a Thursday.

All of the following dates share the same day of the week within a given year. From these anchor dates it is easy to mentally find the day for any date in the same month: move in steps of 7 to reach the right week, then add or subtract the remaining days to get to the date whose day of the week you want to know.

Non-leap years: 1/3, 2/28, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

Leap years: 1/4, 2/29, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12
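To confirm that the listed dates really do share one weekday, they can be run through `datetime`. This is only a verification sketch; `anchor_dates`, `anchor_day`, and the constant names are invented for the example.

```python
import calendar
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# (month, day) pairs from the lists above; only January and February
# differ between leap and non-leap years.
SHARED_DATES = [(4, 4), (5, 9), (6, 6), (7, 11), (8, 8),
                (9, 5), (10, 10), (11, 7), (12, 12)]


def anchor_dates(year):
    """Return the listed anchor dates for the given year."""
    early = [(1, 4), (2, 29)] if calendar.isleap(year) else [(1, 3), (2, 28)]
    return [datetime.date(year, m, d) for m, d in early + SHARED_DATES]


def anchor_day(year):
    """Return the weekday name shared by every anchor date in the year."""
    weekdays = {d.weekday() for d in anchor_dates(year)}
    if len(weekdays) != 1:
        raise ValueError(f"anchor dates disagree for {year}")
    return DAY_NAMES[weekdays.pop()]


for year in (2023, 2024, 2025):
    print(year, anchor_day(year))  # Tuesday, Thursday, Friday
```

The output also shows the year-to-year movement: two days forward into the leap year 2024, then one day forward into 2025.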

This trick helps nail down a few holidays too.

July 4th, being 1 week before 7/11, falls on the anchor day as well. So does Halloween, which is 3 weeks after 10/10. Christmas and the New Year’s Day that follows it fall on the day before the anchor day.

Need to know the date of Labor Day? Remember that 9/5 is an anchor date and work forward or backward from there to the first Monday of September.
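The mental arithmetic for the holidays can be written out as a short sketch for 2023, where the anchor day is Tuesday. The helper `shift_from_anchor` is a name made up for this example, and Labor Day is taken to be the US holiday on the first Monday of September.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
ANCHOR_2023 = 1  # Tuesday, counting Monday as 0


def shift_from_anchor(anchor_weekday, anchor_dom, target_dom):
    """Weekday of target_dom, given an anchor date in the same month."""
    return (anchor_weekday + (target_dom - anchor_dom)) % 7


july_4 = shift_from_anchor(ANCHOR_2023, 11, 4)      # 7/11 minus one week
halloween = shift_from_anchor(ANCHOR_2023, 10, 31)  # 10/10 plus three weeks
christmas = shift_from_anchor(ANCHOR_2023, 12, 25)  # 12/12 plus 13 days

# First Monday of September, found by stepping away from the 9/5 anchor.
labor_day = next(d for d in range(1, 8)
                 if shift_from_anchor(ANCHOR_2023, 5, d) == 0)

print("July 4:", DAY_NAMES[july_4])            # Tuesday
print("Halloween:", DAY_NAMES[halloween])      # Tuesday
print("Christmas:", DAY_NAMES[christmas])      # Monday
print("Labor Day: September", labor_day)       # September 4

# Cross-check the shortcut against the standard library.
assert july_4 == datetime.date(2023, 7, 4).weekday()
assert christmas == datetime.date(2023, 12, 25).weekday()
```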

https://www.timeanddate.com/date/doomsday-rule.html

Calendars also repeat

A helpful thing to know is that calendars repeat in a pattern, so it is possible to look back or forward in time to figure out which dates and days line up. This trick would also allow you to reuse calendars, even antique ones. I once saw this in a government office I visited: the person had an old calendar up on the wall that lined up exactly with the current year.

https://www.timeanddate.com/calendar/repeating.html

I worked a bit with ChatGPT to help me figure out the pattern. It took making it write some code examples, it getting them wrong a bit, me explaining what was wrong, and then it was finally right. For me it was a good exercise in prompting it and getting it to nail the problem. I find it useful to have ChatGPT provide code examples, as this makes a good ‘language’ within which I can communicate back and forth with it. It can be as simple as telling it that the output is wrong and where it is wrong, up to a type of collaboration on the code.

I think it came up with a fair explanation which I have pasted in after the code snippet.

import calendar
import datetime

def find_years_with_same_calendar(base_year, num_years=10):
    # Walk backward from base_year, collecting years whose calendar matches it.
    # base_year matches itself, so it is always the first entry.
    years_with_same_calendar = []
    year = base_year

    while len(years_with_same_calendar) < num_years and year >= datetime.MINYEAR:
        if is_same_calendar(year, base_year):
            years_with_same_calendar.append(year)
        year -= 1

    return years_with_same_calendar

def is_same_calendar(year1, year2):
    # Two years share a calendar only if January 1st falls on the same weekday
    # and both are leap years or both are not.
    same_start = day_of_week(year1, 1, 1) == day_of_week(year2, 1, 1)
    same_leap = calendar.isleap(year1) == calendar.isleap(year2)
    return same_start and same_leap

def day_of_week(year, month, day):
    # 0 = Monday ... 6 = Sunday
    return datetime.date(year, month, day).weekday()

# Example: Find 10 years with the same calendar as the year 2023
years_to_check = find_years_with_same_calendar(2023, num_years=10)
print(f"Years with the same calendar as 2023: {years_to_check}")


ChatGPT on Repeating Calendars

Certainly! The pattern you’ve described and that the corrected code now follows is based on finding years with the same calendar as a reference year (2023). This pattern is based on matching the day of the week for January 1st in various years to determine if they share the same calendar.

Here’s how the pattern works:

1. **Reference Year (2023)**: This is the year you’ve selected as your reference point.

2. **Matching the Day of the Week**: The pattern involves matching the day of the week for January 1st in other years with that of the reference year (2023).

3. **6-Year and 11-Year Gaps**: The gaps between matching years are either 6 or 11 years. Going back from 2023, they repeat in the order 6, 11, 11, as long as the span does not cross a century year that skips its leap day, such as 1900.

4. **Leap Year Influence**: Leap years, which occur every 4 years apart from those century exceptions, add an extra day (February 29) and shift the alignment. A year can only share a calendar with another year of the same type, so you need to check whether each year is a leap year as well as where January 1st falls.

The pattern essentially finds years that, despite differences in their actual numerical values, share the same calendar in terms of the day of the week for January 1st. This pattern repeats as you move back in time, allowing you to identify years with the same calendar as your reference year.

For example, 2017 is 6 years before 2023, 2006 is 11 years before 2017, and 1995 is 11 years before 2006. Further back, 1989 and 1961 also share 2023’s calendar. The pattern repeats with adjustments for leap years and results in a sequence of years with the same calendar.
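A quick way to check the gaps is to compute them. The sketch below treats two years as sharing a calendar when January 1st falls on the same weekday and both years agree on leap status; the function names are chosen for this example only.

```python
import calendar
import datetime


def same_calendar(year1, year2):
    """True when both years start on the same weekday and agree on leap status."""
    same_start = (datetime.date(year1, 1, 1).weekday()
                  == datetime.date(year2, 1, 1).weekday())
    return same_start and calendar.isleap(year1) == calendar.isleap(year2)


def earlier_matches(reference, count):
    """Collect `count` years before `reference` that share its calendar."""
    matches = []
    year = reference - 1
    while len(matches) < count and year >= datetime.MINYEAR:
        if same_calendar(year, reference):
            matches.append(year)
        year -= 1
    return matches


matches = earlier_matches(2023, 7)
gaps = [later - earlier for later, earlier in zip([2023] + matches, matches)]
print(matches)  # [2017, 2006, 1995, 1989, 1978, 1967, 1961]
print(gaps)     # [6, 11, 11, 6, 11, 11, 6]
```

Years such as 2012 and 1984 also begin on a Sunday, but they are leap years, so their calendars diverge from 2023's after February and they are left out.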