Category Archives: Machine Learning

Luna Rides Light Beam

January 17, 2023ChatGPTCat, Fantasy cat tale, Imagination, Light, Lightbeam, Neural Network, Storytelling, The tale of Lunaerickclasen

Can you write a story about a cat riding a beam of light. Use a writing style that is characterized by a narrative storytelling approach, use anecdotes and examples to illustrate the main points and also ask pretend questions about your own beliefs and assumptions, use a conversational tone and make it easy to follow your reasoning. Also use metaphors and comparisons.

Once upon a time, there was a cat named Luna who lived in a small village at the foot of a mountain. Luna was a curious and adventurous cat, always looking for new and exciting experiences.

One day, while exploring the mountain, Luna stumbled upon a strange, glowing beam of light. Without hesitation, she leapt onto the beam and began to ride it. The beam was like a magic carpet, lifting Luna higher and higher into the sky.

As Luna rode the beam, she saw things she had never seen before. The world below her became smaller and smaller, and she could see for miles and miles. She felt like she was flying, free as a bird. Luna felt a sense of wonder and amazement as she soared through the air.

As Luna rode the beam, she couldn’t help but think about the nature of light. She remembered her physics lessons and wondered if light was a particle or a wave. She wondered if the beam she was riding on was made up of photons, the fundamental particles of light. Luna also wondered if the beam she was riding on was infinite, and if there were other cats riding on beams of light somewhere in the universe.

But Luna quickly realized that these were just idle musings and that she should just enjoy the ride. The beam was like a river of light, and Luna was its fearless navigator. She rode the beam, feeling the wind in her fur, and the sun on her face.

Just as Luna was starting to feel like the beam would take her to the ends of the earth, it suddenly disappeared. Luna found herself back on the mountain, with the village below her. She realized that her adventure had come to an end, but the memories of it would stay with her forever.

From that day on, Luna had a new appreciation for the beauty and mystery of light. She would often return to the mountain and gaze up at the sky, searching for that magical beam of light once again.

And who knows? Maybe, just maybe, she would find it and ride it again, forever exploring the unknown and experiencing the wonder of the universe.

ChatGPT tackles dense research papers

January 13, 2023ChatGPTChatGPT, Kelly criterion, Monte Carlo, portfolio management, Python, riskerickclasen

Never underestimate life’s beauty.

ChatGPT can help to understand parts of a research paper that require clarification.

In this case the paper happens to be on applying the Kelly criterion to a portfolio of multiple assets. I was interested in the paper and read through it a few times and understand it pretty well as I have been studying this area recently, including reading Fortune’s Formula by William Poundstone which deals with the Kelly criterion. But, I wanted to make sure my understanding was concrete. I want to understand it in a way that I can compare what I personally have done in the way of setting up a portfolio. Specifically code that manages an index fund with risk management applied that operates similar the way the Kelly criterion works.

Electronic copy available at: http://ssrn.com/abstract=2259133Kelly criterion for multivariate portfolios: a model-free approach

The process involved to simplify text

By pasting in a section of the paper and asking it to dumb it down a bit and then interacting with it using some followup questions it can help clarify it. ChatGPT can be asked to bring it down to a high school level or alternatively you can tell it an age that you want the understanding to be brought down to.

In this case I also asked it to produce and example using Python code. The code is functional, however there is a few unused variables in it.

Transcript of the ‘conversation’ with ChatGPT

Explain the following in simpler terms, like high school level… Algorithm for the fractional Kelly strategies The strategies that bet less than the Kelly fraction are called fractional Kelly strategies. As we just pointed out, they are less profitable but also less risky. In the case of a univariate portfolio we just decrease the capital fraction invested in the risky asset and shift the released capital to the riskless bond. But for a multivariate portfolio such an approach is ambiguous. Indeed, an investor usually cares about the total fraction of risky assets in his/her portfolio. And s/he can at first proportionally descrease the fractions of all risky assets in u? but then additionally redistribute the capital among the risky assets. If the returns are jointly Gaussian then we shall just proportionally reduce the fractions of the risky assets. Moreover, in this case the strategy with Kelly fraction x := 1/(1 − α) is equivalent to the maximization of the (negative) power utility xα/α, α < 0. But in general this nice property does not hold true (see MacLean et al. [2010] and Davis and Lleo [2012]). For instance7, even if the marginal returns are Gaussian but the marginals are paired by Clayton copula, then the proportional fraction reduction is suboptimal. Imposing additional constraints 0 ≤ ui ≤ x ∀i and ∑n i=1 ui ≤ x we can still use (3) to find the optimal portfolio for a given x. However, (3) may be a bad approximation for the leveraged portfolios or the portfolios that contain assets with possibly very high returns8 (see Appendix 1). Additionally, practitioners usually like having two independent optimization methods: if both of them get similar results then they are plausible. We consider the portfolio optimization under the no leverage and no short selling constraints. A Monte Carlo grope algorithm we employ is as follows: 1. Assume we want to invest x of our capital in risky assets9, x ∈ [0%, 100%]. Choose an initial portfolio so that all fractions of the risky assets are between 0.0 and x and their sum is equal to x. Distributing the capital over all assets equally may be a good start. 2. (Groping step): modify the fractions at random so that the no short selling and no leverage conditions still hold. If the new portfolio yields higher expected growth rate, set it as initial portfolio, otherwise leave initial portfolio unchanged. 3. Repeat Step 2 until portfolio performance stops increasing.

The Kelly strategy is a way to invest money in different assets (like stocks or bonds) in order to make the most profit. The Kelly strategy says that you should invest a certain percentage of your money into each asset, based on how likely that asset is to make a profit. However, this strategy can be risky because if one of your assets doesn’t make as much money as you expected, you can lose a lot of your overall investment.

A fractional Kelly strategy is a way to make the Kelly strategy less risky by investing less money into each asset. For example, instead of following the Kelly strategy exactly, you might only invest 80% of the money that the Kelly strategy suggests. This means you’ll make less profit, but you’ll also be less likely to lose a lot of money.

When you’re using a fractional Kelly strategy for a portfolio (a collection of assets) that has multiple types of assets, it can be tricky to figure out how much to invest in each one. If all of the assets are likely to make similar profits, you can just decrease the percentage you’re investing in each one by the same amount. But if the assets have different levels of risk, you’ll need to adjust the percentages differently.

One way to do this is by using a computer program that simulates how different portfolios would perform. The program starts with a random portfolio and then makes small changes to it, to see if it will make more profit. If the new portfolio is better, the program switches to it and continues making changes. If the portfolio stops improving, the program stops.

This process is called a “Monte Carlo simulation” because it uses random numbers and works like a game of chance. By doing this simulation many times, it will get a portfolio that has a better chance to make a better return. The optimization process will be constraint by no leverage and no short selling, and the final result will be the optimal portfolio.

Would it be possible to use historic data or some kind of model data?

Yes, it is possible to use historic data or model data when using a fractional Kelly strategy. In fact, using historical data is often necessary in order to estimate the expected returns and risks of different assets.

One way to use historic data is by looking at the past performance of the assets. By analyzing the returns of an asset over a certain period of time, you can estimate the expected return and risk of investing in that asset. This information can then be used as input for the simulation algorithm.

Another way is to use financial models to estimate the expected returns and risks of different assets. These models use mathematical equations to describe how the assets should behave based on certain factors, such as economic conditions, market trends, and other variables. By plugging in the relevant information into the model, you can get estimates of the expected returns and risks of investing in different assets. This information can also be used as an input for the simulation algorithm

It’s important to note that, regardless of the input, the simulation and optimization algorithm will still work by exploring different portfolio scenarios under the constraints and looking for the one that will generate a higher expected return.

When working with historical data I assume it is going to be simulating trades.

Yes, when working with historical data, it is common to use it to simulate trades and backtest the strategy.

Backtesting is the process of applying a trading strategy to historical data to see how it would have performed in the past. By simulating trades on historical data, you can get an idea of how the strategy would have performed in the past and evaluate the strategy’s performance.

When you’re backtesting a fractional Kelly strategy, you would use historical data on the returns and risks of the assets in your portfolio to estimate the expected returns and risks. You would then use the simulation algorithm to generate different portfolios and apply the fractional Kelly strategy to them. The simulated trades will be compared to actual historical returns and the strategy performance will be evaluated.

It’s important to remember that past performance does not guarantee future results and that backtesting does not take into account all the risks of real-world trading. However, backtesting allows you to check how the strategy behaves over the course of history and how it could have performed if it would have been used in the past. This can give you an idea of how the strategy might perform in the future, and whether it’s worth implementing in a live trading scenario.

So, in essence it is using backtesting on set of assets and adjusting the fraction of each of them up and down randomly to find the best overall gains of the entire portfolio.

Yes, that’s correct! Essentially, backtesting a fractional Kelly strategy involves using historical data on the returns and risks of the assets in your portfolio, and then using a simulation algorithm to generate different portfolios. The algorithm will make random adjustments to the fraction of capital invested in each asset and will evaluate the expected return of each portfolio generated. The goal of this process is to find the portfolio that will generate the best overall gains by adjusting the fraction of capital invested in each assets, that is the one that will have the highest expected return.

In practice, the algorithm will explore the portfolio space by making random changes to the portfolio, according to the constraints and the Kelly strategy, and it will keep track of the best portfolio so far. The algorithm will repeat this process multiple times until it reaches a predefined criteria to stop, such as reaching a certain number of iterations or reaching a level of portfolio performance.

It’s worth noting that the algorithm can explore a large number of different portfolio combinations and therefore it can be a computationally expensive process. But, it allows to find the optimal portfolio given the conditions of the assets, constraints, and the Kelly strategy, and it can also provides insights on how the assets are behaving together.

If the algorithm finds the best returns isn’t this equivalent to the Kelly strategy.

Not exactly, while the fractional Kelly strategy and the algorithm I’ve described are related, they are not the same thing.

The Kelly strategy is a mathematical formula that tells you the optimal percentage of your capital to invest in each asset, given the expected returns and risks of the assets. The formula is based on the idea that you want to maximize your long-term growth rate, and it assumes that you have a perfect estimate of the expected returns and risks of each asset.

The algorithm I described, on the other hand, is a process for finding a good portfolio using the constraints and the Kelly strategy, but it doesn’t follow the formula exactly. Instead, it uses historical data and simulations to estimate the expected returns and risks of the assets, and it explores different portfolio combinations to find the one that generates the highest return. It may find a portfolio that is similar or equivalent to what the Kelly formula would suggest, but it is not guaranteed to do so.

It’s worth noting that the Kelly strategy can be challenging to implement in practice because of the difficulty to estimate the expected return and risk for each asset and because of the Kelly assumption that an individual can reinvest the profits and have no constraints on the bet size. On the other hand, the fractional Kelly algorithm is a more practical approach that can handle different constraints and real-world scenarios, such as leverage and short selling constraints.

Is there a simple example that can be shown with Python code?

Here is a simple example of how you might use Python code to backtest a fractional Kelly strategy:

import numpy as np

# Example data: historical returns and volatilities of three assets
returns = np.array([[0.02, 0.03, 0.01], [0.05, 0.02, 0.04], [0.04, 0.06, 0.03]])
volatilities = np.array([[0.01, 0.02, 0.005], [0.02, 0.01, 0.03], [0.01, 0.03, 0.02]])

# Initialize portfolio with equal fractions of capital invested in each asset
portfolio = np.array([1/3, 1/3, 1/3])

# Set the Kelly fraction (fraction of capital to invest in risky assets)
kelly_fraction = 0.8

# Set the number of simulations to run
num_simulations = 1000

# Set the fractional Kelly fraction
frac_kelly = 0.6

# Initialize a variable to keep track of the maximum portfolio return
best_return = -np.inf

# Run the simulations
for i in range(num_simulations):
    # Randomly adjust the portfolio fractions
    new_portfolio = portfolio + np.random.normal(size=3) * 0.01
    # Make sure the fractions are between 0 and the Kelly fraction
    new_portfolio = np.maximum(new_portfolio, 0)
    new_portfolio = np.minimum(new_portfolio, frac_kelly)
    # Make sure the fractions sum to 1
    new_portfolio = new_portfolio / new_portfolio.sum()
    # Calculate the expected return and volatility of the new portfolio
    expected_return = (new_portfolio * returns).sum()
    expected_volatility = (new_portfolio * volatilities).sum()
    # Update the best portfolio if the new one has a higher return
    if expected_return > best_return:
        portfolio = new_portfolio
        best_return = expected_return

# Print the optimal portfolio fractions
print("Optimal portfolio fractions:", portfolio)

This code is a simple example that shows how you can use Python to backtest a fractional Kelly strategy. The example uses historical returns and volatilities of three assets. The code set the Kelly fraction to 0.8 and the fractional Kelly fraction to 0.6 and runs the simulation 1000 times.

It starts with an initial portfolio that has equal fractions of capital invested in each asset, and then makes small random changes to the portfolio in each simulation. It then checks whether the new portfolio has a higher expected return and if so, it keeps the new portfolio and continue the process. After all the simulations are done, the optimal portfolio fractions are printed.

Keep in mind that this is a very simple example, and in practice, you would need to use more sophisticated methods to estimate the expected returns and risks of the assets, and you would need to use more realistic constraints. Additionally, you would want to test the performance of the algorithm with a variety of scenarios and parameters to validate the strategy.

Is it possible to use a genetic algorithm for this instead of Monte Carlo simulation?

Yes, it is possible to use a genetic algorithm for backtesting a fractional Kelly strategy instead of a Monte Carlo simulation. A genetic algorithm is a type of optimization algorithm that is inspired by the process of natural selection.

Just like the Monte Carlo simulation algorithm that I described earlier, a genetic algorithm would be used to explore the portfolio space and find the portfolio that generates the highest expected return. However, instead of making random changes to the portfolio, a genetic algorithm uses the principles of natural selection to evolve a population of portfolios over time.

Here’s a general overview of how a genetic algorithm for backtesting a fractional Kelly strategy might work:

Start with an initial population of portfolios that are randomly generated and satisfy the constraints of the problem.
Evaluate the fitness of each portfolio in the population. This is usually done by calculating the expected return of each portfolio and use it as the fitness value
Select the best portfolios from the population to “breed” new portfolios. The selection process is usually based on the fitness value of each portfolio.
“Breed” new portfolios by combining the characteristics of the selected portfolios through crossover and mutation operators.
Repeat the process of selection, breeding, and mutation until the performance of the population reaches a predefined stopping criteria.
Return the best portfolio of the final population.

A genetic algorithm can be a powerful optimization technique, and it has the ability to find globally optimal solutions to problems that are difficult to solve using other methods. However, it also has its limitations, such as the difficulty of determining the optimal configuration of parameters and the risk of getting stuck in local optima.

It is also worth noting that, similar to the Monte Carlo simulation, a genetic algorithm would still need to estimate the expected return and risk of each assets, and it would also need to be validated and tested thoroughly with a variety of scenarios and parameters before it is used in a live trading scenario.

Generating Code With ChatGPT

January 10, 2023ChatGPTChatGPT, Python, sqrterickclasen

ChatGPT can generate code by asking for it using plain English. Here are a few examples easy and some harder where it generates code to get the square root of two. It fails to do it via a genetic algorithm although the form of the code is not too bad and looks like it could be debugged and modified to work. For a different spin I asked it to do it using BASH script as well. ChatGPT has some ways to go but it is a world ahead of GPT3.0,2.0 and other generative text models I have used over the last 5 years.

Can you write Python code to get the square root of two?

Certainly! Here is some Python code that you can use to calculate the square root of two:

 import math

# Calculate the square root of two
square_root_of_two = math.sqrt(2)

# Print the result
print(square_root_of_two)

This code uses the sqrt() function from the math module to calculate the square root of a number. In this case, the number is 2, so the code will calculate the square root of two.

The sqrt() function returns the result as a floating point number, which is a type of data that can represent decimal values. The code then prints the result using the print() function, which will output the square root of two to the console.

I hope this helps! Let me know if you have any questions or if you need further assistance.

Can you redo the code using Newton’s method?

Certainly! Here is some Python code that uses Newton’s method to calculate the square root of two:

 # Function to calculate the square root using Newton's method
def square_root(x, guess=1.0):
    # Calculate the next guess using Newton's method
    next_guess = (guess + x / guess) / 2
    
    # Check if the guess is close enough to the actual value
    if abs(next_guess - guess) < 0.00001:
        # If it is, return the result
        return next_guess
    else:
        # If it's not, use the next guess as the new starting point and call the function again
        return square_root(x, next_guess)

# Calculate the square root of two
square_root_of_two = square_root(2)

# Print the result
print(square_root_of_two)

This code defines a function called square_root() that takes two arguments: the number for which you want to calculate the square root (x), and an initial guess for the square root (guess). The function uses Newton’s method to iteratively improve the guess until it is close enough to the actual value, at which point it returns the result.

To calculate the square root of two, you can call the square_root() function with the number 2 as the first argument. The function will then return the square root of two as a floating point number, which you can then print to the console using the print() function.

I hope this helps! Let me know if you have any questions or if you need further assistance.

Can you make Python code using recursion to calculate the square root of two?

Certainly! Here is a Python function that uses recursion to calculate the square root of two:

 def sqrt(x, guess=1.0):
    if abs(guess**2 - x) < 0.00001:
        return guess
    else:
        return sqrt(x, (guess + x/guess)/2)

result = sqrt(2)
print(result)

This function uses the Newton-Raphson method to calculate the square root of a given number x. The function takes an optional initial guess for the square root, which is used as the starting point for the iterative process. The function then calculates a new guess using the formula (guess + x/guess)/2, which is derived from the equation f(guess) = guess^2 - x = 0. The function then checks whether the difference between the new guess and the true value of the square root is small enough, and if it is, it returns the guess. Otherwise, it calls itself with the new guess and continues the iterative process.

This function will continue to iterate until it finds a guess that is close enough to the true value of the square root, at which point it will return the final guess. The function will terminate when the difference between the guess and the true value is small enough, as determined by the abs(guess**2 - x) < 0.00001 condition.

You can try calling the sqrt function with different values of x to see how it works. For example, to calculate the square root of two, you can call sqrt(2).

Can you make Python code to calculate the square root of two using machine learning?

Certainly! Here is a Python function that uses a machine learning model to calculate the square root of two:

 import numpy as np
from sklearn.linear_model import LinearRegression

def sqrt(x):
    # Create training data
    X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
    y = np.array([1, 1.41421356, 1.73205081, 2, 2.23606798])
    
    # Create and fit the model
    model = LinearRegression()
    model.fit(X, y)
    
    # Use the model to predict the square root of x
    return model.predict([[x]])[0]

result = sqrt(2)
print(result)

This function uses a linear regression model from the sklearn library to predict the square root of a given number x. The model is trained on a dataset of input-output pairs, where the input is a number and the output is its square root. The model is then used to predict the square root of x by calling the predict method and passing it the value of x as an input.

To use this function, you will need to install the numpy and sklearn libraries

Can you calculate the square root of two using a bash script?

Certainly! Here is a bash script that calculates the square root of two using the bc command-line calculator:

 #!/bin/bash

# Calculate the square root of 2 using the bc calculator
sqrt=$(echo "sqrt(2)" | bc -l)

# Print the result
echo "The square root of 2 is: $sqrt"

To run this script, save it to a file and make it executable using the chmod command:

 chmod +x sqrt.sh

Then you can run the script by calling it:

 ./sqrt.sh

This script uses the bc command-line calculator to calculate the square root of two by passing the expression sqrt(2) to the bc command and using the -l flag to specify that it should use the built-in math library. The result is stored in the sqrt variable and is then printed to the console.

The bc command is a powerful calculator that can perform a wide range of mathematical operations, including arithmetic, algebra, trigonometry, and more. It is often used in bash scripts to perform calculations and manipulate numerical data.

Non Functional code was produced when it was asked to generate the code to do sqrt of 2 using a genetic algorithm. I had asked it to refine code that it generated that ‘ran off the page’ to shorten it. It removed some comments and streamlined the code as well.

Certainly! Here is a shorter version of the Python code that uses a genetic algorithm to find the square root of two:

 import random

def fitness(individual):
    return 1/abs(individual - 2**0.5)

def create_population(size):
    return [random.random() for _ in range(size)]

def select_parent(population, tournament_size):
    random.shuffle(population)
    return max(population[:tournament_size], key=fitness)

def reproduce(parent1, parent2):
    split_point = random.randint(1, len(parent1)-1)
    return parent1[:split_point] + parent2[split_point:]

def mutate(individual):
    mutate_index = random.randint(0, len(individual)-1)
    mutate_value = random.random()
    return individual[:mutate_index] + [mutate_value] + individual[mutate_index+1:]

def evolve(population, retain_percentage, random_select_percentage, mutation_rate):
    retain_length = int(len(population) * retain_percentage)
    random_length = int(len(population) * random_select_percentage)
    sorted_population = sorted(population, key=fitness, reverse=True)
    parents = sorted_population[:retain_length]
    for individual in sorted_population[retain_length:retain_length+random_length]:
        if random.random() < mutation_rate:
            parents.append(individual)
    parents_length = len(parents)
    desired_length = len(population) - parents_length
    children = []
    while len(children) < desired_length:
        parent1 = select_parent(population, tournament_size=5)
        parent2 = select_parent(population, tournament_size=5)
        child = reproduce(parent1, parent2)
        if random.random() < mutation_rate:
            child = mutate(child)
        children.append(child)
    parents.extend(children)
    return parents

def run_genetic_algorithm(population_size=100, generations=100, retain_percentage=0.2, random_select_percentage=0.05, mutation_rate=0.01):
    population = create_population(population_size)
    for _ in range(generations):
        population = evolve(population, retain_percentage, random_select_percentage, mutation_rate)
    return max(population, key=fitness)

result = run_genetic_algorithm()
print(result)

This code uses a genetic algorithm to search for the square root of two by generating a population of possible values and iteratively improving them through reproduction, mutation, and selection. The fitness function is used to evaluate the quality of each individual, and the genetic algorithm uses this function to guide the search towards better solutions.

Chicken Tenders Monetary Standard

January 3, 2023ChatGPT, MoneyChatGPT, chicken tenders, Jordan Peterson, Lex Fridman, Saifedean Ammous, stablecoin, standard, The Bitcoin Standarderickclasen

Really? Not sure on this idea, so let’s ask ChatGPT

Could chicken tenders be used as a standard?

Chicken tenders, or strips of chicken breast that are breaded and fried, could not be used as a standard for currency or any other type of financial system.

There are several reasons why chicken tenders (or any other type of food) could not be used as a standard. One reason is that they do not have a stable value. The value of chicken tenders (or any other type of food) can fluctuate based on a variety of factors, including supply and demand, production costs, and market conditions. This makes them an unreliable store of value and an unreliable medium of exchange.

Another reason is that chicken tenders (or any other type of food) do not have a widely recognized value or a widely accepted means of exchange. Unlike fiat currencies, which are issued by governments and accepted as a means of payment in most countries, chicken tenders (or any other type of food) are not recognized as a legitimate form of payment and are not widely accepted as a medium of exchange.

Finally, chicken tenders (or any other type of food) are subject to spoilage and deterioration, which makes them a poor choice as a standard for currency or any other type of financial system.

Overall, it is not practical or feasible to use chicken tenders (or any other type of food) as a standard for currency or any other type of financial system.

Create a stable coin based on chicken tenders.

It would not be practical or feasible to create a stablecoin based on chicken tenders. While it is theoretically possible to create a stablecoin based on any type of asset, including chicken tenders, there are several reasons why it would not be practical or feasible in this case.

One reason is that chicken tenders (or any other type of food) do not have a stable value. The value of chicken tenders (or any other type of food) can fluctuate based on a variety of factors, including supply and demand, production costs, and market conditions. This makes them an unreliable store of value and an unreliable medium of exchange, which would make it difficult to maintain the stability of a stablecoin based on chicken tenders.

Finally, chicken tenders (or any other type of food) are subject to spoilage and deterioration, which would make it difficult to maintain a stablecoin based on chicken tenders over the long term.

Overall, it is not practical or feasible to create a stablecoin based on chicken tenders or any other type of food.

On a more serious note check out The Bitcoin Standard, which might actually work.

Also there are interviews ( by Jordan Peterson and Lex Fridman ) with Saifedean Ammous an Austrian economist who wrote, The Bitcoin Standard: The Decentralized Alternative to Central Banking, see this post … http://erick.heart-centered-living.org/info-links/jordan-peterson-bitcoin-interview/

How to Generate Art Demo Command Line Version

March 4, 2020Image Generation, Machine Learninganime, art, How to Generate Art Demo, Machine Learning, Siraj Raval, WaveErick Clasen

This post covers code that I have played with that applies a style to an image. The code is derived from Siraj Raval’s work. I was interested in having a version that would run outside of a Python Jupyter notebook. The idea is to have a version that can be run from the command line. This allows for it to be run ‘production’ style. This means have it set with images and parameters and run right from the command line. I found a way to convert the code from Jupyter notebook code to straight Python. The exact method I can’t remember right now. The code also produces intermediate frames while running, one per iteration. These are stored in the iters directory. They can be used to monitor the models progress and optionally generate an animated GIF showing the unfolding of the process.

I have conda installed on my machine, so the dependency aka, requirements text file was used to load dependencies, following setup of the environment. This virtual environment makes sure that there are no collisions with other code environments, including the base. I did not want to upset anything that I depend on over an experiment. It can also be put under a virtual environment in Python. This is not a requirement but, good practice when developing code. It helps to keep the code in its own confined space and makes things work in a repeatable way. Making it easy to replicate the work of others in a controlled way. Sometimes it helps to be able to troubleshoot issues by switching between environments too.

Conda Environment Setup

The best resource for learning to work with conda is the cheat sheet for conda

I set mine up to work like this …

#
# To activate this environment, use
#
#     $ conda activate art-demo
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Python Virtual environment example , setup and activate…

virtualenv -p python $HOME/tmp/art-demo-venv/

 cd How-to-Generate-Art-Demo/
 source $HOME/tmp/art-demo-venv/bin/activate

To setup dependency list

After setting up the optional environment, one of the above, install dependencies.

 pip install -r requirements.txt

Not sure about this, might have been in place already, the following that is…

If it doesn’t exist, create a file called ~/.keras/keras.json and make sure it looks like the following:

{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

Then the demo code can be run from the command line.

demo-256.py	2019-05-23 16:40	16K
demo-512.py	2019-06-07 11:42	16K
demo-1024.py

Memory

Bear in mind that bigger resolution values will take longer than small. The memory requirements increase fast with resolution increases. I was only able to run 256 resolution on a machine with 4GB. I also tried to make thumbnail size images but, the model seems to have issues with smaller images, it produced distorted results. With a machine that has 16GB of RAM, it was easy to create 512 and 1024 pixel square images. When in doubt, monitor memory usage via the top command. The giveaway is swap gets used heavily with a large image. Even with a decent SSD and it having swap mounted on it, a machine might grind down to terrible lags in response, if swap is used dynamically on a heavy basis . Then the option is just to kill the process and make sure RAM is de allocated. This is really the only issue with running this code, memory use. If you have enough, it’s load up the dependencies and go.

Code

The code for this project is in this directory…

http://erick.heart-centered-living.org/upload/code/How-to-Generate-Art-Demo/

It is currently ‘live’. What I mean by that is that it is a symlink to the working directory on my main machine that is rsync’d ( rsync options -avhLK) to it’s location online. So, it is subject to change, you may see things get added in, I may break links accidentally too, hopefully not. If something is obviously broken, shout out in a comment and I can fix it. I might eventually load code to Github as a static repository.

Note the images directory. Also under it is the styles directory. I have added a few more examples beyond what Siraj Raval had. Mostly downloaded from 4chan while scrolling through wallpaper and some random images that were collected on my machine from wherever. Plus a few that I made for doing references against a simple image, such as a red dot , red and black dot and grid lines.

Plain is a directory that contains the original Siraj Raval code for reference.

Images and styles, contain a sampling of images to try. In the code the image location for initial image and style image are hard coded. Look around line 37 to line 52 for…

content_image_path

and

style_image_path

and you will see examples that can be switched in and out via mask commenting the lines out. Change it as needed.Plus try your own for fun.

Adjust parameters if needed. Look for them around line 152 under the line with …

# In[51]:

Also around line 305 is the iterations variable which can be adjusted.

There are some samples of output that were generated in the top level directory. This is where the output is dumped as output.png. I just kept renaming the interesting images to useful names as I kept playing with various image and style combinations. You have to rename output.png as it will be overwritten when a new run occurs. It is possible to run rm ??output.png to delete them. It is possible to put this line in the script and activate. It is in the script but I have it commented out because I did not want to blast all of the output.png’s away when the gif is made.

iters directory contains animated gifs of the images and a BASH script that is used to make the animated gifs.

notes.txt was a sketch of of the names for the virtual environment and conda environment, setups on two different machines. I might add other info and issues I find in this file.

Example of styling an image

Real Person to style with anime — Real 3D Person 512×512 as input from images/anime

2D Anime person to style the image — Anime 2D Person for the styling image, from images/style/anime

To get my bearings straight on this code again, I ran an example to make sure I understand it, it still runs. Hopefully others can gain insight into what a proper run does.

Input Images

Above I have two images, one the initial image and the second the style. The initial and styling image are both cropped to be 512×512 pixels, re-scaling as needed. Re-scaling is not necessary, except for the fact that the program will re-scale and if the proportions are not square, the resultant image will be stretched or squished.

On the first one there was a bit of a haze of blue to the right that I bucket filled to white so it wouldn’t get styled as well. From extending the photo to a larger canvas to make square this blue haze was an obvious line of demarcation. Plus her shoulder was truncated, so I filled that in a bit to make things look better. The anime image for styling was just cropped square and re-scaled too and flipped. Not sure if it makes a difference but, I looked for images that were close-ish in look and pose and set the same orientation.

Memory Usage and Performance

My machine, an i5, Sandy Bridge OptiPlex-790, can be see taking around 500-700 seconds per epoch. It was also running other code at the time, tuning parameters for algo trading code, so all the cores were not focused on making the images. Plus, all the miscellaneous stuff running, mail, browser with too many tabs open, it is using 1/2 core just doing that. But, 20 loops through the code is not a long time, really. You can see below that a fair amount of memory is being used, including some swap. But, the machine is not slowing down at all. I think some of the swap is just used up on other things and may not have been de-allocated over time, so it’s more or less static swap now.

top - 11:49:25 up 118 days, 19:01, 13 users,  load average: 8.51, 8.57, 8.12
Tasks: 304 total,   3 running, 300 sleeping,   0 stopped,   1 zombie
%Cpu0  : 78.7 us,  9.3 sy,  0.0 ni, 12.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 73.0 us,  9.7 sy,  0.0 ni, 17.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 97.3 us,  2.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 80.9 us, 10.4 sy,  0.0 ni,  8.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16309180 total,  6623860 free,  6149876 used,  3535444 buff/cache
KiB Swap: 16654332 total,  8917700 free,  7736632 used.  9330380 avail Mem

Iter Directory and Animated GIFs

While this code is running it will be copying png’s of the process into the iters directory. This lets you get a peek into the models progress and it is possible to use these later in conjunction with the make-gif-.sh in that directory to make an animated GIF of the model ‘building’ the final image. The copies starting from 00output.png and so on up to the last epoch number (The prefixing and order matter for the script to generate an animation) are a clue as to if things are going right. If after a few images, things look way off base or you question what direction the results are going, viewing the intermediate images give you a chance to abort early. There is a line in the file at the end that will optionally remove the output.png’s using rm ??output.png.

iterations of how to generate art demo — Iters Directory Showing Intermediate Copies

Command Line Output

From my command line this is what I see when it runs. Change to the directory, activate conda environment for the art-demo and fire off python demo-512.py. (I created another called demo-512-anime+human.py to replicate the example in this post ) Soon it spins up, well the fans on the machine do as well, Python grabs all the CPU power it can on all available cores so soon you will hear the power of machine learning as the fans kick to max RPM. Time goes by as the iterations of epochs run, 20 in this case. As long as the very high loss value is coming down, work is being done. As can be seen there is a point of diminishing returns beyond which not much is to be gained. This also can be assessed by looking at the progression of images in the iters directory. After it runs, it quits and a final output.png will appear in the top level, for me the ~/ml/How-to-Generate-Art-Demo/ directory, same directory the code is run from. Nothing special about ~/ml other than it is the place I keep all the machine learning code.

Note the warnings: I did not notice this right away but the Tensorflow install was not compiled to use SSE3,4.X and AVX. I retried a few loops of this example at 256×256 in my base install where Tensorflow seems to be compiled with these features and it yields 40% speed improvement.

(base) erick@OptiPlex-790 ~ $ cd ~/ml/How-to-Generate-Art-Demo/
(base) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo $ conda activate art-demo
(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo $ python demo-512.py 
Using TensorFlow backend.
(1, 512, 512, 3)
(1, 512, 512, 3)
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Start of iteration 0
Current loss value: 1.68822e+11
Iteration 0 completed in 511s
Start of iteration 1
Current loss value: 1.05267e+11
Iteration 1 completed in 480s
Start of iteration 2
Current loss value: 7.54607e+10
Iteration 2 completed in 459s
Start of iteration 3
Current loss value: 5.67734e+10
Iteration 3 completed in 489s
Start of iteration 4
Current loss value: 4.71848e+10
Iteration 4 completed in 518s
Start of iteration 5
Current loss value: 4.19099e+10
Iteration 5 completed in 513s
Start of iteration 6
Current loss value: 3.82277e+10
Iteration 6 completed in 533s
Start of iteration 7
Current loss value: 3.56423e+10
Iteration 7 completed in 508s
Start of iteration 8
Current loss value: 3.36591e+10
Iteration 8 completed in 498s
Start of iteration 9
Current loss value: 3.21125e+10
Iteration 9 completed in 441s
Start of iteration 10
Current loss value: 3.08934e+10
Iteration 10 completed in 610s
Start of iteration 11
Current loss value: 2.9882e+10
Iteration 11 completed in 516s
Start of iteration 12
Current loss value: 2.90331e+10
Iteration 12 completed in 495s
Start of iteration 13
Current loss value: 2.82984e+10
Iteration 13 completed in 499s
Start of iteration 14
Current loss value: 2.76652e+10
Iteration 14 completed in 495s
Start of iteration 15
Current loss value: 2.70985e+10
Iteration 15 completed in 530s
Start of iteration 16
Current loss value: 2.66107e+10
Iteration 16 completed in 86288s
Start of iteration 17
Current loss value: 2.61579e+10
Iteration 17 completed in 526s
Start of iteration 18
Current loss value: 2.57533e+10
Iteration 18 completed in 642s
Start of iteration 19
Current loss value: 2.53972e+10
Iteration 19 completed in 717s
(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo $

The output file appears in the same directory

(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo $ ls output.png 
output.png

Final Result

Final after 20 iterations Human + Anime (style image) = Humanime?

Animated GIF

To make an animated GIF from the output files in the iters directory. (Note, in Linux the program Convert must be installed if it is not already installed in the distribution that is being used). Go to the iters directory, make sure that there are no extraneous output.png files from previous runs. This can happen if a previous run was made that was longer, more iterations and generated numbered outputs above the last iteration. Then run…

(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo $ cd iters
(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo/iters $

(art-demo) erick@OptiPlex-790 ~/ml/How-to-Generate-Art-Demo/iters $ ./make-gif.sh

In a few seconds a file called animation.gif will appear. Rename the file if you want to keep it as any future work may write over it.

The code is well commented and easy to tweak, thanks to Siraj Raval. I will end with one part of the conclusion , it’s appropriate as a final word.

# It's now your turn to play! Try changing the input images, their sizes, the weights of the different loss functions, the features used to construct them and enjoy different sorts of output. If you end up creating something you truly wish to share, [please do so](https://twitter.com/copingbear)!

Resources

Original work is on GitHub provided by Siraj Raval

How-To-Generate-Art-Demo

Video Tutorial that accompanies GitHub Code

Original Paper

https://arxiv.org/pdf/1508.06576.pdf

trying out min-char-rnn and lstm

July 4, 2019Machine Learning, Text Generationchar-rnn, lstm, min-char-rnn, torch-rnnerickclasen

Text Generation

In early 2018, I started researching machine learning. I was curious about it and also looking for anything that could be useful in the space of machine learning to perform functions in code, specifically trading algorithms. I looked for code that would be easy to get started with, easy to pick apart and understand. I knew it would take time to understand and I was fine taking some sidetracks down some territory that would be interesting to play with.

I quickly came across various versions of text generation code. I won’t get into the theory here as there is a ton of information on it already and I have included my sources as links in the post.

Basically, this post will focus on the use of a well documented version char-rnn specifically min-char-rnn and an improved version that uses lstm.

This post is a brief dump on my toying around at text generation with machine learning. I may cover it further in more detail in the future.

To see some cutting edge examples of text generation, that is pretrained on a corpus using GPT-2, see the following post…

GPT-2 the Next Level in Text Generation for now at least

Currently I am trying out RNN Text Generation using Tensorflow and plan on posting some results in the future…

https://github.com/spiglerg/RNN_Text_Generation_Tensorflow

char-rnn code

I started out with Andrej Karpathy’s min-char-rnn initially when I was in machine learning research mode early in 2018. During the early fall of 2018 I found posts by Eli Bendersky that gave a good break down of min-char-rnn, including code that had more comments and a few other pieces of code. The two other pieces of code was a Markov Model text generator and lstm extension to min-char-rnn. In theory the lstm version of min-char-rnn should perform the best. I wound up with the best results (Python only code) using this version with modifications. The three modifications were the ability to save the output text while it was running and the ability to trim down the learning rate progressively while it was running and to specify a file name at the command line. I did this after I noticed that the loss was oscillating and not decreasing under certain conditions, playing with the coefficients for the code. Such as layers and the amount of lookback of the characters. Trying to decrease the learning rate can help with this behavior.

Beyond this code a more sophisticated way to do text generation is to go to using Lua Torch and running torch-rnn. This requires downloading and installing all of the packages in Linux required to run it.

Comparing the Versions

To start, min-char-rnn performance can be compared against the lstm version. In this example I am using the US Constitution as it is a document that is widely available and many people are familiar with. In the future I will cover running the text from this site through text generation code.

After 99000 loops of min-char-rnn on the US Constitution.

iter 99900, loss: 34.083777
 ----
  , Reach witle from to
 the Lice of shall degrest and unccive

athins or propinds, ploovate one shall eptemitlatiall un ligre shall have hake by the Ugiters shall no no be
 as
 writh two 2hs -quals and of

You can see it is trying to pick up on something and a few of the words are actually legitimate English.

Now the lstm version created by Eli Bendersky, minimal-character-based-lstm-implementation…

iter 99800 (p=18160), loss 9.844710
 ----
 shall receire ffour Houser, hred in overyof morty Cowcurthir
 such onf
 grate,
 shall then callary, and
 sittin dutler, shall, with an electors
 and Elections,
 which shall be a President.

3. Nuties, Impos

This version shows some hope as it is forming some more structure, better words and parts of sentences. The loss is lower as well.

What if the LTSM version runs longer, like a million cycles?

iter 999000 (p=17184), loss 4.694482
 iter 999200 (p=20384), loss 4.734232
 iter 999400 (p=23584), loss 4.815483
 iter 999600 (p=26784), loss 4.979268
 iter 999800 (p=29984), loss 5.165326
 ----
 shall consisted; but the Congress may by Congress, becommit,
 shall be a Senator or Representative intee any Department or Trust under the Laws Spojgiled to consirques hating been creary shall prioriti
 ----

It is getting a bit better, less broken words, formation of a sort of paragraph.

How about ten million cycles, where the loss seems to bottom out in the 2.5 range…

iter 10096200 (p=21776), loss 2.487097

iter 10096400 (p=24976), loss 2.517261

iter 10096600 (p=28176), loss 2.605424

iter 10096800 (p=31376), loss 2.556021

----

against the sements whereor who shall return in Consent whations whict: Amend

Mander State. a Treason of Disubility to lis arming the cume aftered thanney. Ir, or Conventions as the lise dusceptraray

I concatenated some more of the output….

to

post a Member in their borth intomie States, the Vice President and denuinned.

Amendment 10

The powers not the twerfth not betilizent of this article the Vicembagion and such Pentitias of the seve

s the United

Stated; under their Party; without the

United States.

Article 36. When mademe Court of the United States which shall not be retsion age State. Andain duty a stanly such Majority, or the

ited States or by any

State.

No Prefered, the

President proviit for President; necestald a majority be a Members, and the Legitlationen for the law of the President, to be the Eduld a Memberd

to ever

ne of the seber to

the Approparal Becomes of Blesident within the United States un nunis primas to which the District hensbill, such Presented by incohringle shall be

tax having

States, and

transmit t

Modifications of min-char-lstm.py

The following was created by modifying the code at https://github.com/eliben/deep-learning-samples/blob/master/min-char-rnn/min-char-lstm.py

The modifications are, 1 to be able to allow for a learning rate that starts higher and declines. I was experimenting with using the min-char-ltsm.py code on the contents of the blog posts on this site and I noticed that the learning rate would decline to a point and then oscillate up and down. By starting with a higher learning rate and then trimming it lower and lower, I was hoping to get the oscillations to settle and achieve a better loss rate. The other modification is that it will not print text to the screen until the loss declines by half. It also saves the text that is generated after the loss drops to a quarter of the original. Saving the text allows for optional post analysis, for instance of keywords.

Code for min-char-lstm-mod-2.py is pasted at the bottom of this post.

python min-char-lstm-mod-2.py us-constitution.txt
2019-02-21 21:58:36.456591: iter 13238027 (p=48), loss 5.571070
----
  f Hear Porty, or Vice President equal, as
Vares possary, having to nimpost to the President, dofe, for the Senate, the first Meeting of the first Summarma
onle Jonn admius lesments shall exercise they shall not be consthuled for tainamanment mort himal of the President shall consist of a sever Years as of the
United States;

To recass on the Congress may be admints, Cans; the
proprictions herein becom tamy and
Partarittent; or transting Surdsation shall
immentent no State, or abilitives shall b 
----

Saving to file

2019-02-21 21:58:36.568285: iter 13238028 (p=64), loss 5.567595
----
 anstwry by such Vacarciuse
Amdiavion, or other Sestected, and the Congress shall may leibtry days by Contict Biltian convice Vith.

No case, dupa such Penisdatizens prsed the Bildsent of thein be cindent for sitt in Cases of President and Vice President, or altice any Office of either House shall be held in the Bilas, except prohicies and Consuls; to be senict
compected
in this Conments of Congress.

The executive atther during the right of the United States, shall choole the Office of Rewofity  
----

Saving to file

char-rnn – Training Loss and Validation Loss

char-rnn – Training Loss and Validation Loss from MachineLearning

Lua Torch

I also experimented with torch-rnn which use Lua Torch. It works OK but nothing beyond what I see with the lstm version above. I only tried it breifly and haven’t formed any solid conclustions.

Lua Torch torch-rnn 2 layer 1024 dimensions

https://github.com/jcjohnson/torch-rnn

st keming the Immedince shall
have Power of Not
shall be not
lations for each
shall any
State by the Judigaany state not provided by Casssimate and Repund and Jurtice of the Sequicled in the Unanimed as excleding recrisal of its Consent of the
Mevole shall then the Vice-President, and of the President shall be make Raycesorveny or the right noveranded.

E thas Deleste, by Odfect to the nomes of any Qerfon and ciry of the State

becredo nugr on Condeling and firmine who
haviny this Constitution, but no derso- hivin, Regulation of Vice-Une Tneasor
this BitFzinst entseived.
he fect Coof Presidences, nhish shall be agSent of the Treaso shall gave to behave to the States nor and devermay of the United States; Monor subrected, nor and during the Leecther Year
d aftee the Adjupreit, but in a Memualif or public Ministersss atcerrind ad any Piffost the States connicted to Them thind
ponted by the United States.

S. Pemsud for the
chosen thes shall be a particied and Hays. Wh labth the narge of the Senate, law, ablone Indianty for a dwoun the Eves Motors of liozen Deardors and Elestions and ass ow the Legislatures
shall nake at semoun shall not be require the sunes as ivaly, age unters; and necons al
witn from Oate Members, and accuration of time of titimes, inlarconcancrading one properdyy of the United States, or in which meat
male, in property, sian to the Person having anm notine

Immortizer of having the President.

th onothert commors and Consent, shall apr in this Conviction
may shall, Ligizen aplice—
B.

Smation C
qulication, the first Manger To
theresimant of a pripersonr
Thithit subject dot chimles Tnemeriting the several States, shall be shall be equal States, or in any other Election,
compensation, without the several States; the receita diforme, but nother shall detmanation shall not excerain their Vecessary sexzect juty, or puflis indey
they shall be neach or number in mate been courtion or execuin s co-venty shall not be consugheraty of the Scatem at shall h

min-char-lstm-mod-2.py

# Minimal character-based language model learning with an LSTM architecture.
#
# Overall code structure based on Andrej Karpathy's min-char-rnn model:
#    https://gist.github.com/karpathy/d4dee566867f8291f086
#
# But the architecture is modified to be LSTM rather than vanilla RNN.
# The companion blog post is:
#   https://eli.thegreenplace.net/2018/minimal-character-based-lstm-implementation/
#
# Tested with Python 3.6
#
# Eli Bendersky [http://eli.thegreenplace.net]
# BSD License per original (@karpathy)
from __future__ import print_function

import numpy as np
import sys
import datetime


# Make it possible to provide input file as a command-line argument; input.txt
# is still the default.
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    filename = 'input.txt'

with open(filename, 'r') as f:
    data = f.read()

# All unique characters / entities in the data set.
chars = list(set(data))
data_size = len(data)
V = vocab_size = len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))

# Each character in the vocabulary gets a unique integer index assigned, in the
# half-open interval [0:N). These indices are useful to create one-hot encoded
# vectors that represent characters in numerical computations.
char_to_ix = {ch:i for i, ch in enumerate(chars)}
ix_to_char = {i:ch for i, ch in enumerate(chars)}
print('char_to_ix', char_to_ix)
print('ix_to_char', ix_to_char)

# Hyperparameters.

# Size of hidden state vectors; applies to h and c.
H = hidden_size = 100
seq_length = 16 # number of steps to unroll the LSTM for
learning_rate = 0.1

# The input x is concatenated with state h, and the joined vector is used to
# feed into most blocks within the LSTM cell. The combined height of the column
# vector is HV.
HV = H + V

# Stop when processed this much data
MAX_DATA = 1000000

# Model parameters/weights -- these are shared among all steps. Weights
# initialized randomly; biases initialized to 0.
# Inputs are characters one-hot encoded in a vocab-sized vector.
# Dimensions: H = hidden_size, V = vocab_size, HV = hidden_size + vocab_size
Wf = np.random.randn(H, HV) * 0.01
bf = np.zeros((H, 1))
Wi = np.random.randn(H, HV) * 0.01
bi = np.zeros((H, 1))
Wcc = np.random.randn(H, HV) * 0.01
bcc = np.zeros((H, 1))
Wo = np.random.randn(H, HV) * 0.01
bo = np.zeros((H, 1))
Wy = np.random.randn(V, H) * 0.01
by = np.zeros((V, 1))


def sigmoid(z):
    """Computes sigmoid function.

    z: array of input values.

    Returns array of outputs, sigmoid(z).
    """
    # Note: this version of sigmoid tries to avoid overflows in the computation
    # of e^(-z), by using an alternative formulation when z is negative, to get
    # 0. e^z / (1+e^z) is equivalent to the definition of sigmoid, but we won't
    # get e^(-z) to overflow when z is very negative.
    # Since both the x and y arguments to np.where are evaluated by Python, we
    # may still get overflow warnings for large z elements; therefore we ignore
    # warnings during this computation.
    with np.errstate(over='ignore', invalid='ignore'):
        return np.where(z >= 0,
                        1 / (1 + np.exp(-z)),
                        np.exp(z) / (1 + np.exp(z)))


def lossFun(inputs, targets, hprev, cprev):
    """Runs forward and backward passes through the RNN.

      TODO: keep me updated!
      inputs, targets: Lists of integers. For some i, inputs[i] is the input
                       character (encoded as an index into the ix_to_char map)
                       and targets[i] is the corresponding next character in the
                       training data (similarly encoded).
      hprev: Hx1 array of initial hidden state
      cprev: Hx1 array of initial hidden state

      returns: loss, gradients on model parameters, and last hidden states
    """
    # Caches that keep values computed in the forward pass at each time step, to
    # be reused in the backward pass.
    xs, xhs, ys, ps, hs, cs, fgs, igs, ccs, ogs = (
            {}, {}, {}, {}, {}, {}, {}, {}, {}, {})

    # Initial incoming states.
    hs[-1] = np.copy(hprev)
    cs[-1] = np.copy(cprev)

    loss = 0
    # Forward pass
    for t in range(len(inputs)):
        # Input at time step t is xs[t]. Prepare a one-hot encoded vector of
        # shape (V, 1). inputs[t] is the index where the 1 goes.
        xs[t] = np.zeros((V, 1))
        xs[t][inputs[t]] = 1

        # hprev and xs[t] are column vector; stack them together into a "taller"
        # column vector - first the elements of x, then h.
        xhs[t] = np.vstack((xs[t], hs[t-1]))

        # Gates f, i and o.
        fgs[t] = sigmoid(np.dot(Wf, xhs[t]) + bf)
        igs[t] = sigmoid(np.dot(Wi, xhs[t]) + bi)
        ogs[t] = sigmoid(np.dot(Wo, xhs[t]) + bo)

        # Candidate cc.
        ccs[t] = np.tanh(np.dot(Wcc, xhs[t]) + bcc)

        # This step's h and c.
        cs[t] = fgs[t] * cs[t-1] + igs[t] * ccs[t]
        hs[t] = np.tanh(cs[t]) * ogs[t]

        # Softmax for output.
        ys[t] = np.dot(Wy, hs[t]) + by
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))

        # Cross-entropy loss.
        loss += -np.log(ps[t][targets[t], 0])

    # Initialize gradients of all weights/biases to 0.
    dWf = np.zeros_like(Wf)
    dbf = np.zeros_like(bf)
    dWi = np.zeros_like(Wi)
    dbi = np.zeros_like(bi)
    dWcc = np.zeros_like(Wcc)
    dbcc = np.zeros_like(bcc)
    dWo = np.zeros_like(Wo)
    dbo = np.zeros_like(bo)
    dWy = np.zeros_like(Wy)
    dby = np.zeros_like(by)

    # Incoming gradients for h and c; for backwards loop step these represent
    # dh[t] and dc[t]; we do truncated BPTT, so assume they are 0 initially.
    dhnext = np.zeros_like(hs[0])
    dcnext = np.zeros_like(cs[0])

    # The backwards pass iterates over the input sequence backwards.
    for t in reversed(range(len(inputs))):
        # Backprop through the gradients of loss and softmax.
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1

        # Compute gradients for the Wy and by parameters.
        dWy += np.dot(dy, hs[t].T)
        dby += dy

        # Backprop through the fully-connected layer (Wy, by) to h. Also add up
        # the incoming gradient for h from the next cell.
        dh = np.dot(Wy.T, dy) + dhnext

        # Backprop through multiplication with output gate; here "dtanh" means
        # the gradient at the output of tanh.
        dctanh = ogs[t] * dh
        # Backprop through the tanh function; since cs[t] branches in two
        # directions we add dcnext too.
        dc = dctanh * (1 - np.tanh(cs[t]) ** 2) + dcnext

        # Backprop through multiplication with the tanh; here "dhogs" means
        # the gradient at the output of the sigmoid of the output gate. Then
        # backprop through the sigmoid itself (ogs[t] is the sigmoid output).
        dhogs = dh * np.tanh(cs[t])
        dho = dhogs * ogs[t] * (1 - ogs[t])

        # Compute gradients for the output gate parameters.
        dWo += np.dot(dho, xhs[t].T)
        dbo += dho

        # Backprop dho to the xh input.
        dxh_from_o = np.dot(Wo.T, dho)

        # Backprop through the forget gate: sigmoid and elementwise mul.
        dhf = cs[t-1] * dc * fgs[t] * (1 - fgs[t])
        dWf += np.dot(dhf, xhs[t].T)
        dbf += dhf
        dxh_from_f = np.dot(Wf.T, dhf)

        # Backprop through the input gate: sigmoid and elementwise mul.
        dhi = ccs[t] * dc * igs[t] * (1 - igs[t])
        dWi += np.dot(dhi, xhs[t].T)
        dbi += dhi
        dxh_from_i = np.dot(Wi.T, dhi)

        dhcc = igs[t] * dc * (1 - ccs[t] ** 2)
        dWcc += np.dot(dhcc, xhs[t].T)
        dbcc += dhcc
        dxh_from_cc = np.dot(Wcc.T, dhcc)

        # Combine all contributions to dxh, and extract the gradient for the
        # h part to propagate backwards as dhnext.
        dxh = dxh_from_o + dxh_from_f + dxh_from_i + dxh_from_cc
        dhnext = dxh[V:, :]

        # dcnext from dc and the forget gate.
        dcnext = fgs[t] * dc

    # Gradient clipping to the range [-5, 5].
    for dparam in [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby]:
        np.clip(dparam, -5, 5, out=dparam)

    return (loss, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
            hs[len(inputs)-1], cs[len(inputs)-1])


def sample(h, c, seed_ix, n):
    """Sample a sequence of integers from the model.

    Runs the LSTM in forward mode for n steps; seed_ix is the seed letter for
    the first time step, h and c are the memory state. Returns a sequence of
    letters produced by the model (indices).
    """
    x = np.zeros((V, 1))
    x[seed_ix] = 1
    ixes = []

    for t in range(n):
        # Run the forward pass only.
        xh = np.vstack((x, h))
        fg = sigmoid(np.dot(Wf, xh) + bf)
        ig = sigmoid(np.dot(Wi, xh) + bi)
        og = sigmoid(np.dot(Wo, xh) + bo)
        cc = np.tanh(np.dot(Wcc, xh) + bcc)
        c = fg * c + ig * cc
        h = np.tanh(c) * og
        y = np.dot(Wy, h) + by
        p = np.exp(y) / np.sum(np.exp(y))

        # Sample from the distribution produced by softmax.
        #ix = np.random.choice(range(V), p=p.ravel())
    # IX HACK
        ix = p.argmax()
    x = np.zeros((V, 1))
        x[ix] = 1
        ixes.append(ix)
    return ixes


def gradCheck(inputs, targets, hprev, cprev):
    global Wf, Wi, bf, bi, Wcc, bcc, Wo, bo, Wy, by
    num_checks, delta = 10, 1e-5
    (_, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
     _, _) = lossFun(inputs, targets, hprev, cprev)
    for param, dparam, name in zip(
            [Wf, bf, Wi, bi, Wcc, bcc, Wo, bo, Wy, by],
            [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby],
            ['Wf', 'bf', 'Wi', 'bi', 'Wcc', 'bcc', 'Wo', 'bo', 'Wy', 'by']):
        assert dparam.shape == param.shape
        print(name)
        for i in range(num_checks):
            ri = np.random.randint(0, param.size)
            old_val = param.flat[ri]
            param.flat[ri] = old_val + delta
            numloss0 = lossFun(inputs, targets, hprev, cprev)[0]
            param.flat[ri] = old_val - delta
            numloss1 = lossFun(inputs, targets, hprev, cprev)[0]
            param.flat[ri] = old_val # reset
            grad_analytic = dparam.flat[ri]
            grad_numerical = (numloss0 - numloss1) / (2 * delta)
            if grad_numerical + grad_analytic == 0:
                rel_error = 0
            else:
                rel_error = (abs(grad_analytic - grad_numerical) /
                             abs(grad_numerical + grad_analytic))
            print('%s, %s => %e' % (grad_numerical, grad_analytic, rel_error))


def basicGradCheck():
    inputs = [char_to_ix[ch] for ch in data[:seq_length]]
    targets = [char_to_ix[ch] for ch in data[1:seq_length+1]]
    hprev = np.random.randn(H, 1)
    cprev = np.random.randn(H, 1)
    gradCheck(inputs, targets, hprev, cprev)

# Uncomment this to run gradient checking instead of training
#basicGradCheck()
#sys.exit()

# n is the iteration counter; p is the input sequence pointer, at the beginning
# of each step it points at the sequence in the input that will be used for
# training this iteration.
n, p = 0, 0

# Memory variables for Adagrad.
mWf = np.zeros_like(Wf)
mbf = np.zeros_like(bf)
mWi = np.zeros_like(Wi)
mbi = np.zeros_like(bi)
mWcc = np.zeros_like(Wcc)
mbcc = np.zeros_like(bcc)
mWo = np.zeros_like(Wo)
mbo = np.zeros_like(bo)
mWy = np.zeros_like(Wy)
mby = np.zeros_like(by)
smooth_loss = -np.log(1.0/V) * seq_length
best_loss = smooth_loss
# Save the initial loss so that printing and saving occur at 1/2 of it and 1/10 of it.
start_loss = smooth_loss
output_filename = "lstm-output.txt"

print("\nStart Loss:",start_loss)



while p < MAX_DATA:
    # Prepare inputs (we're sweeping from left to right in steps seq_length long)
    if p+seq_length+1 >= len(data) or n == 0:
        # Reset RNN memory
        hprev = np.zeros((H, 1))
        cprev = np.zeros((H, 1))
        p = 0 # go from start of data

    # In each step we unroll the RNN for seq_length cells, and present it with
    # seq_length inputs and seq_length target outputs to learn.
    inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
    targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

    # Sample from the model now and then.
#    if n % 1000 == 0:
#        sample_ix = sample(hprev, cprev, inputs[0], 200)
#        txt = ''.join(ix_to_char[ix] for ix in sample_ix)
#        print('----\n %s \n----' % (txt,))

    # Forward seq_length characters through the RNN and fetch gradient.
    (loss, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
     hprev, cprev) = lossFun(inputs, targets, hprev, cprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001
#    if n % 200 == 0:
#        print('iter %d (p=%d), loss %f' % (n, p, smooth_loss))

    # Sample from the model now and then.
    if smooth_loss > (start_loss/4):
        if n % 1000 == 0:
            print('%s: iter %d (p=%d), loss %f' % (datetime.datetime.now(),n, p, smooth_loss))
    elif smooth_loss < best_loss:
        print('%s: iter %d (p=%d), loss %f' % (datetime.datetime.now(),n, p, smooth_loss))
    best_loss = smooth_loss
        sample_ix = sample(hprev, cprev, inputs[0], 500)
        txt = ''.join(ix_to_char[ix] for ix in sample_ix)
        print('----\n %s \n----' % (txt,))
    
    if smooth_loss < (start_loss/6):
        print("\nSaving to file\n")
            with open(output_filename,'a') as file_object:
                file_object.write(txt)
                    file_object.write("\n")


    # Perform parameter update with Adagrad.
    for param, dparam, mem in zip(
            [Wf, bf, Wi, bi, Wcc, bcc, Wo, bo, Wy, by],
            [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby],
            [mWf, mbf, mWi, mbi, mWcc, mbcc, mWo, mbo, mWy, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

    p += seq_length
    n += 1

GPT-2 the Next Level in Text Generation for now at least

April 26, 2019Machine Learning, Text GenerationGPT-2, OpenAI, Siraj Ravel, Welcome to WordPress!erickclasen

If you love WordPress you are really going to like this post 🙂

I have played around with OpenAI’s GPT-2 a bit and it is a fun text generator. It does a strikingly good job at generating fairly OK text. With a little touch up it one could create something sensible and readable.

Seeded with “Welcome to WordPress!” (I’m editing in WordPress and this was the first thing that came to mind) below is what OpenAI’s GPT-2 machine learning text generator came up with on the first shot. This has not been edited and GPT-2 seems to spell all words 100% and has fair grammar and sentence structure. Even though it can make fair paragraphs, it can wander a bit off topic at times but, is generally coherent. I have to admit being a bit blown away when I first started playing with it a bit and this is only using the 117M model version available on Github.

So, here’s the part where we start…

Welcome to WordPress!

Model prompt >>> Welcome to WordPress!
======================================== SAMPLE 1 ========================================
There are over 10,000 websites. It’s hard to count them all! And, it really is.

Let’s start with a quick summary of a few websites – WordPress, WordPress Plus, Hosted Blogger, WordPress, blog, etc.:

This is a list that lets you go from list to list and even from list to list. When you click on my name you can see I have 5 pages with 5,000+ blog posts.

It’s very easy to find and read a list on Twitter.

When you visit the Website, you are being asked to navigate through several pages of websites. You can also go to our Website navigation.

Now, let’s go for a little more fun: I hope that you have enjoyed this website and find the posts of other visitors that are looking to help you. It’s been one of those times, like now since the late ’80s, when I was doing it, that people would tell me I was dumb, that I didn’t know how to read a book. “I’d never read a book!” And that would be the beginning.

Now, I know when I read a book, I don’t look like a dumb person. No.

To be honest, I’m used to looking pretty stupid for being dumb. I don’t always have friends in their twenties or thirties that I would have found interesting. And I don’t take myself very seriously when things are looking good.

And I always have friends just as I am, and I don’t do good writing, and, in fact, when I’m not writing, I have friends I think are actually intelligent, but they are totally out of their heads. They are not stupid, so there’s just something about me that I would like to be called an intelligent genius. And they are just out of their brains.

I always go through my twenties and thirties thinking I’m smarter, and I’m going to give them crap and they’re probably right. I go through my twenties and thirties thinking I’m smarter and they’re just not going to.

I see all the people I’m reading these days thinking I’m smart and smart in my twenties and thirties, and I think, “My God? It’s only been a month since I have read a book.” I just think I’m getting the wrong look at it.

For Variety Let’s Go Try Again

This time around it output’s what looks like the intro to a tutorial on building a site, with some clunky HTML at the bottom. Notice the emoticon! I stole a few lines from this output and already wove them in above for fun. If one was to take the text below and hand edit it, it could be used to kick start a tutorial. It could break any writers block that you might have and get the ball rolling, see the example right after this text dump.

If you love WordPress you are really going to like this post 🙂

How Do I get started making WordPress?

In case you didn’t know WordPress is actually not quite that interesting after all. Even if you’re not trying to go the Web with a web browser.

But in order to succeed, WordPress needs you to be familiar with the various parts of it and if that’s all, you’re going to need some background info!

We’re actually all interested in WordPress on a lot of levels.

So, here’s the part where we start…

Creating a website

For WordPress, you’re going to be using the web for a couple of things.

The first thing you’ll want to do…

Create websites

So, you’ve probably read how the web can be an intimidating tool for beginners.

And so, we want to take you through what it really is.

Our site is a very basic web page.

So, how do we define a site and what do we do with it?

Well, it starts with making an appearance.

It’s called an email address

A website addresses are very powerful because it tells us the number of page views and the total number of views for an email.

Let’s start by getting you to get a feeling of how well this page works.

So what’s the deal with a website?

Let’s take a little detour

Now that you’ve got that impression about the website and get to your first point, we’ll start to explain more about how it functions.

How a website works?

Let’s first get you started with a simple HTML page .

We’ll start with defining our website at a basic level. Let’s go ahead, for the sake of simplicity we’ll take all of the components from the HTML below:

</div>

“example”

</script>

“http://the.example.com/?page=example.svg”

</a>

<page>

</page>

</body>

OK lets make an intro to a WordPress tutorial out of some of the text above to see how it can help prime the pump of writing….

If you love WordPress you are really going to like this post 🙂

You might be wondering, how do I get started making a WordPress site? You’ve probably read how WordPress can be an intimidating tool for beginners. But, in order to succeed, you need to be familiar with the various parts of WordPress. You’re going to need some background info!

So what’s the deal with a website? Let’s start by getting a feeling of how building a page works by looking at how webpages work in the first place. We will take a little detour and get you started with a simple HTML page.

…it is entirely possible to get recursive and keep feeding the model text as you work along in the tutorial to create more ideas to propel the creative process forward. I might go into this further in the future and produce an example. But for I post a few cuts from model produced text below to show off the concept.

How do I get started with WordPress?

Once you have your free, easy to use WordPress site up and running for your WordPress account, you can find some tips and tricks to get started with the site you want to build your own plugin and set up in your WordPress application.

..I liked the line “How do I get started with WordPress?” so I fed that into the model and next thing you know I get something that could be a start to the next section of a tutorial…

Model prompt >>> How do I get started with WordPress?
======================================== SAMPLE 1 ========================================

Step 1 – Create your site

Once you have your site, get a copy of the plugin. I use WP Engine as my plugin manager. Once you are installed, you should see a page with WordPress: Setup – Configuration for your content page. After you have done that, click Create a new page for your newly created page. You should see the page with WordPress: Setup – Your site. Click Next to see the configuration section. In the next step, you will create the page as if your site was created with WordPress.

Step 2 – Update your site

Update your site in the WordPress dashboard. If it is installed from the local system of your WordPress installation, your WordPress will be updated. You can click Next to see updates for your site as they were made.

Step 3 – Set up your site

Set up your site by creating a WordPress account, signing in to the WordPress account, setting up your website with WordPress, etc.

Step 4 – Creating your website

Go to the Settings page and click Create Website. You can now create a new page under your WordPress profile. Select your new URL. In this section, you can check if this should work. If not, then you will need to click on the Advanced Button to make an update. Once you have checked if your site has been updated, click Continue. To finish, select Update Site and make a new change to your WordPress site.

Step 5 – Register

Once you have registered with WordPress, sign up for the My Online Account Program, which allows WordPress websites to be easily accessed from any web browser. Register your account with your web hosting provider (AWS) and you should see the instructions below.

To begin, click on the Registration Check button. If it is still not there, then you will need to click on the “New Username” option or click on the Search bar that appears.

Click on the Search button in one of your applications. It will take you to the section on registration.

Select your domain name. Click OK to confirm that you are registered, it will pop in.

In both the Name and Contact fields below, click Add domain to Register and click the Registration Button, check Yes or OK to confirm that you should see a confirmation page.

Once you have checked in on your domain name, the web browser will ask you to enter your password. Next, follow these steps to change your existing

… almost the end for now.

Finally, How about a 3rd Try

Now the model goes way off topic. My guess is that there is random seeded of the Weights and Biases in the model and they must have been randomized in a way that caused the model to veer off topic right from the start.

<|endoftext|>A few weeks ago, I published an article detailing the first round of the 2016 American College Football Championship. The article was titled “Top 10 Players in Division I College Football,” and it’s still a long walk from having the best-of-the-class selection process of any college in college football. It’s an interesting concept, as it’s a way into the future. However, a lot of people may still think the same thing I’m saying, and I should think twice.

After seeing the rankings at my own site, I decided to share a few details on the 2017 football draft. The Top 10 prospects in Division I, along with a few other players, are included so that this post will highlight what makes the best players in college football.

The top 11 prospects in 2017

I’m making it sound as if our top class could be considered the top 5 in college football now. That’s not really so. I’m sure, we’ll be seeing much of the top 10 players in this Class of 2017 in 2016. The Top 10 players in Division I were ranked by the following: The number of career touchdowns and receiving yards the top 10 plays a season. If you are a member of the College Football Playoff, and you’re a member of Football Bowl Subdivision, I am going to rank you by the following. You are no slouch, but that is, if you rank first in the Big Ten or NFC Division, you might not have the Big Ten/CSN, unless that person, Jim Irsay, is looking at you.

If you are a member of the NCAA, and you’re going to be ranked second or third nationally in each conference, you might not have the highest ranking player in one conference, but maybe a couple of the higher rankings are worth your time and attention. These rankings are based off of two different ways of looking at players going into their careers as Division I college football prospects:

The Football Bowl Subdivision

My current ranking of teams is based on just two different ways:

• Based on how many points the top team is allowed to gain in division I.

• Based on how many points a team is allowed to lose in division I

This is also really well thought out and makes me wonder what the best option (or worse) for the college football players I know in the future may be.

I’m going to list my team’s results in alphabetical order, and I’d like to get

Resources

https://www.lmspulse.com/2019/open-source-artificial-neural-network-gpt/

https://lambdalabs.com/blog/run-openais-new-gpt-2-text-generator-code-with-your-gpu/

Siraj Ravel does a good job explain the technology behind OpenAi’s GPT-2 Text Generator

AlphaGo: Machine Learning and the game Go

March 29, 2019Machine LearningAlphaGo, AlphaGoZero, Go, LeelaZeroerickclasen

This post is basically a list of good resources on AlphaGo and the game Go. There are many fine tutorials out there on the Internet that I have read through to understand more about machine learning and about how AlphaGo functions. I have collected what are in my opinion some of the best out there and published them in this post.

Additionally there is (as of June 2018) an open source version of AlphaGoZero (The zero means it started from zero, as in there was no priming with data from human played games, it is programmed with the rules to Go and just plays against itself repeatedly) called LeelaZero which was built by following the paper published by DeepMind that covers the research and development of AlphaGoZero. It is a formidable player indeed, I can’t even get one point against the monster. As an experiment, I played it against GnuGo to see how GnuGo would fair. It still gets beat by LeelaZero at a slower rate than I do but, is able to score some points against LeelaZero on occasion.

Alpha Go how and why it works

The post by Tim Wheeler is hands down one of the clearest explanations I have seen. Tim Wheeler not only does a great job with this post, he has many other quality posts on his site http://tim.hibal.org

tim.hibal.org/blog/alpha-zero-how-and-why-it-works/

While you are looking at Tim’s post consider viewing the Alpha Go Cheatsheet as well, keep them both open and flip between them, a great way to learn.

hi res AlphaGo Cheatsheet

https://applied-data.science/static/main/res/alpha_go_zero_cheat_sheet.png

Other resources

One Diagram AlphaGoZero

https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0

The Wikipedia article on Monte Carlo Tree Search is worth a skim if you are not already familiar with Monte Carlo Tree Search which is used in game playing code, both machine learning driven game algorithms and what I would call pre-machine learning types. Previous to machine learning it was successful mostly for games that have a lower branching factor such as Chess. It is also used in GnuGo in a mode that plays on a smaller than standard board (9×9 and smaller). It is probably limited to a small board by the branching factor which gets huge as the board size increases. The number of board configurations is 3^n^2, n being intersections. A 9×9 board has 10^38 and a 19×19 10^170 legal positions according to a Wikipedia article that I read.

Background

For a good background and a brief history of machine learning, deep reinforcement learning in particular. Well worth the read…

Andrej Karpathy Deep Reinforcement Learning: Pong from Pixels

Hands on Exercise

There is an article on Medium that is worth a read, Teach a machine to learn Connect4 strategy through self-play and deep learning
Plus it lets you follow along and build the code to get a nice hands on experience.

LeelaZero: Basically an open source Alpha Go Zero and uses a JAVA GUI (Lizzie) to play it

LeelaZero, the Go engine is an easy to download and compile program, at least on Linux where I had it up and running in about 5 minutes. It uses a companion interface Lizzie written in Java for it’s GUI. LeelaZero is interesting and fun to try out. The pondering mode is cool. You hit the space bar and it ponders and shows the next moves probabilities of winning and depths of search. You can see from the 1.7%, it is beating me pretty bad after 70 moves as it is almost certain to win.

Hovering the mouse pointer over a specific move shows projections of the next moves for both players labeled with numbers up to it’s maximum forward game play estimates.

GnuGo in EMACS

GnuGo Can be played within Emacs, which is handy. This is what I did when I played GNUGo against LeelaZero. I forced LeelaZero to play black so it went first and then mirrored it’s move into the GnuGo board and GnuGo’s move back to the Lizzle/LeelaZero board. When I looked at the projects code it looks like development stopped around 2009. At the time it was a fairly good computer Go game but since then others have outpaced it’s strength. From what I recall it is in the 900 elo range for strength.

If you have Emacs installed the GUI version and GnuGo installed, then with Emacs open pressing Alt-X and entering gnugo in the Emacs command buffer will open GnuGo within Emacs. The benefits of this is that you can use the mouse or up and down arrows to play instead of entering coordinates at the command line.

www.gnu.org/software/gnugo/gnugo_3.html#SEC27

General Go Resources

https://en.wikipedia.org/wiki/Book:Go:_The_Board_Game

https://en.m.wikipedia.org/wiki/Computer_Go

https://en.m.wikipedia.org/wiki/Rules_of_Go

https://en.m.wikipedia.org/wiki/John_Horton_Conway

May 2024
M	T	W	T	F	S	S
« Apr
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

ChatGPT can help to understand parts of a research paper that require clarification.

The process involved to simplify text

Transcript of the ‘conversation’ with ChatGPT

Conda Environment Setup

Python Virtual environment example , setup and activate…

To setup dependency list

Memory

Code

Example of styling an image

Input Images

Memory Usage and Performance

Iter Directory and Animated GIFs

Command Line Output

Final Result

Animated GIF

Resources

Text Generation

char-rnn code

Comparing the Versions

Modifications of min-char-lstm.py

char-rnn – Training Loss and Validation Loss

Lua Torch

Lua Torch torch-rnn 2 layer 1024 dimensions

min-char-lstm-mod-2.py

Welcome to WordPress!

For Variety Let’s Go Try Again

Finally, How about a 3rd Try

Resources

Siraj Ravel does a good job explain the technology behind OpenAi’s GPT-2 Text Generator

Alpha Go how and why it works

hi res AlphaGo Cheatsheet

Other resources

One Diagram AlphaGoZero

Background

Hands on Exercise

LeelaZero: Basically an open source Alpha Go Zero and uses a JAVA GUI (Lizzie) to play it

Released Code for LeelaZero

Main LeelaZero Page

An interesting discussion on LeelaZero

GnuGo in EMACS

General Go Resources