Category Archives: Text Generation

Makemore

The best way to predict a future is to look for it. We live in the moment where the world is a vast machine for predicting the future.

This summer, I spent a significant amount of time contemplating large language models and delving deeper into their research. My first encounter with GPT-2 was back in 2019, when I explored its code and experimented with it. During this period, I became curious about transfer learning and its applications. I also had some prior knowledge about transformers, but it wasn’t as comprehensive as my understanding of LSTMs and RNNs; I couldn’t confidently explain what they did, for example.

While researching transfer learning with smaller models like GPT-2, I stumbled upon Gwern Branwen’s website (https://gwern.net/) and, in particular, his TWDNE Project (https://gwern.net/twdne). I found it clever because it combined a generative model for both images and text. I decided to focus on the text side of the project, as the image aspect was already well-addressed by applications like Stable Diffusion….

Misato Katsuragi as a Math Teacher

I might revisit the image style transfer aspect in the future, as I had previously explored it to some extent. You can find more about this in my “How to Generate Art Demo Followup.”

Before this, I had predominantly explored machine learning with code from the ground up using Python (PMLC). I have used ML practically for years in the form of genetic algorithms for tuning parameters on investing models; those are non-differentiable, so no chain rule! An offshoot of that work was a project called gen-gen-algo, a generic genetic algorithm. Now, finally, after all these side quests, I was ready to tackle something more complex and cutting-edge using GPT.

I found excellent resources on GitHub and in video format from Andrej Karpathy (https://github.com/karpathy). The following repositories were particularly helpful in my learning journey. The first one, “nn-zero-to-hero,” features a series of videos that provided a solid foundation in understanding transformers.

The second repository, “makemore,” served as my warm-up exercise to get back into working with transformers and Large Language Models (LLMs) after a period of dormancy in the field. You can access these repositories here:

1. “nn-zero-to-hero”: https://github.com/karpathy/nn-zero-to-hero
2. “makemore”: https://github.com/karpathy/makemore

Fork of makemore

My experience with “makemore” went beyond the basic examples provided in the original repository, which generated new names based on a dataset of names. Initially, my goal was to apply “makemore” to various datasets other than “names.txt.” I experimented with larger and smaller datasets, including those with extensive collections of English words, numbers for addition, square roots, and a substantial dataset of quotes containing nearly 10 million entries, some of which had lines as long as 505 characters. By using scripts and modifications to “makemore.py,” I conducted a grid search to optimize hyperparameters, including constraints on model size. Output from “makemore.py” was saved to a CSV file, along with hexadecimal hash values for easy tracking and analysis during the tuning process.
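
As a concrete example of the kind of synthetic dataset substituted for “names.txt”, a few lines of Python can generate an addition corpus with one problem per line (the format here is illustrative, not necessarily the exact one I used):

# Illustrative generator for a simple addition dataset in a names.txt-like, one-item-per-line format.
import random

with open("addition.txt", "w") as f:
    for _ in range(100000):
        a, b = random.randint(0, 999), random.randint(0, 999)
        f.write("%d+%d=%d\n" % (a, b, a + b))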

To further enhance the code, I introduced a grid search optimization method using a Bash script. This allowed for exploring the hyperparameter space while maintaining a ceiling on the model size. Without such constraints, optimization typically led to increasingly larger models that resulted in the lowest loss.
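
Here is a minimal sketch of that kind of grid-search wrapper, written in Python rather than Bash for compactness; the makemore.py flag names used here (--input-file, --work-dir, --n-layer, --n-embd, --learning-rate, --max-steps) are assumptions and may differ in a given version of the script:

# Sketch of a hyperparameter grid search with a ceiling on approximate model size.
import csv
import itertools
import secrets
import subprocess

MAX_PARAMS = 2_000_000  # ceiling on the approximate parameter count

def approx_params(n_layer, n_embd):
    # Rough transformer parameter count: ~12 * layers * width^2.
    return 12 * n_layer * n_embd ** 2

with open("gridsearch.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for n_layer, n_embd, lr in itertools.product([2, 4, 8], [64, 128, 256], [1e-3, 5e-4]):
        if approx_params(n_layer, n_embd) > MAX_PARAMS:
            continue  # skip configurations that exceed the size ceiling
        tag = secrets.token_hex(4)  # random hex tag identifying this run's output
        subprocess.run(["python", "makemore.py",
                        "--input-file", "quotes.txt",
                        "--work-dir", "out-" + tag,
                        "--n-layer", str(n_layer),
                        "--n-embd", str(n_embd),
                        "--learning-rate", str(lr),
                        "--max-steps", "10000"], check=True)
        writer.writerow([tag, n_layer, n_embd, lr])  # final losses can be joined in later by tag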

I also introduced the concept of assigning a random hexadecimal tag to the output of “makemore.py.” This tagging system facilitated the easy identification of the best loss and the associated set of hyperparameters that produced it. Additionally, I incorporated an early stopping mechanism into the “makemore.py” code.
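
The early-stopping logic amounts to a patience-style check on the loss. Here is an illustrative sketch (not the exact code in the fork), where step_fn and eval_fn stand in for makemore.py’s training step and validation-loss evaluation:

# Illustrative patience-based early stopping around a generic training loop.
def train_with_early_stopping(step_fn, eval_fn, max_steps=100000,
                              eval_interval=500, patience=10, min_delta=1e-4):
    best_loss, bad_evals = float("inf"), 0
    for step in range(max_steps):
        step_fn()
        if step % eval_interval == 0:
            val_loss = eval_fn()
            if val_loss < best_loss - min_delta:
                best_loss, bad_evals = val_loss, 0  # new best; reset patience
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    print("early stop at step %d, best loss %.4f" % (step, best_loss))
                    break
    return best_loss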

If you’re interested in exploring my fork of Andrej Karpathy’s “makemore” code, you can find it here:

https://github.com/erickclasen/makemore

For a more detailed understanding, I’ve created a comprehensive “verbose-readme.pdf” that provides additional information:

Version on this site, opens in browser:

verbose-readme

GitHub Version requires downloading:

https://github.com/erickclasen/makemore/blob/master/verbose-readme.pdf

RNN Text Generation Using Tensorflow

Imagination is the power to make a difference in yourself.

After trying a few RNNs and LSTMs for text generation that rely on NumPy alone, it is interesting to see the performance of TensorFlow-based code that is closer to the cutting edge of what is possible with machine learning.

I found a good and easy-to-use set of code in the following GitHub repository…

https://github.com/spiglerg/RNN_Text_Generation_Tensorflow

The requirements are simple…

numpy==1.13.3
tensorflow==1.4.0

I was running it in a Conda Python 3.6 environment, but this is not a requirement. The code uses a saved folder where it stores training checkpoints, so it is possible to interrupt and resume training and also use it in a generate or “talk” mode after the model has been trained. The caveat I learned quickly when training on a few types of files is that each corpus that is trained requires its own set of checkpoints, which is pretty obvious in hindsight. So it is best to either wipe out the saved directory contents after a run on a specific corpus or, better yet, make a subdirectory for each set of training checkpoints.
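
For example, something along these lines keeps each corpus’s checkpoints in their own subdirectory (the paths are illustrative, and the exact checkpoint file names depend on the TensorFlow version):

mkdir -p saved/us-const-trained
mv saved/model.ckpt* saved/us-const-trained/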

Training is basically sending it the following command…

python rnn_tf.py --input_file=data/us-constitution.txt --ckpt_file="saved/model.ckpt" --mode=train

Once trained it can be fed another command…

python rnn_tf.py --input_file=us-const-lstm/us-constitution.txt --ckpt_file="saved/model.ckpt" --test_prefix="The " --mode=talk

or if the checkpoint files have been moved to their own directory then you can use something like this…

python rnn_tf.py --input_file=us-const-lstm/us-constitution.txt --ckpt_file="saved/us-const-trained/model.ckpt" --test_prefix="The " --mode=talk

The command structure specifies the location of the input file as well as the location of the checkpoint file. The generate (“talk”) mode allows priming with a word or phrase such as “The”.

The US Constitution is not a big corpus, and I am sure this code, like others, would benefit from training against a larger corpus. A future experiment I intend to run is to train it against a file containing all the posts on this site to see what it can do with that corpus.

-rw-r--r-- 1 erick erick 1115394 Apr 26 12:01 shakespeare.txt
-rw-r--r-- 1 erick erick   45120 Apr 26 12:59 us-constitution.txt
-rw-r--r-- 1 erick erick  374605 Apr 26 13:52 my-posts.txt

When trained on the US Constitution it does very well at producing coherent text. Aside from the lack of capitalization, it seems to have reached the point of memorizing parts of the text. This might be because the corpus is small and the model is overfitting.

The Senators and Representatives before mentioned, and the Members of the
several State Legislatures, and all executive and judicial Officers, both of
the United States and of the several States, shall be bound by Oath or
Affirmation, to support this Constitution; but no religious Test shall ever be
required as a Qualification to any Office or public Trust under the United
States.

Article 7.

The Ratification of the Conventions of nine States, shall be sufficient for the
Establishment of this Constitution between the States so ratifying the Same.

Sentence:
the several states, shall be bound by oath or
affirmation, to support this constitution; but no religious test shall ever be
required as a qualification to any office or public trust under the united
states.

article 7.

the ratification of the conventions of nine states, shall be sufficient for the
establishment of this constitution between the states so ratifying the same.

done in convention by the unanimous consent of the states present the
seventeenth day of september in the year of our lord on

the Case of a Bill.

Section 8
The Congress shall have Power To lay and collect Taxes, Duties, Imposts and
Excises, to pay the Debts and provide for the common Defence and general
Welfare of the United States; but all Duties, Imposts and Excises shall be
uniform throughout the United States;

To borrow money on the credit of the United States;

To regulate Commerce with foreign Nations, and among the several States, and
with the Indian Tribes;

To establish an uniform Rule of Naturalization, and un

Sentence:
the case of a bill.

section 8
the congress shall have power to lay and collect taxes, duties, imposts and
excises, to pay the debts and provide for the common defence and general
welfare of the united states; but all duties, imposts and excises shall be
uniform throughout the united states;

to borrow money on the credit of the united states;

to regulate commerce with foreign nations, and among the several states, and
with the indian tribes;

to establish an uniform rule of naturalization, and un

Training

Training against the corpus of blog posts on this site produced output like this and took about 4 hours of compute time.

batch: 0  loss: 4.492201328277588  speed: 121.8853488969507 batches / s
batch: 100  loss: 3.214789628982544  speed: 1.3747759497226923 batches / s
batch: 200  loss: 3.0983948707580566  speed: 1.4065962415903654 batches / s
batch: 300  loss: 2.8669371604919434  speed: 1.4141226357348917 batches / s
batch: 400  loss: 2.359729051589966  speed: 1.416853411853437 batches / s
batch: 500  loss: 2.0080957412719727  speed: 1.4160802277642834 batches / s

batch: 19500  loss: 0.22069120407104492  speed: 1.4188681716674931 batches / s
batch: 19600  loss: 0.21757778525352478  speed: 1.4218841226396346 batches / s
batch: 19700  loss: 0.2309599369764328  speed: 1.362554971973392 batches / s
batch: 19800  loss: 0.23969298601150513  speed: 1.3983937654375616 batches / s
batch: 19900  loss: 0.23989509046077728  speed: 1.3854887855619515 batches / s

The following are some samples of the output it generates. It definitely could use more training. The fact that the posts contain some code, numbers, and jargon probably doesn’t help either.

Sentence:
the installed.
 display install wiflinut for ray run process queue every monday, wednesday and friday ran
   1000000 ractine resitely and configure a firewall to only allow certain ip numbers a
   connection to show that the board is
   powered. there are a concatenated version of the log.txt
cacking out of the full -ho 1 than i could have it may be set the
   command which just restart the “how ther have up suncals
   regulator, frequency valies.
   more data. i sho, vift…
sudo selond

   below is

Sentence:
the whole hmad can noid through the server and logged
in a while later and the shutdown script had recorded failed pings into
systemctl.

i was not ne rewent when it shuts down.

for a help afout shourd entire (but mean most looking a series of
for clean ubuntu server install will prompt for a username and password to access folders as
well, especially if the users and password is needed autosuspend should oright. it level, no 62 defanly 34-fermentation crontab, still radio
shar

trying out min-char-rnn and lstm

Text Generation

In early 2018, I started researching machine learning. I was curious about it and also looking for anything that could be useful in the space of machine learning to perform functions in code, specifically trading algorithms. I looked for code that would be easy to get started with, easy to pick apart and understand. I knew it would take time to understand and I was fine taking some sidetracks down some territory that would be interesting to play with.

I quickly came across various versions of text generation code. I won’t get into the theory here as there is a ton of information on it already and I have included my sources as links in the post.

Basically, this post will focus on the use of a well-documented version of char-rnn, specifically min-char-rnn, and an improved version that uses an LSTM.

This post is a brief dump on my toying around at text generation with machine learning. I may cover it further in more detail in the future.

To see some cutting-edge examples of text generation from a model pretrained on a large corpus, GPT-2, see the following post…

GPT-2 the Next Level in Text Generation for now at least

Currently I am trying out RNN Text Generation using Tensorflow and plan on posting some results in the future…

https://github.com/spiglerg/RNN_Text_Generation_Tensorflow

char-rnn code

I started out with Andrej Karpathy’s min-char-rnn when I was in machine learning research mode early in 2018. During the early fall of 2018 I found posts by Eli Bendersky that gave a good breakdown of min-char-rnn, including code that had more comments, plus a few other pieces of code. The two other pieces of code were a Markov model text generator and an LSTM extension of min-char-rnn. In theory the LSTM version of min-char-rnn should perform the best, and I wound up with the best results (Python-only code) using this version with modifications. The three modifications were: the ability to save the output text while it was running, the ability to trim down the learning rate progressively while it was running, and the ability to specify a file name at the command line. I did this after I noticed that the loss was oscillating and not decreasing under certain conditions while playing with the coefficients for the code, such as the number of layers and the amount of character lookback. Decreasing the learning rate can help with this behavior.
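
As a rough sketch of the learning-rate modification (illustrative only; the schedule in my modified scripts differs in detail), the idea is to trim the learning rate whenever the smoothed loss stops improving so the oscillations can settle:

# Illustrative progressive learning-rate trimming for a min-char-rnn-style training loop.
learning_rate = 0.1
min_learning_rate = 1e-3
decay_factor = 0.5
best_seen = float("inf")
stalled_checks = 0

def maybe_trim_learning_rate(smooth_loss):
    """Call this periodically from the training loop."""
    global learning_rate, best_seen, stalled_checks
    if smooth_loss < best_seen - 1e-3:
        best_seen, stalled_checks = smooth_loss, 0   # still improving
    else:
        stalled_checks += 1
        if stalled_checks >= 20:                     # ~20 checks with no improvement
            learning_rate = max(min_learning_rate, learning_rate * decay_factor)
            stalled_checks = 0
            print("learning rate trimmed to", learning_rate)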

Beyond this code, a more sophisticated way to do text generation is to use Lua Torch and run torch-rnn. This requires downloading and installing all of the packages required to run it on Linux.
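
The basic torch-rnn workflow, roughly as I recall it from that repository’s README (the flags and file names here are from memory and may differ, so check the repo), is to preprocess the text, train, and then sample:

python scripts/preprocess.py --input_txt us-constitution.txt --output_h5 us-constitution.h5 --output_json us-constitution.json
th train.lua -input_h5 us-constitution.h5 -input_json us-constitution.json
th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000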

Comparing the Versions

To start, min-char-rnn performance can be compared against the lstm version. In this example I am using the US Constitution as it is a document that is widely available and many people are familiar with. In the future I will cover running the text from this site through text generation code.

After 99000 loops of min-char-rnn on the US Constitution.


iter 99900, loss: 34.083777
 ----
  , Reach witle from to
 the Lice of shall degrest and unccive

athins or propinds, ploovate one shall eptemitlatiall un ligre shall have hake by the Ugiters shall no no be
 as
 writh two 2hs -quals and of

You can see it is trying to pick up on something, and a few of the words are actually legitimate English.

Now for the LSTM version created by Eli Bendersky, minimal-character-based-lstm-implementation:

iter 99800 (p=18160), loss 9.844710
 ----
 shall receire ffour Houser, hred in overyof morty Cowcurthir
 such onf
 grate,
 shall then callary, and
 sittin dutler, shall, with an electors
 and Elections,
 which shall be a President.

3. Nuties, Impos

This version shows some hope as it is forming some more structure, better words and parts of sentences. The loss is lower as well.

What if the LSTM version runs longer, like a million cycles?

iter 999000 (p=17184), loss 4.694482
 iter 999200 (p=20384), loss 4.734232
 iter 999400 (p=23584), loss 4.815483
 iter 999600 (p=26784), loss 4.979268
 iter 999800 (p=29984), loss 5.165326
 ----
 shall consisted; but the Congress may by Congress, becommit,
 shall be a Senator or Representative intee any Department or Trust under the Laws Spojgiled to consirques hating been creary shall prioriti
 ----

It is getting a bit better: fewer broken words and the formation of a sort of paragraph.

How about ten million cycles, where the loss seems to bottom out in the 2.5 range…

iter 10096200 (p=21776), loss 2.487097
iter 10096400 (p=24976), loss 2.517261
iter 10096600 (p=28176), loss 2.605424
iter 10096800 (p=31376), loss 2.556021
----
against the sements whereor who shall return in Consent whations whict: Amend
Mander State. a Treason of Disubility to lis arming the cume aftered thanney. Ir, or Conventions as the lise dusceptraray

I concatenated some more of the output….

to

post a Member in their borth intomie States, the Vice President and denuinned.

Amendment 10

The powers not the twerfth not betilizent of this article the Vicembagion and such Pentitias of the seve

s the United

Stated; under their Party; without the

United States.

Article 36. When mademe Court of the United States which shall not be retsion age State. Andain duty a stanly such Majority, or the

ited States or by any

State.

No Prefered, the

President proviit for President; necestald a majority be a Members, and the Legitlationen for the law of the President, to be the Eduld a Memberd

to ever

ne of the seber to

the Approparal Becomes of Blesident within the United States un nunis primas to which the District hensbill, such Presented by incohringle shall be

tax having

States, and

transmit t

Modifications of min-char-lstm.py

The following was created by modifying the code at https://github.com/eliben/deep-learning-samples/blob/master/min-char-rnn/min-char-lstm.py

The modifications are: (1) allowing for a learning rate that starts higher and declines. I was experimenting with using the min-char-lstm.py code on the contents of the blog posts on this site, and I noticed that the loss would decline to a point and then oscillate up and down. By starting with a higher learning rate and then trimming it lower and lower, I was hoping to get the oscillations to settle and achieve a better loss. (2) The code will not print text to the screen until the loss declines by half, and it saves the generated text after the loss drops to a quarter of the original. Saving the text allows for optional post-analysis, for instance of keywords.

Code for min-char-lstm-mod-2.py is pasted at the bottom of this post.

python min-char-lstm-mod-2.py us-constitution.txt
2019-02-21 21:58:36.456591: iter 13238027 (p=48), loss 5.571070
----
  f Hear Porty, or Vice President equal, as
Vares possary, having to nimpost to the President, dofe, for the Senate, the first Meeting of the first Summarma
onle Jonn admius lesments shall exercise they shall not be consthuled for tainamanment mort himal of the President shall consist of a sever Years as of the
United States;

To recass on the Congress may be admints, Cans; the
proprictions herein becom tamy and
Partarittent; or transting Surdsation shall
immentent no State, or abilitives shall b 
----

Saving to file

2019-02-21 21:58:36.568285: iter 13238028 (p=64), loss 5.567595
----
 anstwry by such Vacarciuse
Amdiavion, or other Sestected, and the Congress shall may leibtry days by Contict Biltian convice Vith.

No case, dupa such Penisdatizens prsed the Bildsent of thein be cindent for sitt in Cases of President and Vice President, or altice any Office of either House shall be held in the Bilas, except prohicies and Consuls; to be senict
compected
in this Conments of Congress.

The executive atther during the right of the United States, shall choole the Office of Rewofity  
----

Saving to file





char-rnn – Training Loss and Validation Loss

(Embedded plot: “char-rnn – Training Loss and Validation Loss,” from a post on the MachineLearning subreddit.)

Lua Torch

I also experimented with torch-rnn, which uses Lua Torch. It works OK, but I saw nothing beyond what I got with the LSTM version above. I only tried it briefly and haven’t formed any solid conclusions.

Lua Torch torch-rnn, 2 layers, 1024 dimensions

https://github.com/jcjohnson/torch-rnn

st keming the Immedince shall
have Power of Not
shall be not
lations for each
shall any
State by the Judigaany state not provided by Casssimate and Repund and Jurtice of the Sequicled in the Unanimed as excleding recrisal of its Consent of the
Mevole shall then the Vice-President, and of the President shall be make Raycesorveny or the right noveranded.

E thas Deleste, by Odfect to the nomes of any Qerfon and ciry of the State

becredo nugr on Condeling and firmine who
haviny this Constitution, but no derso- hivin, Regulation of Vice-Une Tneasor
this BitFzinst entseived.
he fect Coof Presidences, nhish shall be agSent of the Treaso shall gave to behave to the States nor and devermay of the United States; Monor subrected, nor and during the Leecther Year
d aftee the Adjupreit, but in a Memualif or public Ministersss atcerrind ad any Piffost the States connicted to Them thind
ponted by the United States.

S. Pemsud for the
chosen thes shall be a particied and Hays. Wh labth the narge of the Senate, law, ablone Indianty for a dwoun the Eves Motors of liozen Deardors and Elestions and ass ow the Legislatures
shall nake at semoun shall not be require the sunes as ivaly, age unters; and necons al
witn from Oate Members, and accuration of time of titimes, inlarconcancrading one properdyy of the United States, or in which meat
male, in property, sian to the Person having anm notine

Immortizer of having the President.

th onothert commors and Consent, shall apr in this Conviction
may shall, Ligizen aplice—
B.

Smation C
qulication, the first Manger To
theresimant of a pripersonr
Thithit subject dot chimles Tnemeriting the several States, shall be shall be equal States, or in any other Election,
compensation, without the several States; the receita diforme, but nother shall detmanation shall not excerain their Vecessary sexzect juty, or puflis indey
they shall be neach or number in mate been courtion or execuin s co-venty shall not be consugheraty of the Scatem at shall h

min-char-lstm-mod-2.py

# Minimal character-based language model learning with an LSTM architecture.
#
# Overall code structure based on Andrej Karpathy's min-char-rnn model:
#    https://gist.github.com/karpathy/d4dee566867f8291f086
#
# But the architecture is modified to be LSTM rather than vanilla RNN.
# The companion blog post is:
#   https://eli.thegreenplace.net/2018/minimal-character-based-lstm-implementation/
#
# Tested with Python 3.6
#
# Eli Bendersky [http://eli.thegreenplace.net]
# BSD License per original (@karpathy)
from __future__ import print_function

import numpy as np
import sys
import datetime


# Make it possible to provide input file as a command-line argument; input.txt
# is still the default.
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    filename = 'input.txt'

with open(filename, 'r') as f:
    data = f.read()

# All unique characters / entities in the data set.
chars = list(set(data))
data_size = len(data)
V = vocab_size = len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))

# Each character in the vocabulary gets a unique integer index assigned, in the
# half-open interval [0:N). These indices are useful to create one-hot encoded
# vectors that represent characters in numerical computations.
char_to_ix = {ch:i for i, ch in enumerate(chars)}
ix_to_char = {i:ch for i, ch in enumerate(chars)}
print('char_to_ix', char_to_ix)
print('ix_to_char', ix_to_char)

# Hyperparameters.

# Size of hidden state vectors; applies to h and c.
H = hidden_size = 100
seq_length = 16 # number of steps to unroll the LSTM for
learning_rate = 0.1

# The input x is concatenated with state h, and the joined vector is used to
# feed into most blocks within the LSTM cell. The combined height of the column
# vector is HV.
HV = H + V

# Stop when processed this much data
MAX_DATA = 1000000

# Model parameters/weights -- these are shared among all steps. Weights
# initialized randomly; biases initialized to 0.
# Inputs are characters one-hot encoded in a vocab-sized vector.
# Dimensions: H = hidden_size, V = vocab_size, HV = hidden_size + vocab_size
Wf = np.random.randn(H, HV) * 0.01
bf = np.zeros((H, 1))
Wi = np.random.randn(H, HV) * 0.01
bi = np.zeros((H, 1))
Wcc = np.random.randn(H, HV) * 0.01
bcc = np.zeros((H, 1))
Wo = np.random.randn(H, HV) * 0.01
bo = np.zeros((H, 1))
Wy = np.random.randn(V, H) * 0.01
by = np.zeros((V, 1))


def sigmoid(z):
    """Computes sigmoid function.

    z: array of input values.

    Returns array of outputs, sigmoid(z).
    """
    # Note: this version of sigmoid tries to avoid overflows in the computation
    # of e^(-z), by using an alternative formulation when z is negative, to get
    # 0. e^z / (1+e^z) is equivalent to the definition of sigmoid, but we won't
    # get e^(-z) to overflow when z is very negative.
    # Since both the x and y arguments to np.where are evaluated by Python, we
    # may still get overflow warnings for large z elements; therefore we ignore
    # warnings during this computation.
    with np.errstate(over='ignore', invalid='ignore'):
        return np.where(z >= 0,
                        1 / (1 + np.exp(-z)),
                        np.exp(z) / (1 + np.exp(z)))


def lossFun(inputs, targets, hprev, cprev):
    """Runs forward and backward passes through the RNN.

      TODO: keep me updated!
      inputs, targets: Lists of integers. For some i, inputs[i] is the input
                       character (encoded as an index into the ix_to_char map)
                       and targets[i] is the corresponding next character in the
                       training data (similarly encoded).
      hprev: Hx1 array of initial hidden state
      cprev: Hx1 array of initial hidden state

      returns: loss, gradients on model parameters, and last hidden states
    """
    # Caches that keep values computed in the forward pass at each time step, to
    # be reused in the backward pass.
    xs, xhs, ys, ps, hs, cs, fgs, igs, ccs, ogs = (
            {}, {}, {}, {}, {}, {}, {}, {}, {}, {})

    # Initial incoming states.
    hs[-1] = np.copy(hprev)
    cs[-1] = np.copy(cprev)

    loss = 0
    # Forward pass
    for t in range(len(inputs)):
        # Input at time step t is xs[t]. Prepare a one-hot encoded vector of
        # shape (V, 1). inputs[t] is the index where the 1 goes.
        xs[t] = np.zeros((V, 1))
        xs[t][inputs[t]] = 1

        # hprev and xs[t] are column vector; stack them together into a "taller"
        # column vector - first the elements of x, then h.
        xhs[t] = np.vstack((xs[t], hs[t-1]))

        # Gates f, i and o.
        fgs[t] = sigmoid(np.dot(Wf, xhs[t]) + bf)
        igs[t] = sigmoid(np.dot(Wi, xhs[t]) + bi)
        ogs[t] = sigmoid(np.dot(Wo, xhs[t]) + bo)

        # Candidate cc.
        ccs[t] = np.tanh(np.dot(Wcc, xhs[t]) + bcc)

        # This step's h and c.
        cs[t] = fgs[t] * cs[t-1] + igs[t] * ccs[t]
        hs[t] = np.tanh(cs[t]) * ogs[t]

        # Softmax for output.
        ys[t] = np.dot(Wy, hs[t]) + by
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))

        # Cross-entropy loss.
        loss += -np.log(ps[t][targets[t], 0])

    # Initialize gradients of all weights/biases to 0.
    dWf = np.zeros_like(Wf)
    dbf = np.zeros_like(bf)
    dWi = np.zeros_like(Wi)
    dbi = np.zeros_like(bi)
    dWcc = np.zeros_like(Wcc)
    dbcc = np.zeros_like(bcc)
    dWo = np.zeros_like(Wo)
    dbo = np.zeros_like(bo)
    dWy = np.zeros_like(Wy)
    dby = np.zeros_like(by)

    # Incoming gradients for h and c; for backwards loop step these represent
    # dh[t] and dc[t]; we do truncated BPTT, so assume they are 0 initially.
    dhnext = np.zeros_like(hs[0])
    dcnext = np.zeros_like(cs[0])

    # The backwards pass iterates over the input sequence backwards.
    for t in reversed(range(len(inputs))):
        # Backprop through the gradients of loss and softmax.
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1

        # Compute gradients for the Wy and by parameters.
        dWy += np.dot(dy, hs[t].T)
        dby += dy

        # Backprop through the fully-connected layer (Wy, by) to h. Also add up
        # the incoming gradient for h from the next cell.
        dh = np.dot(Wy.T, dy) + dhnext

        # Backprop through multiplication with output gate; here "dtanh" means
        # the gradient at the output of tanh.
        dctanh = ogs[t] * dh
        # Backprop through the tanh function; since cs[t] branches in two
        # directions we add dcnext too.
        dc = dctanh * (1 - np.tanh(cs[t]) ** 2) + dcnext

        # Backprop through multiplication with the tanh; here "dhogs" means
        # the gradient at the output of the sigmoid of the output gate. Then
        # backprop through the sigmoid itself (ogs[t] is the sigmoid output).
        dhogs = dh * np.tanh(cs[t])
        dho = dhogs * ogs[t] * (1 - ogs[t])

        # Compute gradients for the output gate parameters.
        dWo += np.dot(dho, xhs[t].T)
        dbo += dho

        # Backprop dho to the xh input.
        dxh_from_o = np.dot(Wo.T, dho)

        # Backprop through the forget gate: sigmoid and elementwise mul.
        dhf = cs[t-1] * dc * fgs[t] * (1 - fgs[t])
        dWf += np.dot(dhf, xhs[t].T)
        dbf += dhf
        dxh_from_f = np.dot(Wf.T, dhf)

        # Backprop through the input gate: sigmoid and elementwise mul.
        dhi = ccs[t] * dc * igs[t] * (1 - igs[t])
        dWi += np.dot(dhi, xhs[t].T)
        dbi += dhi
        dxh_from_i = np.dot(Wi.T, dhi)

        dhcc = igs[t] * dc * (1 - ccs[t] ** 2)
        dWcc += np.dot(dhcc, xhs[t].T)
        dbcc += dhcc
        dxh_from_cc = np.dot(Wcc.T, dhcc)

        # Combine all contributions to dxh, and extract the gradient for the
        # h part to propagate backwards as dhnext.
        dxh = dxh_from_o + dxh_from_f + dxh_from_i + dxh_from_cc
        dhnext = dxh[V:, :]

        # dcnext from dc and the forget gate.
        dcnext = fgs[t] * dc

    # Gradient clipping to the range [-5, 5].
    for dparam in [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby]:
        np.clip(dparam, -5, 5, out=dparam)

    return (loss, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
            hs[len(inputs)-1], cs[len(inputs)-1])


def sample(h, c, seed_ix, n):
    """Sample a sequence of integers from the model.

    Runs the LSTM in forward mode for n steps; seed_ix is the seed letter for
    the first time step, h and c are the memory state. Returns a sequence of
    letters produced by the model (indices).
    """
    x = np.zeros((V, 1))
    x[seed_ix] = 1
    ixes = []

    for t in range(n):
        # Run the forward pass only.
        xh = np.vstack((x, h))
        fg = sigmoid(np.dot(Wf, xh) + bf)
        ig = sigmoid(np.dot(Wi, xh) + bi)
        og = sigmoid(np.dot(Wo, xh) + bo)
        cc = np.tanh(np.dot(Wcc, xh) + bcc)
        c = fg * c + ig * cc
        h = np.tanh(c) * og
        y = np.dot(Wy, h) + by
        p = np.exp(y) / np.sum(np.exp(y))

        # Sample from the distribution produced by softmax.
        # Original stochastic sampling:
        #   ix = np.random.choice(range(V), p=p.ravel())
        # IX HACK: take the most likely character instead of sampling.
        ix = p.argmax()
        x = np.zeros((V, 1))
        x[ix] = 1
        ixes.append(ix)
    return ixes


def gradCheck(inputs, targets, hprev, cprev):
    global Wf, Wi, bf, bi, Wcc, bcc, Wo, bo, Wy, by
    num_checks, delta = 10, 1e-5
    (_, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
     _, _) = lossFun(inputs, targets, hprev, cprev)
    for param, dparam, name in zip(
            [Wf, bf, Wi, bi, Wcc, bcc, Wo, bo, Wy, by],
            [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby],
            ['Wf', 'bf', 'Wi', 'bi', 'Wcc', 'bcc', 'Wo', 'bo', 'Wy', 'by']):
        assert dparam.shape == param.shape
        print(name)
        for i in range(num_checks):
            ri = np.random.randint(0, param.size)
            old_val = param.flat[ri]
            param.flat[ri] = old_val + delta
            numloss0 = lossFun(inputs, targets, hprev, cprev)[0]
            param.flat[ri] = old_val - delta
            numloss1 = lossFun(inputs, targets, hprev, cprev)[0]
            param.flat[ri] = old_val # reset
            grad_analytic = dparam.flat[ri]
            grad_numerical = (numloss0 - numloss1) / (2 * delta)
            if grad_numerical + grad_analytic == 0:
                rel_error = 0
            else:
                rel_error = (abs(grad_analytic - grad_numerical) /
                             abs(grad_numerical + grad_analytic))
            print('%s, %s => %e' % (grad_numerical, grad_analytic, rel_error))


def basicGradCheck():
    inputs = [char_to_ix[ch] for ch in data[:seq_length]]
    targets = [char_to_ix[ch] for ch in data[1:seq_length+1]]
    hprev = np.random.randn(H, 1)
    cprev = np.random.randn(H, 1)
    gradCheck(inputs, targets, hprev, cprev)

# Uncomment this to run gradient checking instead of training
#basicGradCheck()
#sys.exit()

# n is the iteration counter; p is the input sequence pointer, at the beginning
# of each step it points at the sequence in the input that will be used for
# training this iteration.
n, p = 0, 0

# Memory variables for Adagrad.
mWf = np.zeros_like(Wf)
mbf = np.zeros_like(bf)
mWi = np.zeros_like(Wi)
mbi = np.zeros_like(bi)
mWcc = np.zeros_like(Wcc)
mbcc = np.zeros_like(bcc)
mWo = np.zeros_like(Wo)
mbo = np.zeros_like(bo)
mWy = np.zeros_like(Wy)
mby = np.zeros_like(by)
smooth_loss = -np.log(1.0/V) * seq_length
best_loss = smooth_loss
# Save the initial loss so that printing and saving occur at 1/2 of it and 1/10 of it.
start_loss = smooth_loss
output_filename = "lstm-output.txt"

print("\nStart Loss:",start_loss)



while p < MAX_DATA:
    # Prepare inputs (we're sweeping from left to right in steps seq_length long)
    if p+seq_length+1 >= len(data) or n == 0:
        # Reset RNN memory
        hprev = np.zeros((H, 1))
        cprev = np.zeros((H, 1))
        p = 0 # go from start of data

    # In each step we unroll the RNN for seq_length cells, and present it with
    # seq_length inputs and seq_length target outputs to learn.
    inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
    targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

    # Sample from the model now and then.
#    if n % 1000 == 0:
#        sample_ix = sample(hprev, cprev, inputs[0], 200)
#        txt = ''.join(ix_to_char[ix] for ix in sample_ix)
#        print('----\n %s \n----' % (txt,))

    # Forward seq_length characters through the RNN and fetch gradient.
    (loss, dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby,
     hprev, cprev) = lossFun(inputs, targets, hprev, cprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001
#    if n % 200 == 0:
#        print('iter %d (p=%d), loss %f' % (n, p, smooth_loss))

    # Print progress until the loss has dropped enough, then sample from the
    # model whenever the smoothed loss hits a new best.
    if smooth_loss > (start_loss/4):
        if n % 1000 == 0:
            print('%s: iter %d (p=%d), loss %f' % (datetime.datetime.now(), n, p, smooth_loss))
    elif smooth_loss < best_loss:
        print('%s: iter %d (p=%d), loss %f' % (datetime.datetime.now(), n, p, smooth_loss))
        best_loss = smooth_loss
        sample_ix = sample(hprev, cprev, inputs[0], 500)
        txt = ''.join(ix_to_char[ix] for ix in sample_ix)
        print('----\n %s \n----' % (txt,))

        # Once the loss has dropped far enough, append the sampled text to a file.
        if smooth_loss < (start_loss/6):
            print("\nSaving to file\n")
            with open(output_filename, 'a') as file_object:
                file_object.write(txt)
                file_object.write("\n")


    # Perform parameter update with Adagrad.
    for param, dparam, mem in zip(
            [Wf, bf, Wi, bi, Wcc, bcc, Wo, bo, Wy, by],
            [dWf, dbf, dWi, dbi, dWcc, dbcc, dWo, dbo, dWy, dby],
            [mWf, mbf, mWi, mbi, mWcc, mbcc, mWo, mbo, mWy, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

    p += seq_length
    n += 1


GPT-2 the Next Level in Text Generation for now at least

If you love WordPress you are really going to like this post 🙂

I have played around with OpenAI’s GPT-2 a bit and it is a fun text generator. It does a strikingly good job at generating fairly OK text. With a little touch-up, one could create something sensible and readable.

Seeded with “Welcome to WordPress!” (I’m editing in WordPress and this was the first thing that came to mind), below is what OpenAI’s GPT-2 machine learning text generator came up with on the first shot. This has not been edited; GPT-2 seems to spell all words correctly and has fair grammar and sentence structure. Even though it can make fair paragraphs, it can wander a bit off topic at times, but it is generally coherent. I have to admit I was a bit blown away when I first started playing with it, and this is only using the 117M model version available on GitHub.
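
For reference, the “Model prompt >>>” output below comes from the interactive sampling script in OpenAI’s gpt-2 repository, run roughly like this after downloading the 117M model per the repo’s instructions (the script and flag names may have changed in later versions of the repo, so treat this as approximate):

python3 src/interactive_conditional_samples.py --model_name 117M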

So, here’s the part where we start…

Welcome to WordPress!

Model prompt >>> Welcome to WordPress!
======================================== SAMPLE 1 ========================================
There are over 10,000 websites. It’s hard to count them all! And, it really is.

Let’s start with a quick summary of a few websites – WordPress, WordPress Plus, Hosted Blogger, WordPress, blog, etc.:

This is a list that lets you go from list to list and even from list to list. When you click on my name you can see I have 5 pages with 5,000+ blog posts.

It’s very easy to find and read a list on Twitter.

When you visit the Website, you are being asked to navigate through several pages of websites. You can also go to our Website navigation.

Now, let’s go for a little more fun: I hope that you have enjoyed this website and find the posts of other visitors that are looking to help you. It’s been one of those times, like now since the late ’80s, when I was doing it, that people would tell me I was dumb, that I didn’t know how to read a book. “I’d never read a book!” And that would be the beginning.

Now, I know when I read a book, I don’t look like a dumb person. No.

To be honest, I’m used to looking pretty stupid for being dumb. I don’t always have friends in their twenties or thirties that I would have found interesting. And I don’t take myself very seriously when things are looking good.

And I always have friends just as I am, and I don’t do good writing, and, in fact, when I’m not writing, I have friends I think are actually intelligent, but they are totally out of their heads. They are not stupid, so there’s just something about me that I would like to be called an intelligent genius. And they are just out of their brains.

I always go through my twenties and thirties thinking I’m smarter, and I’m going to give them crap and they’re probably right. I go through my twenties and thirties thinking I’m smarter and they’re just not going to.

I see all the people I’m reading these days thinking I’m smart and smart in my twenties and thirties, and I think, “My God? It’s only been a month since I have read a book.” I just think I’m getting the wrong look at it.

For Variety Let’s Go Try Again

This time around it outputs what looks like the intro to a tutorial on building a site, with some clunky HTML at the bottom. Notice the emoticon! I stole a few lines from this output and already wove them in above for fun. If one were to take the text below and hand-edit it, it could be used to kick-start a tutorial. It could break any writer’s block that you might have and get the ball rolling; see the example right after this text dump.

If you love WordPress you are really going to like this post 🙂

How Do I get started making WordPress?

In case you didn’t know WordPress is actually not quite that interesting after all. Even if you’re not trying to go the Web with a web browser.

But in order to succeed, WordPress needs you to be familiar with the various parts of it and if that’s all, you’re going to need some background info!

We’re actually all interested in WordPress on a lot of levels.

So, here’s the part where we start…

Creating a website

For WordPress, you’re going to be using the web for a couple of things.

The first thing you’ll want to do…

Create websites

So, you’ve probably read how the web can be an intimidating tool for beginners.

And so, we want to take you through what it really is.

Our site is a very basic web page.

So, how do we define a site and what do we do with it?

Well, it starts with making an appearance.

It’s called an email address

A website addresses are very powerful because it tells us the number of page views and the total number of views for an email.

Let’s start by getting you to get a feeling of how well this page works.

So what’s the deal with a website?

Let’s take a little detour

Now that you’ve got that impression about the website and get to your first point, we’ll start to explain more about how it functions.

How a website works?

Let’s first get you started with a simple HTML page .

We’ll start with defining our website at a basic level. Let’s go ahead, for the sake of simplicity we’ll take all of the components from the HTML below:

<a href=”https://www.wp.com/v3/wp/”>

<div class=”example-wrapper”>

</div>

<script type=”text/javascript”>

//

“example”

</script>

“http://the.example.com/?page=example.svg”

</a>

<page>

<img src=”https://www.wp.com/v3/wp/”>

</page>

</body>

OK, let’s make an intro to a WordPress tutorial out of some of the text above to see how it can help prime the pump of writing….

If you love WordPress you are really going to like this post 🙂

You might be wondering, how do I get started making a WordPress site? You’ve probably read how WordPress can be an intimidating tool for beginners. But, in order to succeed, you need to be familiar with the various parts of WordPress. You’re going to need some background info!

So what’s the deal with a website? Let’s start by getting a feeling of how building a page works by looking at how webpages work in the first place. We will take a little detour and get you started with a simple HTML page.

…it is entirely possible to get recursive and keep feeding the model text as you work along in the tutorial to create more ideas to propel the creative process forward. I might go into this further in the future and produce an example. But for now I’ll post a few cuts from model-produced text below to show off the concept.

How do I get started with WordPress?

Once you have your free, easy to use WordPress site up and running for your WordPress account, you can find some tips and tricks to get started with the site you want to build your own plugin and set up in your WordPress application.

…I liked the line “How do I get started with WordPress?” so I fed that into the model, and the next thing you know I get something that could be the start of the next section of a tutorial…

Model prompt >>> How do I get started with WordPress?
======================================== SAMPLE 1 ========================================

Step 1 – Create your site

Once you have your site, get a copy of the plugin. I use WP Engine as my plugin manager. Once you are installed, you should see a page with WordPress: Setup – Configuration for your content page. After you have done that, click Create a new page for your newly created page. You should see the page with WordPress: Setup – Your site. Click Next to see the configuration section. In the next step, you will create the page as if your site was created with WordPress.

Step 2 – Update your site

Update your site in the WordPress dashboard. If it is installed from the local system of your WordPress installation, your WordPress will be updated. You can click Next to see updates for your site as they were made.

Step 3 – Set up your site

Set up your site by creating a WordPress account, signing in to the WordPress account, setting up your website with WordPress, etc.

Step 4 – Creating your website

Go to the Settings page and click Create Website. You can now create a new page under your WordPress profile. Select your new URL. In this section, you can check if this should work. If not, then you will need to click on the Advanced Button to make an update. Once you have checked if your site has been updated, click Continue. To finish, select Update Site and make a new change to your WordPress site.

Step 5 – Register

Once you have registered with WordPress, sign up for the My Online Account Program, which allows WordPress websites to be easily accessed from any web browser. Register your account with your web hosting provider (AWS) and you should see the instructions below.

Register a domain name

To begin, click on the Registration Check button. If it is still not there, then you will need to click on the “New Username” option or click on the Search bar that appears.

Click on the Search button in one of your applications. It will take you to the section on registration.

Select your domain name. Click OK to confirm that you are registered, it will pop in.

In both the Name and Contact fields below, click Add domain to Register and click the Registration Button, check Yes or OK to confirm that you should see a confirmation page.

Once you have checked in on your domain name, the web browser will ask you to enter your password. Next, follow these steps to change your existing

 

… almost the end for now.

Finally, How about a 3rd Try

Now the model goes way off topic. My guess is that the random seeding of the sampling process happened to push the model off topic right from the start.

<|endoftext|>A few weeks ago, I published an article detailing the first round of the 2016 American College Football Championship. The article was titled “Top 10 Players in Division I College Football,” and it’s still a long walk from having the best-of-the-class selection process of any college in college football. It’s an interesting concept, as it’s a way into the future. However, a lot of people may still think the same thing I’m saying, and I should think twice.

After seeing the rankings at my own site, I decided to share a few details on the 2017 football draft. The Top 10 prospects in Division I, along with a few other players, are included so that this post will highlight what makes the best players in college football.

The top 11 prospects in 2017

I’m making it sound as if our top class could be considered the top 5 in college football now. That’s not really so. I’m sure, we’ll be seeing much of the top 10 players in this Class of 2017 in 2016. The Top 10 players in Division I were ranked by the following: The number of career touchdowns and receiving yards the top 10 plays a season. If you are a member of the College Football Playoff, and you’re a member of Football Bowl Subdivision, I am going to rank you by the following. You are no slouch, but that is, if you rank first in the Big Ten or NFC Division, you might not have the Big Ten/CSN, unless that person, Jim Irsay, is looking at you.

If you are a member of the NCAA, and you’re going to be ranked second or third nationally in each conference, you might not have the highest ranking player in one conference, but maybe a couple of the higher rankings are worth your time and attention. These rankings are based off of two different ways of looking at players going into their careers as Division I college football prospects:

The Football Bowl Subdivision

My current ranking of teams is based on just two different ways:

• Based on how many points the top team is allowed to gain in division I.

• Based on how many points a team is allowed to lose in division I

This is also really well thought out and makes me wonder what the best option (or worse) for the college football players I know in the future may be.

I’m going to list my team’s results in alphabetical order, and I’d like to get

Resources

https://www.lmspulse.com/2019/open-source-artificial-neural-network-gpt/

https://lambdalabs.com/blog/run-openais-new-gpt-2-text-generator-code-with-your-gpu/

Siraj Raval does a good job explaining the technology behind OpenAI’s GPT-2 text generator.