Category Archives: WordPress

Posts related to information that I have learned about WordPress that might be useful for others.


Scraping a WordPress site for text to use with char-rnn

As with anything else in life, it is possible to change nothing but yourself.
The first step toward making change is simply to change yourself.

This quick example shows the gist of scraping a WordPress website using Linux and Lynx. Wget is great at scraping from the web, but I have found that it does not always work well with WordPress sites.

Grab some text

Step one of the process is to grab some text to work with. For this example I grabbed all of the text of the posts on this blog by scraping it with lynx. I also used the text of the US Constitution to see what the tools would do with it. Generally, the more text, the better the machine learning code will be at generating something interesting.

Scraping the posts

Using the command-line web browser lynx in a script, I was able to download the text of the posts on this site. Initially I thought to use wget, but I remembered that wget does a good job downloading static sites and sometimes does not do so well with ones like this one, which is built with WordPress.

There is probably a way to loop this code in bash and increment a counter for the pages. But since this is a one-time thing, I opted for a quick approach instead of thinking too hard about making a loop.

#!/bin/bash
lynx -dump -nolist http://erick.heart-centered-living.org/ > my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/2/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/3/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/4/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/5/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/6/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/7/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/8/ >> my-posts.txt
lynx -dump -nolist http://erick.heart-centered-living.org/page/9/ >> my-posts.txt

This code outputs a file that contains all of the text from the posts on this site, up to Fall of 2018 when I ran it.
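
For anyone who does want the loop, here is a minimal sketch of one way to do it. The function names page_url and scrape_posts and the LAST_PAGE variable are made up for illustration; the lynx options and the URLs are the same ones used in the script above. It assumes the site still has nine pages of posts.

```shell
#!/bin/bash
# Base URL and page count match the hand-written script above.
BASE_URL="http://erick.heart-centered-living.org"
LAST_PAGE=9   # assumption: adjust to however many pages the blog has

# Print the URL for a page number; page 1 is the site root.
page_url() {
    if [ "$1" -eq 1 ]; then
        echo "$BASE_URL/"
    else
        echo "$BASE_URL/page/$1/"
    fi
}

# Dump the text of every page into a single output file.
scrape_posts() {
    local out="$1" n
    : > "$out"   # start with an empty file
    for ((n = 1; n <= LAST_PAGE; n++)); do
        lynx -dump -nolist "$(page_url "$n")" >> "$out"
    done
}
```

Running scrape_posts my-posts.txt after these definitions should produce the same kind of file as the nine separate commands, with the counter doing the typing.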

Create a hidden WordPress page using bash on the command line

Recently I was searching around looking for a way to create a hidden page on a WordPress site. It is a hosted site, not on wordpress.com. It is on a Linux server to which I have shell access.

Initially I tried using a plugin that I found that hides pages and posts. Plugins: you have to love or hate them. Love them when they work great right out of the box, hate them when they take a long time to troubleshoot.

Rather than waste too much time with the plugin, I went straight to the command line.

[Screenshot: making the hidden page]

It turns out that if you publish a page and then log into the hosting server, make a directory somewhere under your public_html, change directory into it and execute…

 wget -x -nH your-page-url-to-hide-here

 

[Screenshot: set the page to Draft or Private]

…then go back to it and make the page a draft or under review, so it “disappears” from the menu structure. It will still work as a “cached” HTML page that has been downloaded to the folder that you created. Pictures and anything else that you loaded into it will remain fully functional.

Example of a hidden page

http://erick.heart-centered-living.org/hidden/i-am-a-hidden-page/

Once the original page is put into draft/under review or private mode, it is gone…

http://erick.heart-centered-living.org/i-am-a-hidden-page/
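
The steps above (make a directory, change into it, run wget) can be wrapped into a small helper. This is only a sketch: the function name snapshot_page is made up, and the directory and URL in the usage line are placeholders to swap for your own. The wget options are the same -x and -nH used above.

```shell
#!/bin/bash
# Hypothetical helper: save a copy of a published page under a given directory.
#   $1 = directory to hold the copy (somewhere under public_html)
#   $2 = URL of the published page to hide
snapshot_page() {
    local dest_dir="$1" page_url="$2"
    (
        # Run in a subshell so the caller's working directory is unchanged.
        mkdir -p "$dest_dir" && cd "$dest_dir" || exit 1
        # -x recreates the page's path as directories, -nH leaves out the host name.
        wget -x -nH "$page_url"
    )
}
```

For example, snapshot_page ~/public_html/hidden your-page-url-to-hide-here. With those two options the copy should end up in a subfolder named after the page's path, which is why the hidden example above lives at /hidden/i-am-a-hidden-page/.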

Caveat

I have noticed that caching can get in the way. If your server caches pages, wget may not see the updated page when you make changes. A quick remedy is to set the page to draft/pending review or private and delete the hidden page; I usually use rm -rf from the directory above it. Then run wget again so that it is forced to download the “404” page. After that you can publish the page and re-run wget, which forces it to get the fresh version. Keep note of the size of the file as a hint that it is getting the right one.
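
The delete, re-download and size check can be scripted. This is a rough sketch: the names page_size and refresh_page are made up, and it assumes wget saved the page as index.html inside a folder named after the page, which is what I would expect from -x -nH on a URL that ends in a slash. Changing the page status in WordPress still has to be done by hand between runs.

```shell
#!/bin/bash
# Print the size in bytes of a saved page, or 0 if the file is missing.
page_size() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Hypothetical helper: remove the saved copy, fetch it again, report both sizes.
#   $1 = folder holding the saved page, relative to the current directory
#   $2 = URL of the page
refresh_page() {
    local slug="$1" page_url="$2"
    [ -n "$slug" ] || return 1        # never run rm -rf on an empty name
    local file="$slug/index.html"     # assumption: wget's default file name
    local before after
    before=$(page_size "$file")
    rm -rf -- "$slug"
    wget -x -nH "$page_url"
    after=$(page_size "$file")
    echo "size before: $before bytes, after: $after bytes"
}
```

Run it from the directory above the saved page. If the two sizes match after you have edited the page, the server is probably still handing out the cached version.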

Upcoming: Do this with a CGI Script

In an upcoming post, I will cover how to make a CGI script that will allow you to create a hidden page easily without having to use SSH to log in to the server.

 

wget options used in this example, from the man page

-x
--force-directories
The opposite of -nd: create a hierarchy of directories, even if
one would not have been created otherwise.  E.g. wget -x
http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt.

-nH
--no-host-directories
Disable generation of host-prefixed directories.  By default,
invoking Wget with -r http://fly.srk.fer.hr/ will create a
structure of directories beginning with fly.srk.fer.hr/.  This
option disables such behavior.

Wget Resources

https://www.lifewire.com/uses-of-command-wget-2201085

https://www.labnol.org/software/wget-command-examples/28750/

The Ultimate Wget Download Guide With 15 Awesome Examples

http://www.linuxjournal.com/content/downloading-entire-web-site-wget

Configuring Posting via email for WordPress

I am testing out the ability to post via a secret email address. This is how I created this post, which I then edited some more in WP.

I fussed with it for a bit, sending emails and expecting results. I didn’t realize that the email check for WP has to be triggered. So I put together a cron job to trigger the reading of email periodically (daily for the moment, which seems reasonable) via php using…

php -q /home/yourcpanelusername/path-to-folder/wp-mail.php

Which didn’t work, initially. I kept getting email from cron with XML in the body; it is an error page with this line at the bottom…

<p>Slow down cowboy, no need to check for new mails so often!</p>

Then I tried this

But manually triggering the mail check by going to the URL where wp-mail.php lives does turn the email into a post and lists it as pending. This, in my mind, is not terribly useful. I would prefer to send an email and not have it be a pending post, as I would like to post from email without needing to log in to WP; in other words, just post it already.

Mysteriously, after experimenting with sending a few posts by email, it started to work. I am not sure why, but checking the mail daily at 1 AM, it either gets the messages, creates a pending post, deletes the copies on the mail server as it should, and reports this in a cron email, or there are no emails for it and it reports that correctly. After some initial weirdness it has been working fine and as expected for several weeks.
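
For reference, a daily 1 AM check like the one described above could be set up along these lines. This is a sketch, not the exact entry from my crontab: the function names cron_line and install_cron_line are made up, and the path is the same placeholder as in the php command earlier.

```shell
#!/bin/bash
# Build a crontab line that runs wp-mail.php at minute 0 of hour 1, every day.
#   $1 = full path to wp-mail.php
cron_line() {
    echo "0 1 * * * php -q $1"
}

# Append that line to the current user's crontab, keeping any existing entries.
install_cron_line() {
    { crontab -l 2>/dev/null; cron_line "$1"; } | crontab -
}
```

Calling install_cron_line /home/yourcpanelusername/path-to-folder/wp-mail.php adds the entry. Running it twice adds the line twice, so check with crontab -l afterwards. On some hosts cron may need the full path to php.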

Test WordPress Install

This blog is currently a test install for me to experiment with. I do a lot with WordPress and I wanted to try out a few things without disturbing any live sites. I do have a server at home that I am in the process of building. My old server died a few months ago and I’ve got some big plans for the new one, such as running OwnCloud; more details to follow on this! Eventually, after installing LAMP, that server will be my “sandbox” for WP experiments instead of this blog.

 

I do plan on putting some interesting content on this site as time goes on.