wordpress
Convert WP to static HTML – part 2
This is a followup to this previous post.
So I’ve been converting some more blogs to static HTML files, and this time around things were different enough that I wrote up a new how-to. Here are the steps I’ve been using to convert blogs that use the default Kubrick theme.
- Update the permalink structure for the site so that it uses the year, month, day, postname structure.
UPDATE `database`.`prefix_options` SET `option_value` = '/%year%/%monthnum%/%day%/%postname%/' WHERE `prefix_options`.`option_name` = 'permalink_structure' LIMIT 1;
- Make sure the blog does not block search engines. If it does, wget will only download the index.html file, because wget obeys the robots exclusion rules that this setting turns on. This took me a while to figure out. If a recursive wget only ever grabs index.html, check robots.txt or the blog's privacy setting. You can change the setting in the admin section (under Settings->Privacy) or via SQL:
UPDATE `database`.`prefix_options` SET `option_value` = '1' WHERE `prefix_options`.`option_name` = 'blog_public' LIMIT 1;
- Add the .htaccess file if it is not already there. The /path/to/wordpress/blog/ in it starts at the URL root, not the absolute file path. So for http://sitename.com/path/to/wordpress/blog/, the .htaccess file below goes in the ‘blog’ directory.
RewriteEngine On
RewriteBase /path/to/wordpress/blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /path/to/wordpress/blog/index.php [L]
- Get rid of the meta links through the sidebar widget in the admin. Alternatively, delete the appropriate lines from the theme files (for the default Kubrick theme, edit comments.php, sidebar.php, single.php and footer.php), or see the last step. Delete the code that puts in the search box, comments, trackbacks, RSS links, and anything in the footer you want out.
- When all is good, run wget to grab the files.
wget --mirror -P blog-static -nH -np -p -k -E --cut-dirs=5 http://sitename.com/blog/
- Rename the blog directory.
mv blog blog-old
- Rename the static directory to be live.
mv blog-static blog
- Copy the images directory from the old theme to the appropriate static directory.
cp -r blog-old/wordpress/wp-content/themes/default/images/ blog/wordpress/wp-content/themes/default/
- Alternatively, to get rid of unwanted links and the like, use the find command to find all HTML files, then use perl to delete the matching lines. Don’t forget to escape forward slashes in the search pattern. Unfortunately, this method needs a separate command for every line of code you want to delete, so it’s much better to delete the lines from the theme files.
find . -name '*.html' -print0 | xargs -0 perl -ni -e 'print unless /<h3>Leave a Reply<\/h3>/'
Also, if you want to search and replace instead of removing, this handy find and perl one-liner will find and replace text in all HTML files.
find . -name '*.html' -print0 | xargs -0 perl -pi -e 's/search text here/replace text there/g'
The above searches all HTML files for every “search text here” phrase and replaces it with “replace text there”. The trailing g makes it replace every occurrence on a line, not just the first. You can obviously substitute whatever you want in those two places. If the text contains a ‘/’ (forward slash) character, it will need to be escaped with a ‘\’ (backslash) character. Perl uses standard regular expression syntax, so look that up if you need help formulating a search and replace.
Major Update to Multiple SVN WordPress Installs
It’s been a while, but I thought it important to post an update to the wpupdate program I wrote to upgrade a whole mess of WordPress installs at one time. I took a cue from the program officially sponsored by WP, but think mine is much, much better.
Here are some of the features:
- Specify a file with a list of svn WP installs, or update the current directory, or specify the directory to update on the command line
- Use command flags or options. You can tell the program whether to update or switch, and whether to use tags or branches.
- Automatically saves a copy of the svn update to a file so your terminal is not overflowing with text, but does output any conflicts that arise.
- Automatically saves a backup copy of the wp-content directory (just in case the update or switch screws something up).
- Automatically saves a copy of the database, backing up only the tables used by the WP install (based off of the wp-config supplied table prefix).
- Restores permissions to the original owner and group.
Anyhow, check out the new page devoted solely to this application: SVN WordPress Updater
License: Anyone is free to use this program however they want, as long as they give me due attribution. Also if you update or modify the program in any way, I need to know about it. That’s what free and open software is all about. Any updates should benefit us all.
THAT podcast
Check out THAT podcast (THAT = The Humanities And Technology). It’s a new video podcast put on by a couple of co-workers at CHNM. They interview someone in the technical field about software that helps those of us in the humanities.
The first episode includes an interview with Matt Mullenweg, creator of WordPress (the software running this site!) and shows you how to install and configure ScholarPress (a plug-in to WordPress written by Jeremy Boggs).
It’s great stuff, check it out!
Converting WordPress to static html
UPDATE: Check out the new post on a better way to do this here: http://historicalwebber.mossiso.com/convert-wp-to-static-html-part-2-244.html
Usually people want to convert their static HTML pages to some dynamic content management system. I’ve run into the need to go the other way.
A few professors at GMU love to use WordPress for their classes. It’s a really great way to get more student participation and involve some of those who aren’t so talkative in class.
But these blogs are usually only needed for one semester, and then just sit there. This can be a security risk if they are not kept up to date, and is cumbersome when trying to update many of them (one professor had over 30 blogs!).
Sometimes the content should still be viewable, but the need for a whole cms type back-end no longer exists. Sometimes the professor would just like a copy of the pages for their own future research or whatever.
So, I figured out a way to convert a dynamic WordPress site into static html pages.
Here are the basic steps I used:
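- Change the permalink structure in the WordPress admin section. Alternatively, change it directly in the database: in the options table (wp_options, or prefix_options if your install uses another table prefix), set option_value on the row whose option_name is permalink_structure. See the updates below for which structure to use.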
UPDATE `database`.`prefix_options` SET `option_value` = '/%year%/%monthnum%/%day%/%postname%/' WHERE `prefix_options`.`option_name` = 'permalink_structure' LIMIT 1;
UPDATE (2.12.08): After reading a post by Christopher Price (who linked to this post) about WP permalinks, I’m thinking this structure (/archives/%post_id%.html) might give the best results. I often found pages that displayed the raw HTML instead of rendering it, and this just might fix that issue.
UPDATE (3.11.08): I did some more dynamic to static conversions today and found that the best permalink structure is just the post name, with no extra categories and such. So the best structure to use would be this (/%postname%.html). The benefit is that every page is unique, with a descriptive name for the URL (albeit sometimes very long), and fewer subdirectory issues arise.
UPDATE (7.17.09): This time around, I have found that the following permalink structure seems to work best:
/%year%/%monthnum%/%day%/%postname%/
I also cleaned up the SQL statement.
- Add the .htaccess file to /path/to/wp/ if it is not already there (where /path/to/wp/ is the path part of http://somedomain.com/path/to/wp/). If there already is a .htaccess file and it is set up for permalinks, you can probably leave it as it is.
RewriteEngine On
RewriteBase /path/to/wp/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /path/to/wp/index.php [L]
- Use wget to copy all of the files as static HTML files.
wget --mirror --wait=2 -P blogname-static -nH -np -p -k -E --cut-dirs=3 http://sitename.com/path/to/blog/
Note: change --cut-dirs to the number of directories that come after the domain name. The trailing slash plays a part too.
UPDATE (03.11.08): I found that --cut-dirs doesn’t really do anything this time around.
UPDATE (7.17.09): This time around, I find the following works best, --cut-dirs included:
wget --mirror -P wpsite-static --cut-dirs=3 -nH -p -k -E https://site.com/path/to/wp/
The -P option has the bonus of creating the directory for you, which eliminates the separate make-directory step. Make sure to use two hyphens before the long options, not an en or em dash.
- Copy the contents of wp-content to save uploaded files, themes, etc. This copies a lot of unnecessary PHP files, which could be a security risk, but it is really easy if you’re just converting to an archive. To remove the security threat, pick and choose only the files you need.
cp -r /path/to/wp/wp-content/* /path/to/static/wp-content/
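- Sometimes wget creates extra folders inside the archives folder. To fix this, run the following three commands in the archives folder. To get rid of the feed file in all of the directories:
rm -f */feed
To delete all of the now-empty directories (-depth handles nested folders before their parents, and -empty leaves folders with content alone):
find . -depth -type d -empty -exec rmdir '{}' \;
To rename the files ###.1 to ###:
rename .1 '' `find . -type f -name "*.1"`
That’s two single quotes (an empty replacement) after the first .1. This is the util-linux rename syntax. On Debian and Ubuntu, rename is often a Perl script with different syntax, and the util-linux version is installed as rename.ul.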
- Move to the wp folder and make a backup of the database:
mysqldump -u [userfromwp-config.php] -p --opt databasename > databasename.sql
UPDATE (03.11.08): I found I needed to back up just a few tables from a database that contained many copies of WordPress. To do this more easily, I used a little script I wrote earlier to dump tables with a common prefix. This could also work if you just put in the full names of only the tables you want to back up.
- Move one directory above the wp install and make a tar backup of the old wordpress folder:
tar -cf wordpress.tar wordpress/
- Rename the old wordpress folder:
mv wordpress wordpress-old
- Move the static copy into place:
mv static/wordpress/ wordpress/
- Test out the site. If it’s totally broken, just delete the wordpress directory and restore the original from the tar file (tar -xf wordpress.tar).
- Remove the tar file and the wordpress-old directory as needed.
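The backup-and-swap steps can be wrapped in two small functions so they always run in the same order and can be undone. This is a sketch under assumptions:

- It runs from the directory above the install.
- The live folder is `wordpress`.
- The wget output sits in `static/wordpress`.

`swap_in_static` and `restore_dynamic` are hypothetical names.

```shell
# swap_in_static [LIVE] [STATIC]
# Hypothetical helper; run it from the directory that contains LIVE.
swap_in_static() {
  local live=${1:-wordpress} static=${2:-static/wordpress}
  if [ ! -d "$live" ] || [ ! -d "$static" ]; then
    echo "need both $live and $static" >&2
    return 1
  fi
  # Don't clobber a backup left by an earlier run.
  if [ -e "$live.tar" ] || [ -e "$live-old" ]; then
    echo "$live.tar or $live-old already exists" >&2
    return 1
  fi
  # Tar backup first, then move the old folder aside and the static copy in.
  tar -cf "$live.tar" "$live/" &&
    mv "$live" "$live-old" &&
    mv "$static" "$live"
}

# restore_dynamic [LIVE]: put the original site back from the tar file.
restore_dynamic() {
  local live=${1:-wordpress}
  if [ ! -f "$live.tar" ]; then
    echo "no $live.tar to restore from" >&2
    return 1
  fi
  rm -rf "$live" && tar -xf "$live.tar"
}
```

Chaining the three commands with && means a failed tar never leads to the live folder being moved.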
UPDATE (03.11.08): I have found that the old ‘rename’ command [rename .1 '' *.1] only works on the current directory. If you want to rename recursively, you have to use the ‘find’ command. The code above has been changed to reflect this.
UPDATE (7.14.09): When the rename with find doesn’t work, it’s probably because the post has comments, so there is a folder with the same name as the post’s filename. In this case, just move the file (with the .1 extension) into the folder of the same name, and change the name of the file to index.html.
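The three archive clean-up commands and the comments-folder workaround can be folded into one function. A sketch, assuming the layout described above (a stray `feed` file in post folders and wget duplicates ending in .1) and that no legitimate file name ends in .1. `cleanup_archive` is a hypothetical name. It only uses find, rm, rmdir and mv, so it does not depend on which rename is installed.

```shell
# cleanup_archive [DIR]: tidy a wget copy of a WordPress archives folder.
cleanup_archive() {
  local dir=${1:-.}
  # 1. Remove the feed files saved next to the posts.
  find "$dir" -type f -name feed -exec rm -f {} +
  # 2. Remove folders left empty; -depth visits children before parents,
  #    and -mindepth 1 keeps DIR itself.
  find "$dir" -depth -mindepth 1 -type d -empty -exec rmdir {} \;
  # 3. Strip the .1 suffix. If a folder with that name survived (the post
  #    had comments), the page becomes that folder's index.html instead.
  find "$dir" -type f -name '*.1' -exec sh -c '
    for f do
      base=${f%.1}
      if [ -d "$base" ]; then target=$base/index.html; else target=$base; fi
      if [ -e "$target" ]; then
        echo "skipping $f: $target already exists" >&2
      else
        mv "$f" "$target"
      fi
    done
  ' sh {} +
}
```

Running `cleanup_archive archives` from the static copy's root covers every subfolder, which is the recursive behavior the find-based rename was added for.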