Archiving a WordPress site

12 April 2026
For several years I ran, on a pro-bono basis, a website for a Ukrainian-related charity. For reasons I deliberately took at face value the trustees decided to cease operations, and this led to a request from the Charity Commission for the website to be taken down. However, before terminating the hosting completely I wanted a verified archive of the site, which meant converting the downloaded WordPress data files into a static off-line copy. The procedure is based on a previous WordPress site recovery, but that write-up was lacking in critical details, so I decided to do a much better job of documenting what a WordPress site archival requires.

Backing up the site data

A WordPress backup consists of two parts: the PHP files that constitute the WordPress code, alongside other resource files such as images; and a database dump that contains things like post text. The website was hosted on DreamHost, who specialise in WordPress hosting. They provide multiple methods for downloading a copy of the WordPress data, but they all boil down to zipping or tarballing the WordPress files and dumping the database. The wp-cli tool takes the guess-work out of the parameters needed for the database dump, so that was the method used for extracting the data.
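As a rough sketch of what such a backup boils down to, the function below dumps the database with wp-cli and then tarballs the files. The function name, the two directory arguments and the choice of gzip compression are illustrative assumptions rather than what DreamHost or the original backup used; `wp db export` and the global `--path` option are standard wp-cli.

```shell
# backup_site SITE_DIR OUT_DIR
# Hypothetical helper: dump the database, then archive the WordPress files.
backup_site() {
    site_dir=$1
    out_dir=$2
    name=$(basename "$site_dir")

    mkdir -p "$out_dir" || return 1

    # wp-cli takes the database credentials from wp-config.php in SITE_DIR
    wp --path="$site_dir" db export "$out_dir/$name.sql" || return 1
    gzip -f "$out_dir/$name.sql" || return 1

    # Archive the directory itself so that it unpacks into one sub-directory
    tar -C "$(dirname "$site_dir")" -czf "$out_dir/$name.tar.gz" "$name"
}

# Example (paths are place-holders):
#   backup_site "$HOME/example.org" "$HOME/backups"
```

Archiving the directory rather than its contents means the tarball unpacks into a single sub-directory, which is why the restore later on has to move files up a level.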

Internal server setup

The overall plan is to get the site up and running on a private server instance, both to check that the archive is good by making it “live” on a new system and to create a static copy of the site. The place-holder https://www.example.org is used throughout, so this will need to be changed to the domain the site used while it was live. It is assumed that /var/html is where WordPress will be reinstalled on the server filesystem.

MariaDB/MySQL database setup

A lot of the time when MySQL is mentioned the drop-in replacement MariaDB is actually being used, so the MariaDB commands are used here rather than the MySQL compatibility symlinks. The MySQL equivalents are similarly named, with the mariadb prefix replaced by mysql and dashes replaced by underscores (mysql_install_db, mysqld_safe and mysql_secure_installation), the exception being mariadb-admin, which is mysqladmin. Firstly create new database files and manually start the database server:

mariadb-install-db
mariadbd-safe

Then in another window run the following as root to set up the database:

mariadb-secure-installation

Once this is done, shut down mariadbd-safe and start the server in normal background mode. For non-Slackware systems replace the use of rc.mysqld with the appropriate SysV or systemd command(s). If there are any permission errors you might need to run chown -R mysql:mysql /var/lib/mysql/*.

mariadb-admin shutdown
chmod +x /etc/rc.d/rc.mysqld
/etc/rc.d/rc.mysqld start

From now on commands can be run as a non-root user, so log in to create the WordPress user and its associated database:

mariadb -u root -p

Once logged in run the following query, changing the place-holder sekret to a properly secure password:

CREATE DATABASE wordpress;
GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'localhost' IDENTIFIED BY 'sekret';
FLUSH PRIVILEGES;
quit

...and finally test that everything works for this new user and database:

mariadb -u wordpress -p wordpress

PHP-FPM setup

The following is a minimalist PHP-FPM configuration with all comments and default values stripped out. The main thing to note is the use of the Unix socket /tmp/php.sock for FastCGI communication.

[global]
pid = run/php-fpm.pid
error_log = log/php-fpm.log
log_level = warning
process.max = 16

[www]
user = nobody
group = nobody
listen = /tmp/php.sock
listen.mode = 0666
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Reroute domain lookups to localhost

In WordPress changing the base URL is an almighty pain up the arse because things like posts embed the full address verbatim instead of using relative links. So instead of trying to fix up an alternative address, set things up so that the domain is routed to the local machine. The easiest way to do this is to edit /etc/hosts, which once done should look something like the following:

127.0.0.1 localhost www.example.org
::1 localhost
192.168.1.24 Strongbox.vm Strongbox
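A quick way to confirm that the rerouting took effect is to check that the hosts file maps the domain to the loopback address. The helper below is a minimal sketch with a hypothetical name; on glibc systems running getent hosts www.example.org gives a similar answer through the real resolver.

```shell
# hosts_maps_to FILE NAME ADDRESS
# Hypothetical helper: succeed if FILE maps NAME to ADDRESS.
hosts_maps_to() {
    awk -v name="$2" -v addr="$3" '
        { sub(/#.*/, "") }      # ignore comments
        $1 == addr {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Example:
#   hosts_maps_to /etc/hosts www.example.org 127.0.0.1 && echo "rerouted"
```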

Create a make-shift site certificate

As above, changing WordPress to use http:// instead of https:// is an almighty pain, and since DreamHost manage site certificates themselves I was not able, at least in my case, to get a copy of the production site certificate's private key. Since SSL will not operate without a certificate and key, the most expedient thing to do is create a self-signed certificate for the domain using the commands below. The certificate will not be properly valid, but for the purpose here there is no real need for a properly signed trust chain, and while web browsers will complain they all in the end provide a way to bypass the warnings.

openssl genrsa -out /etc/nginx/cert.key
openssl req -x509 -new -nodes -key /etc/nginx/cert.key -out /etc/nginx/cert.pem -subj "/CN=www.example.org"
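Two details may be worth tightening if the certificate is going to be used for more than a quick look: openssl req -x509 defaults to a 30 day validity, and current browsers match the host name against the subjectAltName extension rather than the CN. The variant below is a sketch under those assumptions; the function name, the 2048-bit key size and the 365 day lifetime are my own choices, and -addext needs OpenSSL 1.1.1 or newer.

```shell
# make_selfsigned DIR HOST
# Hypothetical helper: write DIR/cert.key and DIR/cert.pem for HOST.
make_selfsigned() {
    dir=$1
    host=$2

    openssl genrsa -out "$dir/cert.key" 2048 2>/dev/null || return 1

    # -days extends the validity; -addext adds the subjectAltName entry
    openssl req -x509 -new -nodes -key "$dir/cert.key" \
        -out "$dir/cert.pem" -days 365 \
        -subj "/CN=$host" -addext "subjectAltName=DNS:$host"
}

# Example, followed by a look at what was generated:
#   make_selfsigned /etc/nginx www.example.org
#   openssl x509 -in /etc/nginx/cert.pem -noout -subject -enddate
```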

Nginx setup

Next a minimalist Nginx web-server setup is required. The main points are the Unix socket /tmp/php.sock from the PHP-FPM setup, the use of /var/html as the web-site root where the WordPress files will be extracted to, and finally the try_files directive. The absence of the latter was a major source of grief: while the WordPress home page and admin pages of the form xxx.php?.... worked fine, other pages of the form https://www.example.org/contact got 404 errors because they do not correspond to an actual file or directory. Using try_files reroutes such requests to index.php.

worker_processes 1;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;

    server {
        listen 80;
        listen 443 ssl;
        ssl_certificate_key /etc/nginx/cert.key;
        ssl_certificate /etc/nginx/cert.pem;
        client_max_body_size 10M;
        server_name www.example.org;

        location / {
            autoindex on;
            root /var/html;
            index index.php;
            try_files $uri $uri/ /index.php?$args;

            location ~* \.php$ {
                fastcgi_split_path_info ^(.+?\.php)(/.*)$;
                fastcgi_pass unix:/tmp/php.sock;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                include fastcgi_params;
            }
        }
    }
}
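To make the try_files behaviour concrete, the following is a toy model in shell of the decision it makes for each request: serve an existing file, then an existing directory, and otherwise hand the request to index.php along with the query string. It is an illustration only, not how Nginx is implemented, and the function name is made up.

```shell
# try_files_model DOCROOT URI [ARGS]
# Toy model: print the target that the try_files line would choose.
try_files_model() {
    root=$1
    uri=$2
    args=${3-}

    if [ -f "$root$uri" ]; then
        echo "$uri"                 # $uri: an existing file
    elif [ -d "$root$uri" ]; then
        echo "${uri%/}/"            # $uri/: an existing directory
    else
        echo "/index.php?$args"     # fallback: let WordPress route it
    fi
}

# Example:
#   try_files_model /var/html /contact
```

A pretty permalink such as /contact matches neither a file nor a directory, so it ends up at the fallback, which is the case that gave 404 errors without the directive.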

To check that PHP itself is working, create a test.php in /var/html with the following content:

<?php phpinfo(); ?>

Point a web browser at http://localhost/test.php and if everything is set up correctly the page should show a table with the PHP configuration. Then try ‘https://www.example.org’ to make sure that the rerouting of the domain to localhost and SSL are also working, although there will be certificate warnings which will need to be bypassed.

Unarchiving the site

Time to restore the backed-up WordPress site, the first stage of which is to load the database data:

zcat Wordpress-example_com.sql.gz | mariadb -u wordpress -p wordpress

The next stage is to extract the files to /var/html and move them into the right place:

tar -C /var/html -xf Wordpress-example_com.tar.xz
mv /var/html/example.org/* /var/html/
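One thing to be aware of is that a bare * in the shell skips dot-files, so anything like a .htaccess stays behind in the sub-directory. Nginx ignores .htaccess, so this is harmless for the setup here, but if a complete move is wanted something along the lines of this sketch should do it. The function name is hypothetical, and -mindepth and -maxdepth are GNU/BSD find extensions.

```shell
# move_contents SRC DST
# Hypothetical helper: move everything in SRC, dot-files included, into DST.
move_contents() {
    src=$1
    dst=$2

    # find matches hidden entries as well, unlike a bare * glob
    find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dst"/ \; || return 1

    # Only succeeds if SRC really is empty afterwards
    rmdir "$src"
}

# Example:
#   move_contents /var/html/example.org /var/html
```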

From here the unarchived WordPress configuration needs to be changed to reflect the local server installation rather than the previous hosting. In short this means changing the values within this section of /var/html/wp-config.php, where the place-holder sekret is the actual password set up earlier:

define('DB_NAME', 'wordpress');

/** MySQL database username */
define('DB_USER', 'wordpress');

/** MySQL database password */
define('DB_PASSWORD', 'sekret');

/** MySQL hostname */
define('DB_HOST', 'localhost');

At this point browsing to ‘https://www.example.org’ should show a live copy of the WordPress site. If instead there are server errors the most likely problem is file permissions under /var/html needing to be fixed up.
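What the permissions should be depends on the system, but as a sketch: the PHP-FPM configuration earlier runs as nobody, which only needs to read the files in order to serve an archive copy, so making directories traversable and files world-readable is usually enough. The function name and the conventional 755/644 modes are assumptions of mine, not something taken from the original hosting.

```shell
# fix_perms DIR
# Hypothetical helper: make every directory 755 and every file 644.
fix_perms() {
    find "$1" -type d -exec chmod 755 {} + &&
    find "$1" -type f -exec chmod 644 {} +
}

# Example (run as the owner of the files, or as root):
#   fix_perms /var/html
```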

Taking a static snapshot

There are various plugins that supposedly generate a static copy of a WordPress site with a minimal amount of clicking, but my experience is of them never working as advertised, and I do not have the patience to work out why. While I am sure it is not the best tool for the job, wget in recursive-download mode makes a reasonably good effort of the task when using the following command, and for me the results are good enough.

wget \
  --recursive \
  --page-requisites \
  --adjust-extension \
  --convert-links \
  --level inf \
  --execute robots=off \
  --no-check-certificate \
  https://www.example.org/

This is a recursive download that follows intra-site links to unlimited depth and grabs things like images in the process, fixing things up for local viewing by making links relative and giving pages a .html suffix. It also bypasses any robots file restrictions, and because the server is using a self-signed certificate the certificate check has to be switched off. The more compact form below does the same thing:

wget -rpEk -l inf -e robots=off --no-check-certificate https://www.example.org/

Everything needed to properly view the website is local so there is no need to enable any off-site fetches. Hammering a private server is also not an issue so no rate-limiting is enabled.
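One cheap sanity check on the result relies on how wget converts links: links to files it downloaded become relative, while links to anything it did not download are left as absolute URLs. Saved pages that still mention the site's own host are therefore worth a look, although some hits (metadata or script content, for instance) may be harmless. The helper name below is made up, and the --include option is a GNU grep feature.

```shell
# leftover_links DIR HOST
# Hypothetical helper: list saved pages that still refer to HOST by absolute URL.
leftover_links() {
    grep -rlF --include='*.html' "//$2" "$1"
}

# Example, run from where wget was started (it saves into a directory
# named after the host):
#   leftover_links www.example.org www.example.org
```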

Remarks

The previous WordPress recovery, done a few years ago, was an ad-hoc panic job at a bad time of my life, whereas the archival which led to this article was a planned undertaking done when it best suited me. It is one of several WordPress websites I had been running since the start of the invasion of Ukraine, but for whatever reasons people decided to move on. Nothing lasts forever and I felt that the sites had achieved their objectives, but after running for so long I at least wanted them kept in a form that could still be accessed. Using DreamHost was an experiment, since in the past my experience with shared hosting has almost universally been abysmal and as a result I usually opted for hosting sites on VPS systems, but for the site I was archiving they were everything that was needed.