# Mirroring MediaWiki with Git-Mediawiki and gitolite

From Murphy’s Law we can deduce that Internet outages always happen when you least expect them. In my case, the Stratum 0 wiki was offline for a few minutes (only, thankfully!) just when I really urgently(1!11) needed to look something up there. If only I had an offline clone of the wiki…

## Enter: Git-Mediawiki

I had discovered Git-Mediawiki some time before. It lets you mirror some or all pages of a MediaWiki instance to a local Git repository. It does this by implementing the `mediawiki::` remote helper: you configure the URL of the remote MediaWiki instance as a Git remote, and every time you do a `git fetch`, it loads the raw revisions from the MediaWiki API.
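For an ordinary working copy, `git clone mediawiki::https://stratum0.org/mediawiki` sets up such a remote and starts fetching immediately. The sketch below only writes a similar configuration, plus the `by_rev` fetch strategy, so that the first (long) fetch can be started later. The function name `setup_mediawiki_remote` is hypothetical. `git remote add` also writes the default refspec `+refs/heads/*:refs/remotes/origin/*`, which is the same `remote.origin.fetch` value used in the gitolite configuration below.

```sh
#!/bin/sh
# Sketch: prepare a repository whose "origin" remote is a MediaWiki instance,
# accessed through Git-Mediawiki's mediawiki:: remote helper.
# Nothing is fetched here; only the Git configuration is written.
setup_mediawiki_remote() {
    repo_dir=$1   # where the local mirror should live
    wiki_url=$2   # base URL of the wiki, e.g. https://stratum0.org/mediawiki

    git init -q "$repo_dir" || return 1
    # Also writes remote.origin.fetch = +refs/heads/*:refs/remotes/origin/*
    git -C "$repo_dir" remote add origin "mediawiki::$wiki_url" || return 1
    # Ask Git-Mediawiki to walk recent revisions instead of going page by page.
    git -C "$repo_dir" config remote.origin.fetchStrategy by_rev
}

# Usage (the first fetch then imports the wiki history):
#   setup_mediawiki_remote ~/stratum0-wiki https://stratum0.org/mediawiki
#   git -C ~/stratum0-wiki fetch origin
```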

Because the mirror lives in a gitolite repository, gitolite first has to be told which Git config keys it may set. This is done in its rc file (`~/.gitolite.rc`):

```
$GL_GITCONFIG_KEYS = "remote\.* gitweb\.owner gitweb\.description";
```

Now I could easily add the corresponding options to my repository setup:

```
repo stratum0-wiki
    config gitweb.description = "Read-only Git mirror of the Stratum 0 wiki"
    config remote.origin.url = "mediawiki::https://stratum0.org/mediawiki"
    config remote.origin.fetch = "+refs/heads/*:refs/remotes/origin/*"
    config remote.origin.fetchstrategy = "by_rev"
    RW+ = rohieb
    R = @all daemon gitweb
```

Note that I let Git-Mediawiki use the `by_rev` fetch strategy. It queries the MediaWiki API for all recent revisions instead of first looking for changed pages and then fetching their revisions. This is more efficient here, since I want to import every revision anyway. I also found out the hard way (i.e. through print debugging) that the `remote.origin.fetch` option is critical for Git-Mediawiki to work correctly.

Then I used `crontab -e` to create a simple cron job for the `git` user (which owns all the gitolite repositories). It updates the mirror every 30 minutes:

```
# m h dom mon dow command
*/30 * * * * /home/git/update-stratum0-mediawiki-mirror
```

The script that does all the work resides in `/home/git/update-stratum0-mediawiki-mirror`. Note that we cannot simply `git merge` the master branch there, because the gitolite repository is a bare repository and `git merge` needs a working tree. Therefore, we only fetch new revisions from our MediaWiki remote (which fetches to `refs/mediawiki/origin/master`) and then update the master branch manually. Since the mirror is read-only and there are no real merges to be done, this is sufficient.

So far, we have a fully working mirror. But since the Stratum 0 wiki has grown to more than 7000 revisions to date, the initial fetch would take a while. To reduce the load on the MediaWiki API, I figured that I could reuse my existing repository on my laptop.
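A minimal sketch of such an update script for a bare repository follows the fetch-then-update approach described above: fetch from the `mediawiki::` remote, then move `master` with `git update-ref`, since `git merge` is unavailable without a working tree. The function name `update_mirror` and the repository path are hypothetical.

```sh
#!/bin/sh
# Sketch of a mirror update for a *bare* repository whose "origin" remote
# is a mediawiki:: URL. Function name and repository path are hypothetical.
update_mirror() {
    repo=$1   # path to the bare repository

    # Import new wiki revisions; Git-Mediawiki stores them under
    # refs/mediawiki/origin/master.
    git -C "$repo" fetch --quiet origin || return 1

    # No working tree, so no "git merge": for a read-only mirror it is
    # enough to point master at the newest imported revision.
    git -C "$repo" update-ref refs/heads/master refs/mediawiki/origin/master
}

# In the cron script, the call could look like this (path is an assumption):
#   update_mirror /home/git/repositories/stratum0-wiki.git
```

`git update-ref` overwrites `master` unconditionally. That is only acceptable because nothing else is supposed to commit to this read-only mirror.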
## Re-using a previous Git-Mediawiki repo

So before activating the cron job, I pushed my existing repository to the mirror:

```
~/stratum0-wiki$ git push rohieb.name master
~/stratum0-wiki$ git push rohieb.name refs/mediawiki/origin/master
```

However, a test run of the mirror script was not happy with that and wanted to fetch ALL THE revisions anyway. It took me another while to find out that, for efficiency reasons, Git-Mediawiki records the corresponding MediaWiki revisions in Git notes under `refs/notes/origin/mediawiki`. For example:

```
$ git log --notes=refs/notes/origin/mediawiki
commit 7e486fa8a463ebdd177e92689e45f756c05d232f
Author: Daniel Bohrer <Daniel Bohrer@stratum0.org/mediawiki>
Date:   Sat Mar 15 14:42:09 2014 +0000
```
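As far as I can tell from Git-Mediawiki's implementation, it finds the last imported revision by reading the note attached to `refs/mediawiki/origin/master`. It then expects a line of the form `mediawiki_revision: <number>`. Treat that exact format as an assumption. The hypothetical helper below reads the note the same way, which makes it easy to check whether a repository carries the notes at all. Empty output means a fetch would start from scratch.

```sh
#!/bin/sh
# Print the MediaWiki revision number stored in the Git note attached to
# refs/mediawiki/origin/master, or nothing if there is no such note.
# Assumes notes of the form "mediawiki_revision: <number>".
last_mediawiki_revision() {
    repo=$1
    git -C "$repo" notes --ref=refs/notes/origin/mediawiki \
        show refs/mediawiki/origin/master 2>/dev/null |
        sed -n 's/^mediawiki_revision: *//p'
}

# Usage:
#   last_mediawiki_revision ~/stratum0-wiki
```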
So after I also pushed `refs/notes/origin/mediawiki` to the mirror repo, everything was fine, and the cron job only fetched a small number of new revisions.
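In hindsight, all three refs can go to the mirror in a single push. This is a minimal sketch: the function name `seed_mirror` is hypothetical, and `rohieb.name` in the usage line is the remote used above.

```sh
#!/bin/sh
# Push everything a fresh Git-Mediawiki mirror needs in one go:
# the branch, Git-Mediawiki's tracking ref, and the revision notes.
seed_mirror() {
    repo=$1     # local Git-Mediawiki clone
    remote=$2   # name or URL of the mirror repository
    git -C "$repo" push --quiet "$remote" \
        refs/heads/master \
        refs/mediawiki/origin/master \
        refs/notes/origin/mediawiki
}

# Usage:
#   seed_mirror ~/stratum0-wiki rohieb.name
```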