Refinery
Refinery contains scripts, artifacts, and configuration for WMF's analytics cluster.
Setting up the refinery repository
- Install git-lfs on your system. Debian includes a git-lfs package.
- Make sure the `docopt` and `dateutil` Python packages are available on your system.
On Ubuntu systems, you can achieve this by running
` sudo apt-get install python-docopt python-dateutil `
- Clone the repository.
You can find the commands to clone the repository at WMF's gerrit.
To clone anonymously, just run
` git clone https://gerrit.wikimedia.org/r/analytics/refinery `
- Change to the cloned repository by running
` cd refinery `
- Initialize git-lfs by running
` git lfs install `
- Pull existing artifacts into the repository by running
` git lfs pull `
(Depending on your internet connection, this step may take some time.)
- Add the `refinery/python` directory to your `PYTHONPATH`.
To add it only in the running shell, you can use
` export PYTHONPATH=/path/to/analytics/refinery/python `
Please refer to your operating system's documentation on how to do this globally.
- Done.
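The setup steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of refinery itself: the function name `check_setup` and its messages are hypothetical, and it only assumes that git-lfs provides an executable called `git-lfs` and that the two packages are importable as `docopt` and `dateutil`.

```python
import importlib.util
import os
import shutil


def check_setup(refinery_python_dir):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper for illustration only, not shipped with refinery.
    """
    problems = []

    # git dispatches "git lfs ..." to an executable named "git-lfs".
    if shutil.which("git-lfs") is None:
        problems.append("git-lfs executable not found on PATH")

    # "docopt" and "dateutil" are the import names of the required packages.
    for module in ("docopt", "dateutil"):
        if importlib.util.find_spec(module) is None:
            problems.append("Python module not importable: " + module)

    # PYTHONPATH is a list of directories separated by os.pathsep.
    raw = os.environ.get("PYTHONPATH", "")
    entries = [os.path.abspath(p) for p in raw.split(os.pathsep) if p]
    if os.path.abspath(refinery_python_dir) not in entries:
        problems.append("not on PYTHONPATH: " + refinery_python_dir)

    return problems
```

Calling `check_setup("/path/to/analytics/refinery/python")` returns one message per failed check, so an empty list suggests the environment matches the steps listed above. It does not verify that `git lfs pull` has actually fetched the artifacts.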
Oozie job naming convention
- Job base names follow the directory pattern in the oozie directory, with slashes replaced by dashes. For instance, the job for webrequest/load/bundle.xml is named webrequest-load-bundle, and the job for last_access_uniques/daily/coordinator.xml is named last_access_uniques-daily-coord.
- Root job names end in either -bundle or -coord, while child job names end with the job parameters, separated by dashes.
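The naming convention can be expressed as a small function. This is a sketch under stated assumptions: `root_job_name` and `child_job_name` are hypothetical helpers, not part of refinery, and the mapping covers only the two root file names that appear in the examples above. The prefix and parameter values passed to `child_job_name` are illustrative, since the README only states that child names end with dash-separated parameters.

```python
import posixpath

# Assumption: only the two root file names from the README examples are mapped.
ROOT_SUFFIXES = {"bundle.xml": "bundle", "coordinator.xml": "coord"}


def root_job_name(relative_path):
    """Derive a root job name from a path relative to the oozie directory."""
    directory, filename = posixpath.split(relative_path)
    if filename not in ROOT_SUFFIXES:
        raise ValueError("not a bundle or coordinator file: " + relative_path)
    # Slashes in the directory part become dashes, then the suffix is appended.
    return directory.replace("/", "-") + "-" + ROOT_SUFFIXES[filename]


def child_job_name(prefix, parameters):
    """Append job parameters to a prefix, separated by dashes.

    The prefix is whatever base name the child job uses (an assumption here).
    """
    return "-".join([prefix] + [str(p) for p in parameters])


print(root_job_name("webrequest/load/bundle.xml"))
# prints: webrequest-load-bundle
print(root_job_name("last_access_uniques/daily/coordinator.xml"))
# prints: last_access_uniques-daily-coord
```

Raising on unknown file names keeps the sketch from silently inventing a suffix for files the convention does not describe.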