urlwatch - a tool for monitoring webpages for updates

This script is intended to help you watch URLs and get notified (via email or in your terminal) of any changes. The change notification will include the URL that has changed and a unified diff of what has changed.
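To illustrate what such a report contains, the sketch below builds one with Python's standard difflib module. It is not urlwatch's own code: the function name change_report and the sample page content are hypothetical.

```python
import difflib


def change_report(url, old, new):
    """Return a unified diff of two snapshots of url, or '' if they are equal."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile='old: ' + url,
        tofile='new: ' + url,
        lineterm='',  # the split lines carry no newline, so join them below
    )
    return '\n'.join(diff)


report = change_report('http://example.com/',
                       'Price: 10 EUR\nIn stock\n',
                       'Price: 12 EUR\nIn stock\n')
print(report)
```

Removed lines start with "-" and added lines with "+", so a changed price shows up as one line of each.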

The script supports the use of a filtering hook function to strip trivially varying elements of a webpage.
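A minimal sketch of such a hook, assuming the urlwatch 1.x convention of a function filter(url, data) in a hooks.py file that receives the downloaded page and returns the text to compare. The URL and the "Last updated" pattern are hypothetical placeholders for whatever changes on every request of your page.

```python
import re

# Hypothetical example: a page that prints a new timestamp on every request
WATCHED_URL = 'http://example.com/news'
TIMESTAMP = re.compile(r'Last updated: [^<\n]*')


def filter(url, data):
    """Return data with trivially varying parts removed (hook sketch)."""
    if url == WATCHED_URL:
        # Replace the changing timestamp with a fixed marker
        data = TIMESTAMP.sub('Last updated: [removed]', data)
    # Pages without special handling are passed through unchanged
    return data
```

The name shadows Python's built-in filter inside hooks.py, which does no harm there because the hook module does not need the built-in.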

Basic features

Current version: 1.17

2014-08-01: urlwatch 1.17 fixes several bugs reported in the Debian bug tracker, namely: invalid encoding sent by the server, lynx handling of relative URLs and resolving of relative URL filenames.

2014-01-29: urlwatch 1.16 fixes a bug parsing content-encoding headers, includes a new and improved setup.py script for easy installation and adds basic support for e-mail delivery (contributed by Xavier Izard).

2012-08-30: urlwatch 1.15 adds support for optional UTF-8 handling in the html2text function of the "html2txt" helper module. Patch contributed by Slavko.

2011-11-15: urlwatch 1.14 fixes a unicode decoding issue related to the html2txt module.

2011-12-08: If you are experiencing problems with the concurrent page updates, try setting the number of threads to 1. This might make the updates go slower, but according to at least one user, it is more stable this way. YMMV.

urlwatch 1.13 adds support for watching websites that only work with HTTP POST requests. You can add the POST data in URL-encoded form after the website URL in the urls.txt file, separated by a single space. This release also adds support for Python 3.x by providing an appropriate converter script.
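For example, a urls.txt line such as "http://example.com/search.php q=urlwatch&lang=en" (hypothetical URL and form fields) would send q=urlwatch&lang=en as the POST body. The sketch below shows one way such a line could be split and turned into a request with Python 3's urllib; it illustrates the format and is not urlwatch's own parser (request_from_line is a hypothetical name).

```python
from urllib.request import Request


def request_from_line(line):
    """Build a GET or POST request from one urls.txt line (sketch)."""
    # The URL and the optional POST data are separated by a single space
    url, _, post_data = line.strip().partition(' ')
    if post_data:
        # The POST data is already URL-encoded; urllib expects it as bytes
        return Request(url, data=post_data.encode('ascii'))
    return Request(url)


req = request_from_line('http://example.com/search.php q=urlwatch&lang=en')
print(req.get_method(), req.full_url, req.data)
```

Constructing a Request does not touch the network; the request is only sent once it is passed to urllib.request.urlopen.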

urlwatch 1.13 also adds a dependency on the "futures" package from PyPI for Python versions earlier than 3.2 (the module has been part of the standard library, as concurrent.futures, since Python 3.2). Using futures should reduce the total time needed to watch several URLs, because network requests are sent in parallel, which usually leads to better bandwidth usage.
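The speed-up comes from overlapping the waiting time of several requests. The sketch below uses concurrent.futures with a fake fetch function that only sleeps, so it runs without network access; fetch_all and fake_fetch are hypothetical names, not part of urlwatch. Setting max_workers=1 corresponds to the single-thread workaround mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=10):
    """Call fetch(url) for all URLs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a network request that takes 0.2 seconds
    time.sleep(0.2)
    return 'content of ' + url


urls = ['http://example.com/%d' % i for i in range(5)]
for workers in (5, 1):
    start = time.monotonic()
    pages = fetch_all(urls, fake_fetch, max_workers=workers)
    print('%d worker(s): %.1f s' % (workers, time.monotonic() - start))
```

With five workers the five simulated requests overlap and finish in roughly the time of one; with a single worker they run one after another.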

Python compatibility

urlwatch is compatible with Python 2.x (2.5 and newer) and with Python 3.x. For Python 3, you have to use the included converter script, which will convert the source code to be compatible with Python 3 (by using the 2to3 tool included in Python).

Download

urlwatch is available as a package in various Linux distributions for easy installation via the package manager. It is also available as a source tarball and via PyPI (pip install urlwatch).

Packages in distributions

Source tarball

You can download the source tarball of urlwatch here:

urlwatch-1.17.tar.gz (2014-08-01)

Old releases

Running a version older than the current one is not recommended.

Python Package Index

urlwatch is also indexed in the Python Package Index as "urlwatch":

Advanced features

3rd party patches / Contributions

License

urlwatch is released under the terms of the BSD license.

Code repository

The Git repository of urlwatch can be found at:

To check out the code using Git, use this command:

git clone git://repo.or.cz/urlwatch.git

How do I..

..watch only an element on a website?

If you are lucky, the element has an "id" attribute (other attributes work just as well) that you can use with the BeautifulSoup library to extract that part of the HTML document:

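# BeautifulSoup 3 (Python 2); with bs4, use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup
# data holds the downloaded HTML of the page
soup = BeautifulSoup(data)
# Keep only the element with id="tisiDocumentBody" (becomes 'None' if it is missing)
data = str(soup.find(id='tisiDocumentBody'))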
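If BeautifulSoup is not available, a similar extraction can be sketched with Python 3's standard html.parser module. This is a swapped-in technique, not the BeautifulSoup approach above: it returns only the text inside the element (not its HTML), it assumes reasonably well-formed markup with explicit closing tags, and the names ElementTextById and text_by_id are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}


class ElementTextById(HTMLParser):
    """Collect the text of the first element whose id attribute matches."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0      # > 0 while inside the matched element
        self.found = False  # set once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.found or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get('id') == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.found = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, wanted_id):
    """Return the text inside the element with the given id, or None."""
    parser = ElementTextById(wanted_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip() if parser.found else None


page = ('<div id="nav">Menu</div>'
        '<div id="tisiDocumentBody"><p>Hello <b>world</b><br></p></div>')
print(text_by_id(page, 'tisiDocumentBody'))
```

Returning None for a missing element makes the failure explicit, whereas str(None) in the snippet above silently yields the text 'None'.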

..watch a remote Git repository for new tags?

urlwatch supports running commands and checking their output using the pipe symbol (|). You can combine this with the git command to watch a remote repository for new tags (in urls.txt):

|git ls-remote --tags http://github.com/gpodder/gpodder.git

As an alternative, Thomas Dziedzic has written tagurit, a Ruby tool specifically for this purpose, which can be found at https://github.com/gostrc/tagurit.
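Conceptually, a urls.txt entry that starts with | runs the rest of the line as a shell command and treats its standard output like the content of a web page, so new tags from the git ls-remote entry show up as added lines in the diff. A minimal Python 3.7+ sketch of that idea, not urlwatch's implementation (run_command_job is a hypothetical name):

```python
import subprocess


def run_command_job(line):
    """Run a urls.txt entry of the form '|command' and return its output."""
    if not line.startswith('|'):
        raise ValueError('not a command job: %r' % line)
    # Everything after the pipe symbol is handed to the shell unchanged;
    # check=True raises CalledProcessError if the command exits non-zero
    result = subprocess.run(line[1:], shell=True, check=True,
                            capture_output=True, text=True)
    return result.stdout


print(run_command_job('|echo v1.0'))
```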

..get colored diff output on the console?

You can use colordiff to convert normal diffs to colored ones. Pipe the urlwatch output into colordiff to get colored urlwatch output:

urlwatch | colordiff
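colordiff is a separate program. To illustrate what it does, here is a minimal Python stand-in that colors unified-diff lines with ANSI escape codes; saved under a hypothetical name such as colorize_diff.py, "urlwatch | python3 colorize_diff.py" would behave roughly like the colordiff pipeline. The real colordiff is configurable and supports more diff formats.

```python
import sys

GREEN, RED, CYAN, RESET = '\033[32m', '\033[31m', '\033[36m', '\033[0m'


def colorize(line):
    """Wrap one unified-diff line in an ANSI color based on its prefix."""
    if line.startswith(('+++', '---')):
        return line  # file header lines stay uncolored
    if line.startswith('+'):
        return GREEN + line + RESET
    if line.startswith('-'):
        return RED + line + RESET
    if line.startswith('@@'):
        return CYAN + line + RESET
    return line


if __name__ == '__main__':
    # Read the diff from stdin and write the colored version to stdout
    for line in sys.stdin:
        sys.stdout.write(colorize(line.rstrip('\n')) + '\n')
```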

..watch binary data?

urlwatch is aimed at text-based data. Due to the way encodings and text diffs are handled, arbitrary binary data might not be well supported; this is expected and part of urlwatch's design.

Information about the User-Agent

Since version 1.3, urlwatch sends an improved User-Agent string. More information about this User-Agent string can be found on this page.
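For comparison, this is how a custom User-Agent header can be attached to a request with Python 3's urllib. The string shown is a hypothetical placeholder, not the one urlwatch actually sends.

```python
from urllib.request import Request

# Hypothetical User-Agent string; replace with your own tool name and contact URL
USER_AGENT = 'example-watcher/0.1 (+http://example.com/about-this-bot)'

req = Request('http://example.com/', headers={'User-Agent': USER_AGENT})
# urllib normalizes header names with str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))
```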
Thomas Perl · 2014-08-01