|
urlwatch - a tool for monitoring webpages for updates

This script is intended to help you watch URLs and get notified (via email or in your terminal) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script supports a filtering hook function to strip trivially varying elements of a webpage.

Basic features
Current version: 1.14

Updated 2011-11-15: urlwatch 1.14 fixes a Unicode decoding issue related to the html2txt module.

2011-12-08: If you are experiencing problems with the concurrent page updates, try setting the number of threads to 1. This might make the updates slower, but according to at least one user, it is more stable this way. YMMV.

urlwatch 1.13 adds support for watching websites that only work with HTTP POST requests. You can add the POST data in URL-encoded form after the website URL in the urls.txt file, separated by a single space. This release also adds support for Python 3.x by providing an appropriate converter script. For Python versions earlier than 3.2, this release now depends on the "futures" package from PyPI (this module is included in the 3.2 standard library). Using futures should reduce the total time needed to watch several URLs, because network requests are sent in parallel, which usually leads to better bandwidth usage.

Python compatibility

urlwatch is compatible with Python 2.x (2.5 and newer) and with Python 3.x. For Python 3, you have to use the included converter script, which converts the source code to be compatible with Python 3 (by using the 2to3 tool included in Python).

Download

Official Debian package (by Franck Joncourt)

Package information: http://packages.debian.org/urlwatch

If you have sid repositories enabled, you can install urlwatch via:

    apt-get install urlwatch

Source tarball

You can download the source tarball of urlwatch here:
urlwatch-1.14.tar.gz
(2011-11-15)
Old releases

It's not recommended to run a version older than the current one.

Python Package Index

urlwatch is also indexed in the Python Package Index as "urlwatch".

Advanced features
3rd party patches / Contributions
License

urlwatch is released under the terms of the BSD license.

Code repository

The Git repository of urlwatch now has a more permanent home over at repo.or.cz/w/urlwatch.git. To check out the code using Git, use this command:

    git clone git://repo.or.cz/urlwatch.git

How do I..

..watch only an element on a website?

If you are lucky, the element has an "id" attribute (but other attributes work just fine as well) that you can use with the BeautifulSoup library to extract that part of the HTML document:

    from BeautifulSoup import BeautifulSoup
    # "data" holds the HTML of the watched page
    soup = BeautifulSoup(data)
    # keep only the element with the given id
    data = str(soup.find(id='tisiDocumentBody'))
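The snippet above works on a variable named data, which suggests it is meant to run inside the filtering hook mentioned at the top of this page. As a rough sketch of how such a hook might look, the example below uses only the Python 3 standard library instead of BeautifulSoup. The hook name and signature filter(url, data), the helper extract_by_id, the class ElementTextById and the example.org URL are all assumptions made for illustration, not taken from the urlwatch sources. It also assumes well-nested markup, so pages that rely on implied closing tags may need a real HTML library.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextById(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0       # greater than 0 while inside the wanted element
        self.done = False    # set once the wanted element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_by_id(html, element_id):
    """Return the whitespace-normalized text of the element with this id."""
    parser = ElementTextById(element_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def filter(url, data):
    # Hypothetical hook: name, signature and URL are assumptions of this sketch.
    if url == "http://example.org/news":
        return extract_by_id(data, "tisiDocumentBody")
    # All other pages pass through unchanged.
    return data
```

Returning only the text of one element means that changes elsewhere on the page (ads, footers, timestamps) no longer show up in the diff for that URL.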
..watch a remote Git repository for new tags?

urlwatch supports running commands and checking their output using the pipe symbol (|):

    |git ls-remote --tags http://github.com/gpodder/gpodder.git

As an alternative, Thomas Dziedzic has written a dedicated tool for this in Ruby. It is called tagurit and can be found at https://github.com/gostrc/tagurit.

Information about the User-Agent

Since version 1.3, urlwatch sends a better User-Agent string. More information about this User-Agent string can be found on this page.

Thomas Perl (m at thp io), jabber: thp@jabber.org |
|