Documentation
The example in the Python documentation for urllib.robotparser points to musi-cal.com, a site that is no longer available in the form the example assumes:

    >>> import urllib.robotparser
    >>> rp = urllib.robotparser.RobotFileParser()
    >>> rp.set_url("http://www.musi-cal.com/robots.txt")
    >>> rp.read()
    >>> rrate = rp.request_rate("*")
    >>> rrate.requests
    3
    >>> rrate.seconds
    20
    >>> rp.crawl_delay("*")
    6
    >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
    False
    >>> rp.can_fetch("*", "http://www.musi-cal.com/")
    True

Additionally, the current robots.txt file at http://www.musi-cal.com/robots.txt contains:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

Because of this, both can_fetch() calls now return True, which doesn't align with the expected output from the example.
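The mismatch can be reproduced without network access by feeding the robots.txt content quoted above straight into the parser. This is a minimal sketch: it assumes the quoted three lines are the complete file, and it uses `RobotFileParser.parse()` in place of the `set_url()`/`read()` calls from the documented example.

```python
import urllib.robotparser

# robots.txt content as quoted in this issue (assumed to be the whole file)
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = urllib.robotparser.RobotFileParser()
# parse() takes an iterable of lines, so no HTTP request is made
rp.parse(ROBOTS_TXT.splitlines())

# The two URLs checked in the documented example
search_ok = rp.can_fetch(
    "*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"
)
root_ok = rp.can_fetch("*", "http://www.musi-cal.com/")

print(search_ok, root_ok)  # True True
print(rp.request_rate("*"), rp.crawl_delay("*"))  # None None
```

With this file, `request_rate("*")` and `crawl_delay("*")` also return `None`, because it has no Request-rate or Crawl-delay directives. The `rrate.requests` and `rrate.seconds` lines of the documented example would therefore fail as well.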
Proposed fix:
Update the example in urllib.robotparser.rst to replace the outdated musi-cal.com URL with a valid URL (e.g. https://www.python.org).
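Before settling on a replacement, it may help to check what the documented calls would return for a candidate robots.txt. The sketch below is only an illustration: `summarize` is a hypothetical helper, and `SAMPLE` and the example.com URLs are made-up values chosen to reproduce the numbers in the current example. They are not the contents of python.org's actual robots.txt.

```python
import urllib.robotparser


def summarize(robots_lines, urls, agent="*"):
    """Hypothetical helper: report what the doc example's calls would return."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)  # offline: parse lines instead of fetching a URL
    rate = rp.request_rate(agent)  # a RequestRate named tuple, or None
    return {
        "request_rate": (rate.requests, rate.seconds) if rate else None,
        "crawl_delay": rp.crawl_delay(agent),
        "can_fetch": {url: rp.can_fetch(agent, url) for url in urls},
    }


# Made-up robots.txt for illustration only (not taken from any real site)
SAMPLE = """\
User-agent: *
Crawl-delay: 6
Request-rate: 3/20
Disallow: /cgi-bin/
""".splitlines()

SEARCH_URL = "http://example.com/cgi-bin/search?city=San+Francisco"
ROOT_URL = "http://example.com/"

summary = summarize(SAMPLE, [SEARCH_URL, ROOT_URL])
print(summary)
```

To vet a real candidate, the same helper could be run on the lines of that site's robots.txt. The expected output in the documentation would then need to match whatever it reports.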
I would be happy to work on this issue and put together a PR for the update.