hub / github.com/tornadoweb/tornado / get_links_from_url

Function get_links_from_url

demos/webspider/webspider.py:16–27

Download the page at `url` and parse it for links. Returned links have had the fragment after `#` removed, and have been made absolute so, e.g. the URL 'gen.html#tornado.gen.coroutine' becomes 'http://www.tornadoweb.org/en/stable/gen.html'.
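The normalization described above can be reproduced with the standard library alone. This is a minimal sketch, not the demo's own code: `normalize_link` is a hypothetical helper name, and it assumes fragment removal is done with `urllib.parse.urldefrag` before the link is resolved with `urljoin`.

```python
from urllib.parse import urldefrag, urljoin


def normalize_link(base_url, link):
    """Drop the fragment from `link`, then resolve it against `base_url`.

    `normalize_link` is a hypothetical name used only for this illustration.
    """
    without_fragment, _fragment = urldefrag(link)
    return urljoin(base_url, without_fragment)


base = "http://www.tornadoweb.org/en/stable/"
print(normalize_link(base, "gen.html#tornado.gen.coroutine"))
# prints: http://www.tornadoweb.org/en/stable/gen.html
```

A link that consists only of a fragment (for example `#top`) collapses to the base URL itself under this scheme, so a crawler that deduplicates by URL would treat it as the page it already fetched.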

(url)

Source from the content-addressed store, hash-verified

async def get_links_from_url(url):
    """Download the page at `url` and parse it for links.

    Returned links have had the fragment after `#` removed, and have been made
    absolute so, e.g. the URL 'gen.html#tornado.gen.coroutine' becomes
    'http://www.tornadoweb.org/en/stable/gen.html'.
    """
    response = await httpclient.AsyncHTTPClient().fetch(url)
    print("fetched %s" % url)

    html = response.body.decode(errors="ignore")
    return [urljoin(url, remove_fragment(new_url)) for new_url in get_links(html)]
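The source decodes the response body with `decode(errors="ignore")`. With no encoding argument, Python decodes as UTF-8, and `errors="ignore"` silently drops bytes that are not valid UTF-8 instead of raising `UnicodeDecodeError`. A small sketch of that behavior, using a made-up byte string (`raw` is an illustrative value, not data from the demo):

```python
# A Latin-1 encoded body: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw = b"<a href='caf\xe9.html'>menu</a>"

# Strict decoding (the default) would raise UnicodeDecodeError on 0xE9.
try:
    raw.decode()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" drops the offending byte and keeps everything else.
html = raw.decode(errors="ignore")
print(strict_failed)  # True
print(html)           # <a href='caf.html'>menu</a>
```

The trade-off is that pages served in a non-UTF-8 encoding lose characters, which can alter the links extracted from them; for a demo crawler that only needs to keep going, this may be an acceptable simplification.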

Callers 1

fetch_url · Function · 0.85

Calls 3

remove_fragment · Function · 0.85
get_links · Function · 0.85
fetch · Method · 0.45
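The body of `get_links` is not shown on this page. As a rough illustration of what such a helper could look like, the sketch below collects `href` values from `<a>` tags with the standard library's `html.parser`. The names `LinkCollector` and `extract_links` are hypothetical, and the actual `get_links` in the demo may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def extract_links(html):
    """Return the raw href values found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.urls


sample = '<a href="gen.html#tornado.gen.coroutine">gen</a><a name="x">no href</a>'
print(extract_links(sample))
```

The returned links are still relative and may carry fragments, which is why `get_links_from_url` passes each one through `remove_fragment` and `urljoin` before returning.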

Tested by

no test coverage detected