hub / github.com/scrapy/scrapy / extract_links

Method extract_links

scrapy/linkextractors/lxmlhtml.py:261–284

Returns a list of :class:`~scrapy.link.Link` objects from the specified :class:`response <scrapy.http.Response>`. Only links that match the settings passed to the ``__init__`` method of the link extractor are returned. Duplicate links are omitted if the ``unique`` attribute is set to ``True``; otherwise they are returned.

extract_links(self, response: TextResponse) -> list[Link]

Source from the content-addressed store, hash-verified

259         return self.link_extractor._extract_links(*args, **kwargs)
260
261     def extract_links(self, response: TextResponse) -> list[Link]:
262         """Returns a list of :class:`~scrapy.link.Link` objects from the
263         specified :class:`response <scrapy.http.Response>`.
264
265         Only links that match the settings passed to the ``__init__`` method of
266         the link extractor are returned.
267
268         Duplicate links are omitted if the ``unique`` attribute is set to ``True``,
269         otherwise they are returned.
270         """
271         base_url = get_base_url(response)
272         if self.restrict_xpaths:
273             docs = [
274                 subdoc for x in self.restrict_xpaths for subdoc in response.xpath(x)
275             ]
276         else:
277             docs = [response.selector]
278         all_links = []
279         for doc in docs:
280             links = self._extract_links(doc, response.url, response.encoding, base_url)
281             all_links.extend(self._process_links(links))
282         if self.link_extractor.unique:
283             return unique_list(all_links, key=self.link_extractor.link_key)
284         return all_links
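The final step of the listing deduplicates the links collected from all regions by a key. The sketch below illustrates that idea in plain Python, assuming the deduplication is order-preserving and keeps the first occurrence of each key. `FakeLink` and `unique_by_key` are hypothetical stand-ins, not Scrapy's `Link` or `unique_list`.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class FakeLink:
    """Hypothetical stand-in for a link object; only the fields used here."""
    url: str
    text: str = ""


def unique_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> list[T]:
    """Keep the first item seen for each key, preserving input order."""
    seen: set[Hashable] = set()
    result: list[T] = []
    for item in items:
        k = key(item)
        if k in seen:
            continue  # a later duplicate of an already-seen key is skipped
        seen.add(k)
        result.append(item)
    return result


# Links gathered from several regions may repeat the same URL.
all_links = [
    FakeLink("https://example.com/a", "first"),
    FakeLink("https://example.com/b"),
    FakeLink("https://example.com/a", "second"),
]
deduped = unique_by_key(all_links, key=lambda link: link.url)
```

Keying on a derived value rather than on the object itself means two links with different anchor text but the same URL count as duplicates, which is why the listing passes a separate `link_key`.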

Calls 5

_extract_links · Method · 0.95
_process_links · Method · 0.95
get_base_url · Function · 0.90
xpath · Method · 0.45
extend · Method · 0.45