
Method extract_links

scrapy/linkextractors/lxmlhtml.py, lines 261–284 (github.com/scrapy/scrapy)

Returns a list of :class:`~scrapy.link.Link` objects from the specified :class:`response <scrapy.http.Response>`. Only links that match the settings passed to the link extractor's ``__init__`` method are returned. Duplicate links are omitted if the ``unique`` attribute is set to ``True``; otherwise they are returned.
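The ``unique`` behaviour can be illustrated without Scrapy. The sketch below assumes that ``unique_list`` keeps the first occurrence of each key and preserves input order, and that links are keyed by their URL (the real key is ``self.link_extractor.link_key``, which may canonicalize URLs first). ``SimpleLink`` and ``unique_by_key`` are hypothetical stand-ins, not Scrapy APIs.

```python
from collections.abc import Callable, Hashable, Iterable
from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SimpleLink:
    """Hypothetical stand-in for scrapy.link.Link (url and text only)."""
    url: str
    text: str = ""


def unique_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> list[T]:
    """Hypothetical helper: keep the first item per key, preserving order."""
    seen: set = set()
    result: list[T] = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


links = [
    SimpleLink("https://example.com/a", "first"),
    SimpleLink("https://example.com/b", "b"),
    SimpleLink("https://example.com/a", "second"),
]

# unique=True: links sharing a URL collapse, and the first occurrence wins
deduped = unique_by_key(links, key=lambda link: link.url)
print([link.text for link in deduped])  # ['first', 'b']

# unique=False: the collected list is returned as-is, duplicates included
print(len(links))  # 3
```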

extract_links(self, response: TextResponse) -> list[Link]


    def extract_links(self, response: TextResponse) -> list[Link]:
        """Returns a list of :class:`~scrapy.link.Link` objects from the
        specified :class:`response <scrapy.http.Response>`.

        Only links that match the settings passed to the ``__init__`` method of
        the link extractor are returned.

        Duplicate links are omitted if the ``unique`` attribute is set to ``True``,
        otherwise they are returned.
        """
        base_url = get_base_url(response)
        if self.restrict_xpaths:
            docs = [
                subdoc for x in self.restrict_xpaths for subdoc in response.xpath(x)
            ]
        else:
            docs = [response.selector]
        all_links = []
        for doc in docs:
            links = self._extract_links(doc, response.url, response.encoding, base_url)
            all_links.extend(self._process_links(links))
        if self.link_extractor.unique:
            return unique_list(all_links, key=self.link_extractor.link_key)
        return all_links
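The control flow of this method (scan only the sub-documents matched by ``restrict_xpaths`` when it is set, otherwise the whole page; collect links from each region; resolve them against the base URL) can be sketched with the standard library alone. This is a simplified, hypothetical illustration: ``extract_hrefs`` is not a Scrapy function, ElementTree path expressions stand in for XPath and only accept well-formed markup, and the filtering done by ``_process_links`` and the final ``unique`` step are omitted.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urljoin


def extract_hrefs(
    markup: str, base_url: str, restrict_paths: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical sketch of extract_links' control flow (not Scrapy code).

    ElementTree path expressions stand in for restrict_xpaths, so the
    markup must be well-formed XML; link filtering and dedup are omitted.
    """
    root = ET.fromstring(markup)
    if restrict_paths:
        # Like restrict_xpaths: only the matched sub-documents are scanned
        docs = [sub for path in restrict_paths for sub in root.findall(path)]
    else:
        # Like docs = [response.selector]: the whole page is one region
        docs = [root]
    all_links: List[str] = []
    for doc in docs:
        for anchor in doc.iter("a"):
            href = anchor.get("href")
            if href:
                # Resolve relative hrefs against the base URL
                all_links.append(urljoin(base_url, href))
    return all_links


page = (
    "<html><body>"
    '<div id="nav"><a href="/a">A</a><a href="b.html">B</a></div>'
    '<div id="footer"><a href="/about">About</a></div>'
    "</body></html>"
)
base = "https://example.com/dir/"

print(extract_hrefs(page, base))
# ['https://example.com/a', 'https://example.com/dir/b.html', 'https://example.com/about']
print(extract_hrefs(page, base, [".//div[@id='nav']"]))
# ['https://example.com/a', 'https://example.com/dir/b.html']
# Overlapping regions (nav sits inside body) yield the nav links twice
print(len(extract_hrefs(page, base, [".//div[@id='nav']", ".//body"])))
# 5
```

Because each matched region is scanned separately in this sketch, overlapping regions produce repeated links, which is the kind of duplication a key-based ``unique`` pass removes.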

Callers (15, 11 listed)

_requests_to_follow · Method · 0.45

parse · Method · 0.45

test_urls_type · Method · 0.45

test_extract_all_links · Method · 0.45

test_extract_filter_allow · Method · 0.45

test_extract_filter_allow_with_duplicates · Method · 0.45

test_extract_filter_allow_with_duplicates_canonicalize · Method · 0.45

test_extract_filter_allow_no_duplicates_canonicalize · Method · 0.45

test_extract_filter_allow_and_deny · Method · 0.45

test_extract_filter_allowed_domains · Method · 0.45

test_extraction_using_single_values · Method · 0.45

Calls (5)

_extract_links · Method · 0.95

_process_links · Method · 0.95

get_base_url · Function · 0.90

xpath · Method · 0.45

extend · Method · 0.45
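``get_base_url`` supplies the URL that relative links are resolved against. A minimal sketch of that idea follows, assuming the usual HTML rule that a ``<base href>`` element in the document overrides the page URL. ``base_url_for`` and ``_BaseHrefFinder`` are hypothetical names; Scrapy's own helper lives in ``scrapy.utils.response`` and differs in its details.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class _BaseHrefFinder(HTMLParser):
    """Hypothetical helper: remembers the href of the first <base> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag names, so <BASE> also matches
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def base_url_for(html_text: str, page_url: str) -> str:
    """Return <base href> resolved against page_url, else page_url itself."""
    finder = _BaseHrefFinder()
    finder.feed(html_text)
    finder.close()
    if finder.base_href:
        # A relative <base href> is itself resolved against the page URL
        return urljoin(page_url, finder.base_href)
    return page_url


page_url = "https://example.com/docs/page.html"
doc_with_base = '<html><head><base href="/static/"></head><body><a href="x.html">x</a></body></html>'

base = base_url_for(doc_with_base, page_url)
print(base)  # https://example.com/static/
print(urljoin(base, "x.html"))  # https://example.com/static/x.html
print(base_url_for("<p>no base element</p>", page_url))  # https://example.com/docs/page.html
```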

Tested by (15, 12 listed)

test_urls_type · Method · 0.36

test_extract_all_links · Method · 0.36

test_extract_filter_allow · Method · 0.36

test_extract_filter_allow_with_duplicates · Method · 0.36

test_extract_filter_allow_with_duplicates_canonicalize · Method · 0.36

test_extract_filter_allow_no_duplicates_canonicalize · Method · 0.36

test_extract_filter_allow_and_deny · Method · 0.36

test_extract_filter_allowed_domains · Method · 0.36

test_extraction_using_single_values · Method · 0.36

test_nofollow · Method · 0.36

test_restrict_xpaths · Method · 0.36

test_restrict_xpaths_encoding · Method · 0.36