hub / github.com/D4Vinci/Scrapling / can_fetch

Method can_fetch

scrapling/spiders/robotstxt.py:43–50

Check if a URL can be fetched according to the domain's robots.txt.

:param url: The full URL to check
:param sid: Session ID for fetching robots.txt if not yet cached

(self, url: str, sid: str) -> bool

Source from the content-addressed store, hash-verified

        return parser

    async def can_fetch(self, url: str, sid: str) -> bool:
        """Check if a URL can be fetched according to the domain's robots.txt.

        :param url: The full URL to check
        :param sid: Session ID for fetching robots.txt if not yet cached
        """
        parser = await self._get_parser(url, sid)
        return parser.can_fetch(url, "*")

    async def get_delay_directives(self, url: str, sid: str) -> tuple[Optional[float], Optional[tuple[int, int]]]:
        """Return both crawl-delay and request-rate in a single parser lookup.

Callers 15

test_allowed_url_returns_true · Method · 0.95

test_disallowed_url_returns_false · Method · 0.95

test_disallowed_subpath_returns_false · Method · 0.95

test_root_url_is_allowed · Method · 0.95

test_allow_directive_overrides_disallow · Method · 0.95

test_disallow_all_blocks_every_path · Method · 0.95

test_empty_robots_allows_everything · Method · 0.95

test_non_200_response_allows_everything · Method · 0.95

test_fetch_error_allows_everything · Method · 0.95

test_wildcard_path_pattern · Method · 0.95

test_returns_bool · Method · 0.95

test_second_call_same_domain_uses_cache · Method · 0.95

Calls 1

_get_parser · Method · 0.95

Tested by 15

test_allowed_url_returns_true · Method · 0.76

test_disallowed_url_returns_false · Method · 0.76

test_disallowed_subpath_returns_false · Method · 0.76

test_root_url_is_allowed · Method · 0.76

test_allow_directive_overrides_disallow · Method · 0.76

test_disallow_all_blocks_every_path · Method · 0.76

test_empty_robots_allows_everything · Method · 0.76

test_non_200_response_allows_everything · Method · 0.76

test_fetch_error_allows_everything · Method · 0.76

test_wildcard_path_pattern · Method · 0.76

test_returns_bool · Method · 0.76

test_second_call_same_domain_uses_cache · Method · 0.76
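
Several of the test names above describe robots.txt rule semantics: a plain allowed path, a disallowed prefix and its subpaths, the root URL, a blanket disallow, and an empty file. The sketch below reproduces those cases with the standard library's `urllib.robotparser`; it is not the project's test suite. The helper `allowed` and the sample rule strings are hypothetical. Wildcard path patterns are left out because the standard-library parser does plain prefix matching, and the allow-over-disallow case is omitted because that parser applies the first matching rule, which may differ from the parser Scrapling uses.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str) -> bool:
    """Parse robots.txt text and check one URL for the wildcard user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# Hypothetical rule sets used only for this illustration.
RULES = "User-agent: *\nDisallow: /private/\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
EMPTY = ""

results = {
    "allowed_url": allowed(RULES, "https://example.com/public"),
    "disallowed_url": allowed(RULES, "https://example.com/private/"),
    "disallowed_subpath": allowed(RULES, "https://example.com/private/a/b"),
    "root_url": allowed(RULES, "https://example.com/"),
    "disallow_all": allowed(BLOCK_ALL, "https://example.com/anything"),
    "empty_robots": allowed(EMPTY, "https://example.com/private/"),
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value}")
```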