
Class CrawlerRunner

scrapy/crawler.py:397–491

This is a convenient helper class that keeps track of, manages and runs crawlers inside an already set up :mod:`~twisted.internet.reactor`. The CrawlerRunner object must be instantiated with a :class:`~scrapy.settings.Settings` object. This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process.
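The "keeps track of" part of that description can be illustrated without Scrapy or Twisted. The following is a minimal sketch that uses only the standard library's `asyncio` as an analogy; `TrackingRunner`, `start`, `join` and `fake_crawl` are hypothetical names invented for this illustration, and nothing here is Scrapy's actual implementation. It shows one common way to remember running jobs in a set and forget each one when it finishes.

```python
import asyncio


class TrackingRunner:
    """Hypothetical stand-in for illustration; not Scrapy's CrawlerRunner."""

    def __init__(self) -> None:
        # Jobs that have been started and have not finished yet.
        self.active: set = set()

    def start(self, coro) -> asyncio.Task:
        # Requires an event loop that is already running, loosely mirroring
        # the "already set up reactor" requirement in the description.
        task = asyncio.get_running_loop().create_task(coro)
        self.active.add(task)
        # Forget the job once it finishes, whether it succeeded or failed.
        task.add_done_callback(self.active.discard)
        return task

    async def join(self) -> None:
        # Wait until every tracked job has completed.
        while self.active:
            await asyncio.gather(*self.active, return_exceptions=True)


async def fake_crawl(name: str, results: list) -> None:
    # Placeholder for real crawling work (assumption: no network access).
    await asyncio.sleep(0)
    results.append(name)


async def main() -> list:
    runner = TrackingRunner()
    results: list = []
    runner.start(fake_crawl("spider_a", results))
    runner.start(fake_crawl("spider_b", results))
    assert len(runner.active) == 2  # both jobs are tracked while running
    await runner.join()
    assert not runner.active  # finished jobs were removed from the set
    return results


finished = asyncio.run(main())
```

Because the runner holds a reference to every unfinished job, a caller can wait for all of them, or stop them, from one place instead of keeping its own bookkeeping.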

Source from the content-addressed store, hash-verified

    class CrawlerRunner(CrawlerRunnerBase):
        """
        This is a convenient helper class that keeps track of, manages and runs
        crawlers inside an already setup :mod:`~twisted.internet.reactor`.

        The CrawlerRunner object must be instantiated with a
        :class:`~scrapy.settings.Settings` object.

        This class shouldn't be needed (since Scrapy is responsible of using it
        accordingly) unless writing scripts that manually handle the crawling
        process. See :ref:`run-from-script` for an example.

        This class provides Deferred-based APIs. Use :class:`AsyncCrawlerRunner`
        for modern coroutine APIs.
        """

        def __init__(self, settings: dict[str, Any] | Settings | None = None):
            super().__init__(settings)
            if not self.settings.getbool("TWISTED_REACTOR_ENABLED"):
                raise RuntimeError(
                    f"{type(self).__name__} doesn't support TWISTED_REACTOR_ENABLED=False."
                )
            self._active: set[Deferred[None]] = set()

        def crawl(
            self,
            crawler_or_spidercls: type[Spider] | str | Crawler,
            *args: Any,
            **kwargs: Any,
        ) -> Deferred[None]:
            """
            Run a crawler with the provided arguments.

            It will call the given Crawler's :meth:`~Crawler.crawl` method, while
            keeping track of it so it can be stopped later.

            If ``crawler_or_spidercls`` isn't a :class:`~scrapy.crawler.Crawler`
            instance, this method will try to create one using this parameter as
            the spider class given to it.

            Returns a deferred that is fired when the crawling is finished.

            :param crawler_or_spidercls: already created crawler, or a spider class
                or spider's name inside the project to create it
            :type crawler_or_spidercls: :class:`~scrapy.crawler.Crawler` instance,
                :class:`~scrapy.spiders.Spider` subclass or string

            :param args: arguments to initialize the spider

            :param kwargs: keyword arguments to initialize the spider
            """
            if isinstance(crawler_or_spidercls, Spider):
                raise ValueError(
                    "The crawler_or_spidercls argument cannot be a spider object, "
                    "it must be a spider class (or a Crawler object)"
                )
            crawler = self.create_crawler(crawler_or_spidercls)
            return self._crawl(crawler, *args, **kwargs)
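The argument check at the top of `crawl` and the forwarding of spider arguments can be tried in isolation. The sketch below uses stand-in classes so that it runs without Scrapy installed. `Spider` here is a hypothetical substitute for `scrapy.Spider`, `QuotesSpider` is an invented subclass, and the free function `crawl` only imitates the guard and the argument forwarding. It builds the spider directly, whereas the real method goes through a `Crawler` and returns a Deferred.

```python
from typing import Any


class Spider:
    """Hypothetical stand-in for scrapy.Spider."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args = args
        self.kwargs = kwargs


class QuotesSpider(Spider):
    """Hypothetical spider subclass used only in this sketch."""


def crawl(spidercls: Any, *args: Any, **kwargs: Any) -> Spider:
    # isinstance() is True for a spider *object* but False for the spider
    # class itself, so only already-instantiated spiders are rejected.
    if isinstance(spidercls, Spider):
        raise ValueError(
            "The crawler_or_spidercls argument cannot be a spider object, "
            "it must be a spider class (or a Crawler object)"
        )
    # Unpack with * and ** so the spider receives the caller's positional
    # and keyword arguments exactly as they were passed in.
    return spidercls(*args, **kwargs)


# Passing the class works, and the extra arguments reach the spider.
spider = crawl(QuotesSpider, "books", category="fiction")

# Passing an instance is refused.
try:
    crawl(QuotesSpider())
    rejected = False
except ValueError:
    rejected = True
```

The check exists because the runner is expected to create the spider itself, which an already-built instance would bypass.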

Callers 15

get_crawler · Function · 0.90

test_spider_manager_verify_interface · Method · 0.90

test_crawler_runner_accepts_dict · Method · 0.90

test_crawler_runner_accepts_None · Method · 0.90

_runner · Method · 0.90

test_crawler_runner_asyncio_enabled_true · Method · 0.90

setup_method · Method · 0.90

test_crawlerrunner_accepts_crawler · Method · 0.90

test_load_spider_module_from_addons · Method · 0.90

test_crawler_runner_loading · Method · 0.90

main · Function · 0.90

Calls

no outgoing calls

Tested by 10

get_crawler · Function · 0.72

test_spider_manager_verify_interface · Method · 0.72

test_crawler_runner_accepts_dict · Method · 0.72

test_crawler_runner_accepts_None · Method · 0.72

_runner · Method · 0.72

test_crawler_runner_asyncio_enabled_true · Method · 0.72

setup_method · Method · 0.72

test_crawlerrunner_accepts_crawler · Method · 0.72

test_load_spider_module_from_addons · Method · 0.72

test_crawler_runner_loading · Method · 0.72