hub / github.com/scrapy/scrapy / CrawlerRunner

Class CrawlerRunner

scrapy/crawler.py:397–491

A convenience helper class that keeps track of, manages, and runs crawlers inside an already set-up :mod:`~twisted.internet.reactor`. The CrawlerRunner object must be instantiated with a :class:`~scrapy.settings.Settings` object. This class is normally needed only when writing scripts that manually handle the crawling process, since Scrapy otherwise uses it on your behalf.
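
A minimal sketch of driving a single crawl from a script, assuming Scrapy and Twisted are installed. The helper name `crawl_once` and the spider class passed to it are hypothetical. The reactor path shown is one common choice and should match the project's `TWISTED_REACTOR` setting. Imports are deferred into the function so that the reactor is installed before `twisted.internet.reactor` is first imported.

```python
def crawl_once(spider_cls, settings=None, **spider_kwargs):
    """Run one spider with CrawlerRunner and stop the reactor when it finishes.

    `spider_cls` must be a spider class (not an instance); `settings` may be
    a dict, a Settings object, or None.
    """
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.reactor import install_reactor

    # Assumption: the project uses the asyncio-based reactor.
    install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
    configure_logging()

    runner = CrawlerRunner(settings)
    # crawl() returns a Deferred that fires when the crawl is finished.
    finished = runner.crawl(spider_cls, **spider_kwargs)

    from twisted.internet import reactor

    # Stop the reactor whether the crawl succeeded or failed.
    finished.addBoth(lambda _: reactor.stop())
    reactor.run()  # blocks until reactor.stop() is called
```

A script would call `crawl_once(MySpider)`, where `MySpider` stands for any spider class defined in the project. Unlike a full crawler process, CrawlerRunner does not start or stop the reactor itself, so the caller remains responsible for both.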

Source from the content-addressed store, hash-verified

class CrawlerRunner(CrawlerRunnerBase):
    """
    This is a convenient helper class that keeps track of, manages and runs
    crawlers inside an already setup :mod:`~twisted.internet.reactor`.

    The CrawlerRunner object must be instantiated with a
    :class:`~scrapy.settings.Settings` object.

    This class shouldn't be needed (since Scrapy is responsible of using it
    accordingly) unless writing scripts that manually handle the crawling
    process. See :ref:`run-from-script` for an example.

    This class provides Deferred-based APIs. Use :class:`AsyncCrawlerRunner`
    for modern coroutine APIs.
    """

    def __init__(self, settings: dict[str, Any] | Settings | None = None):
        super().__init__(settings)
        if not self.settings.getbool("TWISTED_REACTOR_ENABLED"):
            raise RuntimeError(
                f"{type(self).__name__} doesn't support TWISTED_REACTOR_ENABLED=False."
            )
        self._active: set[Deferred[None]] = set()

    def crawl(
        self,
        crawler_or_spidercls: type[Spider] | str | Crawler,
        *args: Any,
        **kwargs: Any,
    ) -> Deferred[None]:
        """
        Run a crawler with the provided arguments.

        It will call the given Crawler's :meth:`~Crawler.crawl` method, while
        keeping track of it so it can be stopped later.

        If ``crawler_or_spidercls`` isn't a :class:`~scrapy.crawler.Crawler`
        instance, this method will try to create one using this parameter as
        the spider class given to it.

        Returns a deferred that is fired when the crawling is finished.

        :param crawler_or_spidercls: already created crawler, or a spider class
            or spider's name inside the project to create it
        :type crawler_or_spidercls: :class:`~scrapy.crawler.Crawler` instance,
            :class:`~scrapy.spiders.Spider` subclass or string

        :param args: arguments to initialize the spider

        :param kwargs: keyword arguments to initialize the spider
        """
        if isinstance(crawler_or_spidercls, Spider):
            raise ValueError(
                "The crawler_or_spidercls argument cannot be a spider object, "
                "it must be a spider class (or a Crawler object)"
            )
        crawler = self.create_crawler(crawler_or_spidercls)
        return self._crawl(crawler, *args, **kwargs)
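
The argument handling described in the `crawl` docstring can be modelled without Scrapy. The sketch below is a simplified stand-in, not Scrapy's implementation: `Spider` and `Crawler` are dummy classes, and `resolve_crawler` and its `registry` dict are hypothetical names. The instance check mirrors the excerpt. The pass-through for crawlers and the name lookup are assumptions drawn from the docstring, since `create_crawler` is not shown here.

```python
class Spider:
    """Dummy stand-in for a spider base class (illustration only)."""
    name = "demo"


class Crawler:
    """Dummy stand-in for a crawler; it only records the spider class."""
    def __init__(self, spidercls):
        self.spidercls = spidercls


def resolve_crawler(crawler_or_spidercls, registry=None):
    """Turn a crawler, spider class, or spider name into a Crawler."""
    # Same guard as in the excerpt: a spider *instance* is rejected.
    if isinstance(crawler_or_spidercls, Spider):
        raise ValueError(
            "The crawler_or_spidercls argument cannot be a spider object, "
            "it must be a spider class (or a Crawler object)"
        )
    # Assumption: an already created crawler is used as-is.
    if isinstance(crawler_or_spidercls, Crawler):
        return crawler_or_spidercls
    # Assumption: a name is looked up in some registry (here a plain dict).
    if isinstance(crawler_or_spidercls, str):
        crawler_or_spidercls = (registry or {})[crawler_or_spidercls]
    return Crawler(crawler_or_spidercls)


crawler = resolve_crawler(Spider)                      # spider class: wrapped
same = resolve_crawler(crawler)                        # crawler: passed through
by_name = resolve_crawler("demo", {"demo": Spider})    # name: looked up first
```

Passing `Spider()` instead of `Spider` raises `ValueError`. This is the mistake the guard exists to catch, because spider constructor arguments are supplied through `*args` and `**kwargs` rather than by building the spider up front.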

Callers 15

get_crawler · Function · 0.90
_runner · Method · 0.90
setup_method · Method · 0.90
main · Function · 0.90
main · Function · 0.90

Calls

no outgoing calls