
Method file_path

scrapy/pipelines/files.py:732–757
(
        self,
        request: Request,
        response: Response | None = None,
        info: MediaPipeline.SpiderInfo | None = None,
        *,
        item: Any = None,
    )

Source from the content-addressed store, hash-verified

    def file_path(
        self,
        request: Request,
        response: Response | None = None,
        info: MediaPipeline.SpiderInfo | None = None,
        *,
        item: Any = None,
    ) -> str:
        media_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()  # noqa: S324

        # clean it up and look at the path first
        parsed_url = urlparse_cached(request)
        media_ext = Path(parsed_url.path).suffix

        # if path has no extension look at the raw URL
        if media_ext not in mimetypes.types_map:
            media_ext = Path(request.url).suffix

        # Handles empty and wild extensions by trying to guess the
        # mime type then extension or default to empty string otherwise
        if media_ext not in mimetypes.types_map:
            media_ext = ""
            media_type = mimetypes.guess_type(request.url)[0]
            if media_type:
                media_ext = cast("str", mimetypes.guess_extension(media_type))
        return f"full/{media_guid}{media_ext}"
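
The naming logic above can be tried without Scrapy installed. The following is a minimal standard-library sketch, not Scrapy's API: `sketch_file_path` is a hypothetical name, `urllib.parse.urlparse` stands in for `urlparse_cached`, and `str.encode("utf-8")` stands in for `to_bytes` (assumed here to encode as UTF-8). It takes a plain URL string instead of a `Request`.

```python
import hashlib
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def sketch_file_path(url: str) -> str:
    """Stdlib-only approximation of the path logic shown above (hypothetical helper)."""
    # SHA-1 of the full URL gives a stable, fixed-length file name.
    # Assumption: UTF-8 encoding stands in for Scrapy's to_bytes().
    media_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()

    # Prefer the extension of the URL path, which excludes the query string.
    # Assumption: urlparse() stands in for urlparse_cached(request).
    media_ext = Path(urlparse(url).path).suffix

    # If that extension is not a known type, try the suffix of the raw URL.
    if media_ext not in mimetypes.types_map:
        media_ext = Path(url).suffix

    # Last resort: guess a MIME type from the URL, else use no extension.
    if media_ext not in mimetypes.types_map:
        media_ext = ""
        media_type = mimetypes.guess_type(url)[0]
        if media_type:
            media_ext = mimetypes.guess_extension(media_type) or ""
    return f"full/{media_guid}{media_ext}"


if __name__ == "__main__":
    # A query string does not hide the ".pdf" extension found in the path.
    print(sketch_file_path("https://example.com/docs/report.pdf?download=1"))
    # No recognizable extension: the result is "full/" plus the 40-character hash.
    print(sketch_file_path("https://example.com/download"))
```

Because the name is derived only from the URL, the same URL always maps to the same relative path, while two URLs that differ only in their query string map to different paths.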

Callers 3

media_to_download · Method · 0.95
media_downloaded · Method · 0.95
_file_downloaded · Method · 0.95

Calls 2

to_bytes · Function · 0.90
urlparse_cached · Function · 0.90

Tested by

no test coverage detected