hub / github.com/huggingface/transformers / PreTrainedConfig

Class PreTrainedConfig

src/transformers/configuration_utils.py:125–1294

Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as methods for loading/downloading/saving configurations. A configuration file can be loaded and saved to disk.
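The save/load behaviour summarized here can be pictured with a small standalone sketch that uses only the standard library. It is not the `transformers` implementation: `ToyConfig`, `ToyBertConfig`, `CONFIG_REGISTRY`, `save`, and `load_config` are hypothetical names invented for illustration. The sketch assumes a config is written as a JSON file that carries a `model_type` key, and that this key is what selects the class to rebuild on load. No model weights appear anywhere in the round trip.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    """Hypothetical base config; not the transformers class."""

    hidden_size: int = 64
    num_hidden_layers: int = 2
    model_type = "toy"  # un-annotated, so a class attribute rather than a dataclass field

    def save(self, directory):
        # Serialize the fields plus model_type; only configuration values are written.
        payload = {"model_type": self.model_type, **asdict(self)}
        path = os.path.join(directory, "config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
        return path


@dataclass
class ToyBertConfig(ToyConfig):
    """Hypothetical derived config that overrides model_type."""

    num_attention_heads: int = 4
    model_type = "toy-bert"


# Hypothetical registry: maps the serialized identifier back to a class.
CONFIG_REGISTRY = {cls.model_type: cls for cls in (ToyConfig, ToyBertConfig)}


def load_config(directory):
    with open(os.path.join(directory, "config.json"), encoding="utf-8") as f:
        payload = json.load(f)
    # The stored model_type decides which class is instantiated.
    cls = CONFIG_REGISTRY[payload.pop("model_type")]
    return cls(**payload)


with tempfile.TemporaryDirectory() as tmp:
    ToyBertConfig(hidden_size=128).save(tmp)
    restored = load_config(tmp)

print(type(restored).__name__, restored.hidden_size)
```

The restored object is a `ToyBertConfig` with the saved field values, which is all a configuration round trip provides; a model built from it would still need its weights loaded separately.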

Source from the content-addressed store, hash-verified

@strict(accept_kwargs=True)
@dataclass(repr=False)
class PreTrainedConfig(PushToHubMixin, RotaryEmbeddingConfigMixin):
    # no-format
    r"""
    Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as
    methods for loading/downloading/saving configurations.

    <Tip>

    A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to
    initialize a model does **not** load the model weights. It only affects the model's configuration.

    </Tip>

    Class attributes (overridden by derived classes):

    - **model_type** (`str`) -- An identifier for the model type, serialized into the JSON file, and used to recreate
      the correct object in [`~transformers.AutoConfig`].
    - **has_no_defaults_at_init** (`bool`) -- Whether the config class can be initialized without providing input arguments.
      Some configurations require inputs to be defined at init and have no default values. These are usually, but not
      necessarily, composite configs such as [`~transformers.EncoderDecoderConfig`] or [`~RagConfig`]. They have to be
      initialized from two or more configs of type [`~transformers.PreTrainedConfig`].
    - **keys_to_ignore_at_inference** (`list[str]`) -- A list of keys to ignore by default when looking at dictionary
      outputs of the model during inference.
    - **attribute_map** (`dict[str, str]`) -- A dict that maps model specific attribute names to the standardized
      naming of attributes.
    - **base_model_tp_plan** (`dict[str, Any]`) -- A dict that maps sub-modules FQNs of a base model to a tensor
      parallel plan applied to the sub-module when `model.tensor_parallel` is called.
    - **base_model_pp_plan** (`dict[str, tuple[list[str]]]`) -- A dict that maps child-modules of a base model to a
      pipeline parallel plan that enables users to place the child-module on the appropriate device.

    Common attributes (present in all subclasses):

    - **vocab_size** (`int`) -- The number of tokens in the vocabulary, which is also the first dimension of the
      embeddings matrix (this attribute may be missing for models that don't have a text modality like ViT).
    - **hidden_size** (`int`) -- The hidden size of the model.
    - **num_attention_heads** (`int`) -- The number of attention heads used in the multi-head attention layers of the
      model.
    - **num_hidden_layers** (`int`) -- The number of blocks in the model.

    <Tip warning={true}>

    Setting parameters for sequence generation in the model config is deprecated. For backward compatibility, loading
    some of them will still be possible, but attempting to overwrite them will throw an exception -- you should set
    them in a [`~transformers.GenerationConfig`]. Check the documentation of [`~transformers.GenerationConfig`] for more
    information about the individual parameters.

    </Tip>

    Args:
        name_or_path (`str`, *optional*, defaults to `""`):
            Store the string that was passed to [`PreTrainedModel.from_pretrained`] as `pretrained_model_name_or_path`
            if the configuration was created with such a method.
        output_hidden_states (`bool`, *optional*, defaults to `False`):
            Whether or not the model should return all hidden-states.
        output_attentions (`bool`, *optional*, defaults to `False`):
            Whether or not the model should return all attentions.
        return_dict (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return a [`~transformers.utils.ModelOutput`] instead of a plain tuple.
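
The `attribute_map` class attribute in the docstring above pairs model-specific attribute names with standardized ones. The following standalone sketch shows one way such aliasing can work. It does not import `transformers` and should not be read as the library's implementation: `TinyConfig`, `n_embd`, and `n_layer` are hypothetical names, and the sketch assumes the map is keyed by the standardized name and that attribute access is simply redirected through it.

```python
class TinyConfig:
    """Hypothetical stand-in for a config class; not part of transformers."""

    # Assumed direction for this sketch: standardized name -> model-specific name.
    attribute_map = {"hidden_size": "n_embd", "num_hidden_layers": "n_layer"}

    def __init__(self, n_embd=64, n_layer=2):
        self.n_embd = n_embd
        self.n_layer = n_layer

    def __setattr__(self, name, value):
        # Redirect a write to a standardized name onto the model-specific attribute.
        name = type(self).attribute_map.get(name, name)
        super().__setattr__(name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, which is the case for aliased names.
        mapped = type(self).attribute_map.get(name)
        if mapped is None:
            raise AttributeError(name)
        return getattr(self, mapped)


config = TinyConfig(n_embd=128)
config.num_hidden_layers = 4  # stored as n_layer
print(config.hidden_size, config.n_layer)
```

Under these assumptions, code that only knows the standardized names (`hidden_size`, `num_hidden_layers`) can read and write a config whose fields are stored under different model-specific names.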

Calls

no outgoing calls