| 123 | @strict(accept_kwargs=True) |
| 124 | @dataclass(repr=False) |
| 125 | class PreTrainedConfig(PushToHubMixin, RotaryEmbeddingConfigMixin): |
| 126 | # no-format |
| 127 | r""" |
| 128 | Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as |
| 129 | methods for loading/downloading/saving configurations. |
| 130 | |
| 131 | <Tip> |
| 132 | |
| 133 | A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to |
| 134 | initialize a model does **not** load the model weights. It only affects the model's configuration. |
| 135 | |
| 136 | </Tip> |
| 137 | |
| 138 | Class attributes (overridden by derived classes): |
| 139 | |
| 140 | - **model_type** (`str`) -- An identifier for the model type, serialized into the JSON file, and used to recreate |
| 141 | the correct object in [`~transformers.AutoConfig`]. |
| 142 | - **has_no_defaults_at_init** (`bool`) -- Whether the config class requires input arguments at initialization. |
| 143 | Some configurations require inputs to be defined at init and have no default values. These are usually (but not |
| 144 | necessarily) composite configs such as [`~transformers.EncoderDecoderConfig`] or [`~RagConfig`], which have to be |
| 145 | initialized from two or more configs of type [`~transformers.PreTrainedConfig`]. |
| 146 | - **keys_to_ignore_at_inference** (`list[str]`) -- A list of keys to ignore by default when looking at dictionary |
| 147 | outputs of the model during inference. |
| 148 | - **attribute_map** (`dict[str, str]`) -- A dict that maps model-specific attribute names to the standardized |
| 149 | naming of attributes. |
| 150 | - **base_model_tp_plan** (`dict[str, Any]`) -- A dict that maps the fully qualified names (FQNs) of a base model's |
| 151 | sub-modules to the tensor parallel plan applied to each sub-module when `model.tensor_parallel` is called. |
| 152 | - **base_model_pp_plan** (`dict[str, tuple[list[str]]]`) -- A dict that maps child-modules of a base model to a |
| 153 | pipeline parallel plan that enables users to place the child-module on the appropriate device. |
| 154 | |
| 155 | Common attributes (present in all subclasses): |
| 156 | |
| 157 | - **vocab_size** (`int`) -- The number of tokens in the vocabulary, which is also the first dimension of the |
| 158 | embeddings matrix (this attribute may be missing for models that don't have a text modality like ViT). |
| 159 | - **hidden_size** (`int`) -- The hidden size of the model. |
| 160 | - **num_attention_heads** (`int`) -- The number of attention heads used in the multi-head attention layers of the |
| 161 | model. |
| 162 | - **num_hidden_layers** (`int`) -- The number of blocks in the model. |
| 163 | |
| 164 | <Tip warning={true}> |
| 165 | |
| 166 | Setting parameters for sequence generation in the model config is deprecated. For backward compatibility, loading |
| 167 | some of them will still be possible, but attempting to overwrite them will raise an exception -- you should set |
| 168 | them in a [`~transformers.GenerationConfig`]. Check the documentation of [`~transformers.GenerationConfig`] for more |
| 169 | information about the individual parameters. |
| 170 | |
| 171 | </Tip> |
| 172 | |
| 173 | Args: |
| 174 | name_or_path (`str`, *optional*, defaults to `""`): |
| 175 | Stores the string that was passed to [`PreTrainedModel.from_pretrained`] as `pretrained_model_name_or_path` |
| 176 | if the configuration was created with such a method. |
| 177 | output_hidden_states (`bool`, *optional*, defaults to `False`): |
| 178 | Whether or not the model should return all hidden-states. |
| 179 | output_attentions (`bool`, *optional*, defaults to `False`): |
| 180 | Whether or not the model should return all attentions. |
| 181 | return_dict (`bool`, *optional*, defaults to `True`): |
| 182 | Whether or not the model should return a [`~transformers.utils.ModelOutput`] instead of a plain tuple. |
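
The `model_type` and `attribute_map` class attributes described in the docstring can be illustrated with a minimal, self-contained sketch. It deliberately does not import `transformers`: `TinyConfig`, `ToyGPTConfig`, `CONFIG_REGISTRY` and `config_from_json` are hypothetical stand-ins invented for this example, and the real `PreTrainedConfig` and `AutoConfig` may behave differently in detail. The sketch assumes only what the docstring states: `model_type` is serialized into the JSON file and used to recreate the right class, and `attribute_map` maps model-specific names to standardized ones.

```python
import json


class TinyConfig:
    """Hypothetical stand-in for a config base class (not the transformers code)."""

    model_type = ""     # identifier written into the serialized JSON
    attribute_map = {}  # model-specific attribute name -> standardized name

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Writes through an alias land on the standardized attribute.
        key = type(self).attribute_map.get(key, key)
        super().__setattr__(key, value)

    def __getattr__(self, key):
        # Only reached when normal lookup fails, so aliases are resolved here.
        mapped = type(self).attribute_map.get(key)
        if mapped is None:
            raise AttributeError(key)
        return getattr(self, mapped)

    def to_json_string(self):
        # Only configuration values are stored; no model weights are involved.
        return json.dumps({"model_type": self.model_type, **vars(self)}, sort_keys=True)


class ToyGPTConfig(TinyConfig):
    model_type = "toy-gpt"
    attribute_map = {"n_embd": "hidden_size", "n_layer": "num_hidden_layers"}


# Assumption: a plain dict plays the role of the model_type -> class lookup.
CONFIG_REGISTRY = {ToyGPTConfig.model_type: ToyGPTConfig}


def config_from_json(text):
    """Recreate the right config class from the serialized model_type."""
    data = json.loads(text)
    config_cls = CONFIG_REGISTRY[data.pop("model_type")]
    return config_cls(**data)


config = ToyGPTConfig(n_embd=64, n_layer=2, vocab_size=100)
restored = config_from_json(config.to_json_string())
```

In this sketch, `config.n_embd` and `config.hidden_size` read the same stored value, and `restored` is again a `ToyGPTConfig` because the serialized `model_type` selects the class.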