hub / github.com/huggingface/transformers / DebugUnderflowOverflow

Class DebugUnderflowOverflow

src/transformers/debug_utils.py:27–292

This debug class helps detect and understand where the model starts getting very large or very small, and more importantly `nan` or `inf` weight and activation elements. There are 2 working modes: 1. Underflow/overflow detection (default) 2. Specific batch absolute min/max tracing without detection


class DebugUnderflowOverflow:
    """
    This debug class helps detect and understand where the model starts getting very large or very small, and more
    importantly `nan` or `inf` weight and activation elements.

    There are 2 working modes:

    1. Underflow/overflow detection (default)
    2. Specific batch absolute min/max tracing without detection

    Mode 1: Underflow/overflow detection

    To activate the underflow/overflow detection, initialize the object with the model:

    ```python
    debug_overflow = DebugUnderflowOverflow(model)
    ```

    Then run the training as normal. If `nan` or `inf` is detected in at least one of the weight, input or output
    elements, this module will throw an exception and print the `max_frames_to_save` frames that led to this event,
    each frame reporting:

    1. the fully qualified module name plus the class name whose `forward` was run
    2. the absolute min and max values of all elements of each module's weights, inputs and outputs

    For example, here is the header and the last few frames in the detection report for `google/mt5-small` run in fp16
    mixed precision:

    ```
    Detected inf/nan during batch_number=0
    Last 21 forward frames:
    abs min  abs max  metadata
    [...]
                      encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
    2.17e-07 4.50e+00 weight
    1.79e-06 4.65e+00 input[0]
    2.68e-06 3.70e+01 output
                      encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
    8.08e-07 2.66e+01 weight
    1.79e-06 4.65e+00 input[0]
    1.27e-04 2.37e+02 output
                      encoder.block.2.layer.1.DenseReluDense.wo Linear
    1.01e-06 6.44e+00 weight
    0.00e+00 9.74e+03 input[0]
    3.18e-04 6.27e+04 output
                      encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
    1.79e-06 4.65e+00 input[0]
    3.18e-04 6.27e+04 output
                      encoder.block.2.layer.1.dropout Dropout
    3.18e-04 6.27e+04 input[0]
    0.00e+00      inf output
    ```

    You can see here that `T5DenseGatedGeluDense.forward` resulted in output activations whose absolute max value was
    around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout`, which zeroes
    some of the elements and scales up the rest, pushing the absolute max value to more than 64K, and we get an
    overflow.
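The overflow in the final `Dropout` frame can be reproduced with plain arithmetic. The sketch below assumes inverted dropout as done by PyTorch's `nn.Dropout`, where surviving elements are scaled by `1 / (1 - p)`, and assumes `p = 0.1`, since the report does not state the drop probability. `to_fp16` is a hypothetical helper built on the standard library's half-precision `struct` format; it is not part of `transformers`.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value representable in IEEE 754 half precision


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; return +/-inf if it is out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values that do not fit in fp16
        return math.copysign(math.inf, x)


p = 0.1  # assumed dropout probability (not given in the report)
abs_max_in = to_fp16(6.27e4)  # abs max of the Dropout input in the report
scaled = to_fp16(abs_max_in / (1.0 - p))  # survivors are scaled by 1 / (1 - p)

print(abs_max_in <= FP16_MAX)  # the input still fits in fp16
print(math.isinf(scaled))  # the scaled value no longer does
```

With `p = 0.1` the scaled value is roughly 69.7K, above fp16's largest finite value of 65504, so the output's absolute max becomes `inf` even though the input was still in range.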

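The frame format described in the docstring (a module name and class name followed by the absolute min and max per weight, input and output) can be illustrated without any framework. The following is a minimal sketch, not the library's implementation: `FrameRecorder`, `abs_min_max` and `has_inf_or_nan` are hypothetical names, values are plain lists of floats rather than tensors, and raising `ValueError` is an assumption made for the example.

```python
import math
from collections import deque


def abs_min_max(values):
    """Return (abs min, abs max) over an iterable of floats."""
    magnitudes = [abs(v) for v in values]
    return min(magnitudes), max(magnitudes)


def has_inf_or_nan(values):
    """True if any element is inf or nan."""
    return any(math.isinf(v) or math.isnan(v) for v in values)


class FrameRecorder:
    """Hypothetical helper that keeps only the last `max_frames_to_save` frames."""

    def __init__(self, max_frames_to_save=21):
        # A bounded deque drops the oldest frame automatically
        self.frames = deque(maxlen=max_frames_to_save)

    def record(self, module_name, class_name, tensors):
        """Store one frame; `tensors` maps a label such as 'output' to a list of floats."""
        lines = [f"{'':18}{module_name} {class_name}"]
        detected = False
        for label, values in tensors.items():
            lo, hi = abs_min_max(values)
            lines.append(f"{lo:8.2e} {hi:8.2e} {label}")
            detected = detected or has_inf_or_nan(values)
        self.frames.append("\n".join(lines))
        if detected:
            report = "\n".join(self.frames)
            raise ValueError(
                f"Detected inf/nan\nLast {len(self.frames)} forward frames:\n{report}"
            )


recorder = FrameRecorder(max_frames_to_save=2)
recorder.record(
    "encoder.block.2.layer.1.DenseReluDense.wo",
    "Linear",
    {"input[0]": [0.0, 9.74e3], "output": [3.18e-4, 6.27e4]},
)
try:
    recorder.record(
        "encoder.block.2.layer.1.dropout",
        "Dropout",
        {"input[0]": [3.18e-4, 6.27e4], "output": [0.0, math.inf]},
    )
except ValueError as err:
    print(err)
```

Keeping the frames in a bounded buffer means the report shows only the most recent context leading up to the first `inf` or `nan`, which is usually where the values start to grow.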
Callers 1

train · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected