
Class DebugUnderflowOverflow

src/transformers/debug_utils.py:27–292



class DebugUnderflowOverflow:
    """
    This debug class helps detect and understand where the model's weights and activations start getting very large
    or very small and, more importantly, where `nan` or `inf` elements appear.

    There are 2 working modes:

    1. Underflow/overflow detection (default)
    2. Specific batch absolute min/max tracing without detection

    Mode 1: Underflow/overflow detection

    To activate the underflow/overflow detection, initialize the object with the model:

    ```python
    debug_overflow = DebugUnderflowOverflow(model)
    ```

    Then run the training as normal. If `nan` or `inf` is detected in at least one of the weight, input or output
    elements, this module will raise an exception and print the last `max_frames_to_save` frames that led to this
    event, each frame reporting:

    1. the fully qualified module name plus the class name whose `forward` was run
    2. the absolute min and max values of all elements of each of the module's weights, inputs and outputs

    For example, here is the header and the last few frames of the detection report for `google/mt5-small` run in
    fp16 mixed precision:

    ```
    Detected inf/nan during batch_number=0
    Last 21 forward frames:
    abs min abs max metadata
    [...]
    encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
    2.17e-07 4.50e+00 weight
    1.79e-06 4.65e+00 input[0]
    2.68e-06 3.70e+01 output
    encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
    8.08e-07 2.66e+01 weight
    1.79e-06 4.65e+00 input[0]
    1.27e-04 2.37e+02 output
    encoder.block.2.layer.1.DenseReluDense.wo Linear
    1.01e-06 6.44e+00 weight
    0.00e+00 9.74e+03 input[0]
    3.18e-04 6.27e+04 output
    encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
    1.79e-06 4.65e+00 input[0]
    3.18e-04 6.27e+04 output
    encoder.block.2.layer.1.dropout Dropout
    3.18e-04 6.27e+04 input[0]
    0.00e+00 inf output
    ```

    Here you can see that `T5DenseGatedGeluDense.forward` produced output activations whose absolute max value was
    around 62.7K, which is very close to fp16's upper limit of 65504 (roughly 64K). In the next frame, `Dropout`
    zeroes some of the elements and rescales the remaining activations, which pushes the absolute max value above
    that limit, and we get an overflow (`inf`).

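The overflow in the last frame of the report can be checked with plain arithmetic. The sketch below is an illustration only and is not part of `debug_utils.py`. It assumes a dropout probability of 0.1 (an assumed value; the report does not state it) and the usual inverted-dropout scaling of kept elements by `1 / (1 - p)`. It uses the standard library's half-precision `struct` format to emulate fp16 rounding. The helper names `to_fp16` and `dropout_scale` are hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; map out-of-range values to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack values beyond the fp16 range;
        # fp16 arithmetic on an accelerator would produce inf instead.
        return math.copysign(math.inf, x)


def dropout_scale(x: float, p: float) -> float:
    """Value of a kept element after inverted dropout with drop probability p."""
    return x / (1.0 - p)


abs_max_before = 6.27e4  # abs max of the Dropout input in the report
p_assumed = 0.1          # assumption: not stated in the report

scaled = dropout_scale(abs_max_before, p_assumed)
print(f"before dropout: {to_fp16(abs_max_before):.2e}")
print(f"after scaling:  {scaled:.2e} (fp16 max is {FP16_MAX:.2e})")
print(f"stored as fp16: {to_fp16(scaled)}")
```

Under these assumptions the scaled value exceeds `FP16_MAX`, so it can no longer be represented as a finite fp16 number, which matches the `inf` reported for the `Dropout` output.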

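The detection mode described in the docstring can be pictured as a bounded buffer of per-module frames that is dumped once a non-finite value shows up. The following is a minimal standard-library sketch of that idea, not the code in `debug_utils.py`. The names `abs_min_max`, `has_inf_or_nan`, `FrameBuffer` and `record` are hypothetical, plain Python lists stand in for tensors, and a real implementation would gather the values from forward hooks on the model's modules.

```python
import math
from collections import deque


def abs_min_max(values):
    """Return (abs min, abs max) over a flat list of floats."""
    mags = [abs(v) for v in values]
    return min(mags), max(mags)


def has_inf_or_nan(values):
    """True if any element is inf or nan."""
    return any(math.isinf(v) or math.isnan(v) for v in values)


class FrameBuffer:
    """Keeps only the last `max_frames_to_save` frames (hypothetical helper)."""

    def __init__(self, max_frames_to_save=21):
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=max_frames_to_save)

    def record(self, name, tensors):
        """Store one frame; raise with the saved frames if inf/nan is found.

        `tensors` maps a label such as "weight" or "output" to a list of floats.
        """
        lines = [name]
        detected = False
        for label, values in tensors.items():
            lo, hi = abs_min_max(values)
            lines.append(f"{lo:8.2e} {hi:8.2e} {label}")
            detected = detected or has_inf_or_nan(values)
        self.frames.append("\n".join(lines))
        if detected:
            report = "\n".join(self.frames)
            raise ValueError(f"Detected inf/nan\n{report}")


buffer = FrameBuffer(max_frames_to_save=2)
buffer.record("block.0 Linear", {"input[0]": [0.5, -2.0], "output": [1.0, -3.0]})
buffer.record("block.1 Linear", {"input[0]": [1.0, -3.0], "output": [4.0e4, 6.0e4]})
try:
    buffer.record(
        "block.1.dropout Dropout",
        {"input[0]": [4.0e4, 6.0e4], "output": [0.0, math.inf]},
    )
except ValueError as err:
    print(err)
```

Because the buffer is bounded, only the frames closest to the failure are printed, which keeps the report short while still showing how the values grew in the modules just before the overflow.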
Callers 1

train (method) · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected