Log metrics in a specially formatted way. Under distributed environment this is done only for a process with rank 0. Args: split (`str`): Mode/split name: one of `train`, `eval`, `test` metrics (`dict[str, float]`): The metrics returned from tra
(self, split, metrics)
| 828 | |
| 829 | # Trainer helper method: imported into the Trainer class and used as a method (takes `self` as first argument). |
| 830 | def log_metrics(self, split, metrics): |
| 831 | """ |
| 832 | Log metrics in a specially formatted way. |
| 833 | |
| 834 | Under distributed environment this is done only for a process with rank 0. |
| 835 | |
| 836 | Args: |
| 837 | split (`str`): |
| 838 | Mode/split name: one of `train`, `eval`, `test` |
| 839 | metrics (`dict[str, float]`): |
| 840 | The metrics returned from train/evaluate/predict. |
| 841 | |
| 842 | Notes on memory reports: |
| 843 | |
| 844 | To get a memory usage report, you need to install `psutil`. You can do that with `pip install psutil`. |
| 845 | |
| 846 | When this method is then run, the report will include entries such as: |
| 847 | |
| 848 | ``` |
| 849 | init_mem_cpu_alloc_delta = 1301MB |
| 850 | init_mem_cpu_peaked_delta = 154MB |
| 851 | init_mem_gpu_alloc_delta = 230MB |
| 852 | init_mem_gpu_peaked_delta = 0MB |
| 853 | train_mem_cpu_alloc_delta = 1345MB |
| 854 | train_mem_cpu_peaked_delta = 0MB |
| 855 | train_mem_gpu_alloc_delta = 693MB |
| 856 | train_mem_gpu_peaked_delta = 7MB |
| 857 | ``` |
| 858 | |
| 859 | **Understanding the reports:** |
| 860 | |
| 861 | - The first segment, e.g., `train_`, tells you which stage the metrics are for. Reports starting with `init_` |
| 862 | are added to the first stage that gets run, so if only evaluation is run, the memory usage for |
| 863 | `__init__` is reported along with the `eval_` metrics. |
| 864 | - The third segment, either `cpu` or `gpu`, tells you whether the metric is for general RAM or for gpu0 |
| 865 | memory. |
| 866 | - `*_alloc_delta` is the difference in the used/allocated memory counter between the end and the start of the |
| 867 | stage. It can be negative if a function released more memory than it allocated. |
| 868 | - `*_peaked_delta` is any extra memory that was consumed and then freed, relative to the current allocated |
| 869 | memory counter. It is never negative. For any stage, adding `alloc_delta` and |
| 870 | `peaked_delta` gives the memory that was needed to complete that stage. |
| 871 | |
| 872 | The reporting happens only for the process of rank 0 and gpu 0 (if there is a gpu). Typically this is enough since the |
| 873 | main process does the bulk of the work, but it may not be when model parallelism is used, since other GPUs may then |
| 874 | use a different amount of gpu memory. It is also not representative under DataParallel, where gpu0 may require much more |
| 875 | memory than the rest since it stores the gradient and optimizer states for all participating GPUs. Perhaps in the |
| 876 | future these reports will evolve to measure those too. |
| 877 | |
| 878 | The CPU RAM metric measures RSS (Resident Set Size), which includes both the memory that is unique to the process and the |
| 879 | memory shared with other processes. Note that it does not include swapped-out memory, so the |
| 880 | reports could be imprecise. |
| 881 | |
| 882 | The CPU peak memory is measured using a sampling thread. Due to Python's GIL it may miss some of the peak memory if |
| 883 | that thread didn't get a chance to run when the highest memory was used. Therefore this report can be lower than the |
| 884 | real peak. Using `tracemalloc` would have reported the exact peak memory, but it doesn't report memory allocations |
| 885 | outside of Python, so if some C++ CUDA extension allocated its own memory it would not be reported. For that reason it |
| 886 | was dropped in favor of the memory sampling approach, which reads the current process memory usage. |
| 887 |
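The key layout described in the docstring (`<stage>_mem_<device>_alloc_delta` and `<stage>_mem_<device>_peaked_delta`) can be read back with a few lines of Python. The sketch below is illustrative only: `stage_memory_needed` is a hypothetical helper, not part of the Trainer, and the values are copied from the sample report above and treated as plain numbers in MB, which is an assumption about how the metrics are stored.

```python
def stage_memory_needed(metrics, stage, device):
    """Hypothetical helper: alloc_delta + peaked_delta for one stage and device."""
    alloc = metrics[f"{stage}_mem_{device}_alloc_delta"]
    peaked = metrics[f"{stage}_mem_{device}_peaked_delta"]
    return alloc + peaked


# Values copied from the sample report in the docstring (assumed to be in MB).
sample_report_mb = {
    "init_mem_cpu_alloc_delta": 1301,
    "init_mem_cpu_peaked_delta": 154,
    "init_mem_gpu_alloc_delta": 230,
    "init_mem_gpu_peaked_delta": 0,
    "train_mem_cpu_alloc_delta": 1345,
    "train_mem_cpu_peaked_delta": 0,
    "train_mem_gpu_alloc_delta": 693,
    "train_mem_gpu_peaked_delta": 7,
}

for stage in ("init", "train"):
    for device in ("cpu", "gpu"):
        total = stage_memory_needed(sample_report_mb, stage, device)
        print(f"{stage} {device}: {total}MB needed")
```

For the sample report, the train stage on gpu0 needed 693MB + 7MB = 700MB, and `__init__` needed 1301MB + 154MB = 1455MB of CPU RAM.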
nothing calls this directly
no test coverage detected
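The sampling approach for CPU peak memory described in the docstring can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Trainer's actual implementation: `PeakSampler` and `rss_bytes` are hypothetical names, the clamping of `peaked_delta` at zero follows the docstring's "never negative" statement, and the usage part feeds in a simulated reader so the sketch runs without `psutil` installed.

```python
import threading
import time


def rss_bytes():
    """Current process RSS in bytes. Requires `pip install psutil`."""
    import psutil  # imported lazily so the sketch loads without psutil

    return psutil.Process().memory_info().rss


class PeakSampler:
    """Hypothetical helper: polls a memory reader in a thread and tracks the maximum."""

    def __init__(self, read_usage=rss_bytes, interval=0.001):
        self.read_usage = read_usage
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # If this thread is not scheduled while usage is at its highest,
        # the peak is missed, which is the limitation the docstring notes.
        while not self._stop.is_set():
            self.peak = max(self.peak, self.read_usage())
            time.sleep(self.interval)

    def __enter__(self):
        self.start = self.read_usage()
        self.peak = self.start
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self.end = self.read_usage()
        self.peak = max(self.peak, self.end)
        self.alloc_delta = self.end - self.start          # may be negative
        self.peaked_delta = max(0, self.peak - self.end)  # never negative
        return False


# Usage with a simulated reader (assumption: stands in for real RSS readings).
usage = {"value": 100}
with PeakSampler(read_usage=lambda: usage["value"]) as sampler:
    usage["value"] = 500   # temporary spike
    time.sleep(0.05)       # give the sampling thread time to observe it
    usage["value"] = 150   # most of the spike is released again

print("alloc_delta:", sampler.alloc_delta)
print("peaked_delta:", sampler.peaked_delta)
```

Passing no `read_usage` argument makes the sampler read the real process RSS through `psutil`, which also covers memory allocated outside of Python.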