Histogram bin estimator based on minimizing the estimated integrated squared error (ISE). The number of bins is chosen by minimizing the estimated ISE against the unknown true distribution. The ISE is estimated using cross-validation and can be regarded as a generalization of Scott's rule.
_hist_bin_stone(x, range)
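As a rough illustration of the selection procedure described above, the sketch below re-implements the same idea using only public NumPy calls. For each candidate bin count it computes the score `(2 - (n + 1) * sum(p_k**2)) / h`, where `p_k` is the fraction of samples in bin `k` and `h` is the bin width, and keeps the bin count with the smallest score. The function name `stone_bin_width` and the `max_bins` parameter are hypothetical names chosen for this example; they are not part of NumPy. It also takes the data range from the sample itself rather than from a separate `range` argument, which is a simplifying assumption.

```python
import numpy as np


def stone_bin_width(x, max_bins=100):
    """Illustrative sketch: pick a bin width by minimizing a cross-validation score.

    `stone_bin_width` and `max_bins` are hypothetical names for this example.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Peak-to-peak span of the data (assumption: bins cover exactly min..max).
    span = float(x.max() - x.min()) if n else 0.0
    if n <= 1 or span == 0:
        # Degenerate input: no meaningful width can be estimated.
        return 0.0

    def jhat(nbins):
        # Score for a histogram with `nbins` equal-width bins.
        h = span / nbins
        counts, _ = np.histogram(x, bins=nbins)
        p = counts / n
        return (2 - (n + 1) * p.dot(p)) / h

    # Search every bin count from 1 up to an upper bound and keep the best one.
    upper = max(max_bins, int(np.sqrt(n)))
    best_nbins = min(range(1, upper + 1), key=jhat)
    return span / best_nbins


rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
width = stone_bin_width(sample)
print(f"estimated bin width: {width:.4f}")
```

The search is exhaustive over the candidate bin counts, so its cost grows with the upper bound; that keeps the sketch simple at the expense of speed. In NumPy itself this estimator is selected by passing `bins="stone"` to `np.histogram` or `np.histogram_bin_edges`.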
| 120 | |
| 121 | |
| 122 | def _hist_bin_stone(x, range): |
| 123 | """ |
| 124 | Histogram bin estimator based on minimizing the estimated integrated squared error (ISE). |
| 125 | |
| 126 | The number of bins is chosen by minimizing the estimated ISE against the unknown |
| 127 | true distribution. The ISE is estimated using cross-validation and can be regarded |
| 128 | as a generalization of Scott's rule. |
| 129 | https://en.wikipedia.org/wiki/Histogram#Scott.27s_normal_reference_rule |
| 130 | |
| 131 | This paper by Stone appears to be the origin of this rule. |
| 132 | https://digitalassets.lib.berkeley.edu/sdtr/ucb/text/34.pdf |
| 133 | |
| 134 | Parameters |
| 135 | ---------- |
| 136 | x : array_like |
| 137 | Input data that is to be histogrammed, trimmed to range. May not |
| 138 | be empty. |
| 139 | range : (float, float) |
| 140 | The lower and upper range of the bins. |
| 141 | |
| 142 | Returns |
| 143 | ------- |
| 144 | h : An estimate of the optimal bin width for the given data. |
| 145 | """ # noqa: E501 |
| 146 | |
| 147 | n = x.size |
| 148 | ptp_x = _ptp(x) |
| 149 | if n <= 1 or ptp_x == 0: |
| 150 | return 0 |
| 151 | |
| 152 | def jhat(nbins): |
| 153 | hh = ptp_x / nbins |
| 154 | p_k = np.histogram(x, bins=nbins, range=range)[0] / n |
| 155 | return (2 - (n + 1) * p_k.dot(p_k)) / hh |
| 156 | |
| 157 | nbins_upper_bound = max(100, int(np.sqrt(n))) |
| 158 | nbins = min(_range(1, nbins_upper_bound + 1), key=jhat) |
| 159 | if nbins == nbins_upper_bound: |
| 160 | warnings.warn("The number of bins estimated may be suboptimal.", |
| 161 | RuntimeWarning, stacklevel=3) |
| 162 | return ptp_x / nbins |
| 163 | |
| 164 | |
| 165 | def _hist_bin_doane(x, range): |