hub / github.com/numpy/numpy / _hist_bin_stone

Function _hist_bin_stone

numpy/lib/_histograms_impl.py:122–162

Histogram bin estimator based on minimizing the estimated integrated squared error (ISE). The number of bins is chosen by minimizing the estimated ISE against the unknown true distribution. The ISE is estimated using cross-validation and can be regarded as a generalization of Scott's rule.

(x, range)
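Since `_hist_bin_stone` is a private helper, it is normally reached through the public histogram functions rather than called directly. The following is a minimal sketch, assuming the estimator is selected with the string `"stone"` as the `bins` argument of `np.histogram_bin_edges`; the sample data and the variable names are illustrative only.

```python
import numpy as np

# Illustrative data: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Ask NumPy to choose the bin edges with the "stone" estimator.
edges = np.histogram_bin_edges(x, bins="stone")

# Reuse the edges to compute the histogram counts.
counts, _ = np.histogram(x, bins=edges)

print(len(edges) - 1, "bins of width", edges[1] - edges[0])
```

With no explicit `range`, the edges span the minimum to the maximum of the data, so every sample falls in some bin.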

Source from the content-addressed store, hash-verified

def _hist_bin_stone(x, range):
    """
    Histogram bin estimator based on minimizing the estimated integrated squared error (ISE).

    The number of bins is chosen by minimizing the estimated ISE against the unknown
    true distribution. The ISE is estimated using cross-validation and can be regarded
    as a generalization of Scott's rule.
    https://en.wikipedia.org/wiki/Histogram#Scott.27s_normal_reference_rule

    This paper by Stone appears to be the origination of this rule.
    https://digitalassets.lib.berkeley.edu/sdtr/ucb/text/34.pdf

    Parameters
    ----------
    x : array_like
        Input data that is to be histogrammed, trimmed to range. May not
        be empty.
    range : (float, float)
        The lower and upper range of the bins.

    Returns
    -------
    h : An estimate of the optimal bin width for the given data.
    """  # noqa: E501

    n = x.size
    ptp_x = _ptp(x)
    if n <= 1 or ptp_x == 0:
        return 0

    def jhat(nbins):
        hh = ptp_x / nbins
        p_k = np.histogram(x, bins=nbins, range=range)[0] / n
        return (2 - (n + 1) * p_k.dot(p_k)) / hh

    nbins_upper_bound = max(100, int(np.sqrt(n)))
    nbins = min(_range(1, nbins_upper_bound + 1), key=jhat)
    if nbins == nbins_upper_bound:
        warnings.warn("The number of bins estimated may be suboptimal.",
                      RuntimeWarning, stacklevel=3)
    return ptp_x / nbins
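The search performed by the listing can be tried outside NumPy's internals. What follows is a rough standalone sketch, not NumPy's own code: the name `stone_bin_width` and the `max_bins` parameter are hypothetical, the private helpers `_ptp` and `_range` are replaced by plain NumPy calls and the built-in `range`, and the histogram range is assumed to be the data's own minimum and maximum. It omits the warning that the listing emits when the upper bound is selected.

```python
import numpy as np


def stone_bin_width(x, max_bins=None):
    """Illustrative re-implementation of the cost search (hypothetical helper)."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    # Peak-to-peak span of the data; zero when there is nothing to measure.
    span = (x.max() - x.min()) if n else 0.0
    if n <= 1 or span == 0:
        return 0.0

    if max_bins is None:
        # Same upper bound as in the listing: at least 100 candidates.
        max_bins = max(100, int(np.sqrt(n)))

    def cost(nbins):
        # Cross-validation score for a histogram with `nbins` equal-width bins.
        width = span / nbins
        p = np.histogram(x, bins=nbins)[0] / n  # empirical bin probabilities
        return (2 - (n + 1) * p.dot(p)) / width

    # Try every candidate bin count and keep the one with the lowest score.
    best = min(range(1, max_bins + 1), key=cost)
    return span / best


rng = np.random.default_rng(0)
sample = rng.normal(size=500)
width = stone_bin_width(sample)
print("estimated bin width:", width)
```

The exhaustive scan costs one histogram per candidate bin count, which is why the candidate range is capped.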

Callers

nothing calls this directly

Calls 3

_ptp · Function · 0.70
max · Function · 0.50
min · Function · 0.50

Tested by

no test coverage detected
