# The bootstrap — or why you should care about uncertainty

As the Bruce brothers explain in their excellent book Practical Statistics for Data Scientists, one easy and effective way to estimate the sampling distribution of a statistic is to draw additional samples (with replacement) from the sample itself and recalculate the statistic for each resample. This procedure is called the bootstrap.

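1. Draw a resample of size n, with replacement, from the original sample (n being the size of the original sample).
2. Calculate and store the mean (or any other statistic or metric) of the resampled values.
3. Repeat steps 1–2 R times. R is a large number, e.g. 10,000.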
```python
import numpy as np
import pandas as pd

def bootstrap(data, R=10000):
    means = []
    n = len(data)
    for i in range(R):
        # Resample n rows with replacement and record the mean weight
        sampled_data = data.sample(n=n, replace=True)
        mean = sampled_data.weight.mean()
        means.append(mean)
    return pd.DataFrame(means, columns=['means'])
```

```python
def confidence_intervals(data, confidence_level=0.95):
    # Percentile interval: cut an equal tail off each end of the bootstrap means
    low_end = (1 - confidence_level) / 2
    high_end = 1 - low_end
    bottom_percentile = np.round(data.means.quantile(low_end), 2)
    top_percentile = np.round(data.means.quantile(high_end), 2)
    print('The {}% confidence interval is [{}, {}]'.format(
        confidence_level * 100, bottom_percentile, top_percentile))
```

```python
# bootstrap_means is the DataFrame returned by bootstrap()
for ci in [0.6, 0.7, 0.8, 0.9, 0.95, 0.99]:
    confidence_intervals(bootstrap_means, confidence_level=ci)
```

```
The 60.0% confidence interval is [74.18, 74.74]
The 70.0% confidence interval is [74.11, 74.81]
The 80.0% confidence interval is [74.04, 74.89]
The 90.0% confidence interval is [73.92, 75.01]
The 95.0% confidence interval is [73.83, 75.11]
The 99.0% confidence interval is [73.61, 75.31]
```
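The procedure is not tied to the mean: any statistic can be recomputed on each resample. Here is a minimal NumPy-only sketch of that idea, applied to the median. The names `bootstrap_statistic` and `percentile_interval` are hypothetical helpers, not part of the code above, and `weights` is synthetic data made up for illustration rather than the dataset used in this post.

```python
import numpy as np

def bootstrap_statistic(values, statistic=np.mean, R=10_000, seed=0):
    """Return R bootstrap replicates of `statistic` computed on `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    replicates = np.empty(R)
    for i in range(R):
        # Resample n observations with replacement, then recompute the statistic
        resample = rng.choice(values, size=n, replace=True)
        replicates[i] = statistic(resample)
    return replicates

def percentile_interval(replicates, confidence_level=0.95):
    """Percentile interval: cut an equal tail off each end of the replicates."""
    low_end = (1 - confidence_level) / 2
    return np.quantile(replicates, [low_end, 1 - low_end])

# Synthetic stand-in data (an assumption, not the weights from the post)
weights = np.random.default_rng(42).normal(loc=75, scale=10, size=500)

medians = bootstrap_statistic(weights, statistic=np.median, R=2000)
low, high = percentile_interval(medians, confidence_level=0.95)
print(f"95% bootstrap interval for the median: [{low:.2f}, {high:.2f}]")
```

Passing the statistic as a function keeps the resampling loop unchanged, so swapping `np.median` for, say, `np.std` needs no other edits. The fixed `seed` only makes the sketch reproducible.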

Tales about data, statistics, machine learning, visualisation, and much more. By Adrià Luz (@adrialuz) and Sara Gaspar (@sargaspar).
