This is ttv1 code (Timm tools, version 1).
Returns true if two lists are not statistically significantly different.
Based on An Introduction to the Bootstrap by Bradley Efron and Robert Tibshirani (Chapman and Hall, 1993), pages 220 to 223.
Works via bootstrap sampling, which Efron describes as a variant of the Fisher permutation test.
The importance of this method is that, unlike standard statistical hypothesis tests, it does not assume that the data come from some known distribution (e.g. the normal distribution).
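For comparison, the permutation test that the bootstrap is said to resemble can be sketched as follows. This is a minimal illustration and not part of ttv1: the name `permutation_asl` and its parameters `b` and `seed` are hypothetical. It pools both lists, repeatedly reshuffles the pooled values into two groups of the original sizes, and reports how often the shuffled gap between the means is at least as large as the observed gap. Like the bootstrap, it assumes nothing about the shape of the underlying distribution.

```python
import random

def permutation_asl(y, z, b=1000, seed=1):
    """Fraction of random relabelings whose mean gap is at least the observed gap."""
    rng = random.Random(seed)            # hypothetical: fixed seed for repeatable runs
    mean = lambda lst: sum(lst) / len(lst)
    observed = abs(mean(z) - mean(y))
    pooled = list(y) + list(z)
    hits = 0
    for _ in range(b):
        rng.shuffle(pooled)              # relabel: first len(y) items play the role of y
        y1, z1 = pooled[:len(y)], pooled[len(y):]
        if abs(mean(z1) - mean(y1)) >= observed:
            hits += 1
    return hits / b
```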
import random
def bootstrap(y0,z0, b = 256, conf= 95):
  tiny=1e-32 # added to some divisions to stop div zero errors
Return any number from 0 to n.
  def any(n): return random.uniform(0,n)
Return any list item (uses any).
  def one(lst): return lst[ int(any(len(lst))) ]
Return a sample of lst, drawn with replacement and the same size as lst (uses one).
  def sampleWithReplacement(lst):
    return [one(lst) for _ in lst]
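As a point of comparison, the same kind of resample can be drawn with the standard library's `random.choices`. This standalone sketch shows one equivalent approach, not the ttv1 code; the names `sample_with_replacement` and `rng` are hypothetical.

```python
import random

def sample_with_replacement(lst, rng=random):
    """Draw len(lst) items from lst, allowing repeats."""
    return rng.choices(lst, k=len(lst))

rng = random.Random(1)                 # seeded generator, for repeatable output
data = [10, 20, 30, 40, 50]
resample = sample_with_replacement(data, rng)
```

The resample has as many items as the original, and some values may appear more than once while others are missing.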
The test statistic: scores the difference between two lists as the gap between their means, scaled down by their spread and sample sizes.
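  def testStatistic(y,z):
    tmp1 = tmp2 = 0
    for y1 in y.all: tmp1 += (y1 - y.mu)**2
    for z1 in z.all: tmp2 += (z1 - z.mu)**2
    s1 = (tmp1 / (y.n - 1 + tiny))**0.5  # sample standard deviation of y
    s2 = (tmp2 / (z.n - 1 + tiny))**0.5  # sample standard deviation of z
    delta = abs(z.mu - y.mu)
    if s1+s2:
      # the standard error of the gap uses variances (s**2), as in Efron and Tibshirani's statistic
      delta = delta/((s1**2/(y.n + tiny) + s2**2/(z.n + tiny))**0.5)
    return delta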
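For reference, a studentized difference of means of the kind used in two-sample bootstrap tests can be written with the standard `statistics` module. This is a standalone sketch, not the ttv1 code: the name `studentized_gap` is hypothetical, and it assumes each list holds at least two items.

```python
from statistics import mean, variance

def studentized_gap(y, z):
    """|mean(z) - mean(y)| divided by sqrt(var(y)/len(y) + var(z)/len(z))."""
    gap = abs(mean(z) - mean(y))
    stderr = (variance(y) / len(y) + variance(z) / len(z)) ** 0.5
    # if neither list has any spread, fall back to the raw gap
    return gap / stderr if stderr else gap
```

Identical lists score 0, and the score grows as the means move apart relative to the spread of the data.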
A counter class to simplify reasoning about sets of numbers.
  class num():
    def __init__(i,some=()):  # immutable default; i plays the role of self
      i.sum = i.n = i.mu = 0 ; i.all=[]
      for one in some:
        i.put(one)
    def put(i,x):
      i.all.append(x)
      i.sum += x; i.n += 1
      i.mu = i.sum/(i.n + tiny)
    def __add__(i1,i2):
      return num(i1.all + i2.all)
Some setup.
  y, z = num(y0), num(z0)
  x = y + z                  # pooled numbers from both lists
  tobs = testStatistic(y,z)  # observed test statistic
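Efron recommends shifting both lists so they share the same (pooled) mean. The resampling that follows then happens in a world where the null hypothesis of equal means is true.
  yhat = [y1 - y.mu + x.mu for y1 in y.all]
  zhat = [z1 - z.mu + x.mu for z1 in z.all]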
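A small standalone illustration of that adjustment, using made-up numbers: after the shift, both lists are centred on the pooled mean, while each keeps its own spread. The helper name `shift_to` is hypothetical.

```python
from statistics import mean

def shift_to(lst, target):
    """Translate lst so that its mean equals target."""
    mu = mean(lst)
    return [v - mu + target for v in lst]

y = [1.0, 2.0, 3.0]
z = [11.0, 12.0, 13.0]
pooled_mu = mean(y + z)          # 7.0
yhat = shift_to(y, pooled_mu)    # [6.0, 7.0, 8.0]
zhat = shift_to(z, pooled_mu)    # [6.0, 7.0, 8.0]
```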
Compute the achieved significance level (ASL): the fraction of the b resamples whose test statistic exceeds the observed one.
  asl = tiny
  for _ in range(b):
    if testStatistic(num(sampleWithReplacement(yhat)),
                     num(sampleWithReplacement(zhat))) > tobs:
      asl += 1/b
The larger the asl value, the more likely it is that the lists are the same.
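The counting step can be seen in isolation with a hand-made list of bootstrap statistics. The numbers and the name `achieved_significance` are illustrative assumptions only.

```python
def achieved_significance(tobs, tstars):
    """Fraction of bootstrap statistics that exceed the observed statistic."""
    return sum(1 for t in tstars if t > tobs) / len(tstars)

tstars = [0.2, 0.5, 1.1, 1.9, 2.4, 0.7, 3.0, 0.1]
asl_weak = achieved_significance(0.3, tstars)    # 6 of 8 exceed 0.3, so 0.75
asl_strong = achieved_significance(2.5, tstars)  # 1 of 8 exceeds 2.5, so 0.125
```

A small observed statistic is beaten by most resamples (large ASL, little evidence of a difference), while a large observed statistic is rarely beaten (small ASL).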
  # conf=95 corresponds to a significance level of 1 - conf/100 = 0.05
  return asl > 1 - conf/100
Why is the last line above asl > 1 - conf/100 and not asl < 1 - conf/100?
This is because of the way Efron defines the ASL (the larger the ASL, the weaker the evidence that the lists are different). Chapter 16 of his text explains that in more detail.
(Aside: since I am not a very trusting soul, I have coded this up reversing > with <. When I did, bootstrap produced the reverse of the expected results.)
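The direction of the comparison can be checked on two easy cases with a compact, standalone sketch. This is not the ttv1 code: the name `looks_same` and the `seed` parameter are hypothetical, the statistic is simplified to the plain gap between means, and it assumes conf=95 is read as a significance level of 0.05.

```python
import random
from statistics import mean

def looks_same(y, z, b=256, conf=95, seed=1):
    """Hypothetical compact bootstrap check: True if y and z look alike."""
    rng = random.Random(seed)                   # fixed seed so reruns agree
    gap = lambda a, c: abs(mean(c) - mean(a))   # simplified, unstudentized statistic
    tobs = gap(y, z)
    mu, mu_y, mu_z = mean(list(y) + list(z)), mean(y), mean(z)
    yhat = [v - mu_y + mu for v in y]           # shift both lists onto the pooled mean
    zhat = [v - mu_z + mu for v in z]
    bigger = sum(
        gap(rng.choices(yhat, k=len(yhat)), rng.choices(zhat, k=len(zhat))) > tobs
        for _ in range(b))
    asl = bigger / b
    # a large ASL is weak evidence of a difference, hence ">";
    # assumption: conf=95 is read as a significance level of 1 - 95/100
    return asl > 1 - conf / 100

near = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
far = [v + 100 for v in near]
```

Here `looks_same(near, near)` is True and `looks_same(near, far)` is False. Flipping the final `>` to `<` swaps both answers.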
Copyright © 2016,2017 Tim Menzies tim@menzies.us, MIT license v2.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Share and enjoy.