The goal of this course is to expose students to some of the controversies in statistics and data analysis. At the undergraduate level we talk a good deal about statistical methods as though they were settled; in this course we will look at places where they are contested.
The course will likely meet every other week for 120 minutes to discuss the week’s reading(s). The next week’s readings will be assigned at the end of each meeting. Before each class meeting I will distribute a set of questions that you must answer and bring, completed, to class. Most readings will be on reserve in the PQRC. For the books listed below we will, in most cases, read excerpts from each book.
There is no required text for this course. We will read excerpts from several books, and you will be required to read one of them in its entirety and write a review of it. We will draw our readings from the following list:
For all of these I will provide excerpts and/or make the book available in the Peterson Quantitative Resource Center. Your final project will be to select one of these books and to write a review of it.
Attendance is mandatory for this course. Much of the course will be discussion of the readings and so your attendance and participation in those discussions will be crucial.
We will set a time to meet that works with everyone’s schedule. If you are not able to make a meeting, please let me know well in advance.
If you do not attend a class, you will not receive credit for the reaction paper due that meeting.
Before each meeting I will post a set of articles and a set of questions about them as an R Markdown file in our course folder on the SLU RStudio server.
For your final project/paper, you will select one of the books from which we have read an excerpt and read the whole book. You will write a 3 to 5 page review of the book discussing its main themes and how those themes relate to the other themes we discussed in the course. Be sure to address the strong and weak parts of the book. You may select a book not on the list above with the permission of the instructor.
If you have a disability and need accommodations please be sure to contact the Student Accessibility Services Office (315.229.5537) right away so they can help you get the accommodations you require. If you will need to use any accommodations in this class, please talk with me early so you can have the best possible experience this semester. Although not required, I would like to know of any accommodations that are needed at least 10 days before a quiz or test, so please see me soon. For more specific information visit the Student Accessibility Services website: https://www.stlawu.edu/student-accessibility-services
Please note that students that are eligible for and intend to use their extended exam time accommodations will need to arrange to take the exam in the Student Accessibility Services Office. (Arrangements should be initiated at least one week in advance and the student should contact me, via email, one to two days beforehand to ensure arrangements have been made on my end too.)
A percentage grade will be determined based on your responses to the reading questions and your book review.
| Source | Percent |
|---|---|
| Readings | 80 |
| Book Review | 20 |
Academic dishonesty will not be tolerated. Any specific policies for this course are supplementary to the Honor Code. According to the St. Lawrence University Academic Honor Policy,
It is assumed that all work is done by the student unless the instructor/mentor/employer gives specific permission for collaboration.
Cheating on examinations and tests consists of knowingly giving or using or attempting to use unauthorized assistance during examinations or tests.
Dishonesty in work outside of examinations and tests consists of handing in or presenting as original work which is not original, where originality is required.
The following constitute examples of academic dishonesty:
Plagiarism: Presenting as one’s own work the work of another person–words, ideas, data, evidence, thoughts, information, organizing principles, or style of presentation–without proper attribution. Plagiarism includes paraphrasing or summarizing without acknowledgment by quotation marks, footnotes, endnotes, or other indices of reference (cf. Joseph F. Trimmer, A Guide to MLA Documentation).
Handing in or presenting false reports on any experiment.
Handing in or presenting a book report on a book one has not read.
Falsification of records.
Supplying information to another student knowing that such information will be used in a dishonest way.
Submission of or presentation of work (papers, journal abstracts, oral presentations, etc.) which has received credit in a previous course to satisfy the requirement(s) of a second course without the knowledge and permission of the instructor/supervisor/mentor of the second course.
Knowingly making false statements in support of requests for special consideration or special timing in the fulfillment of course requirements.
Claims of ignorance and academic or personal pressure are unacceptable as excuses for academic dishonesty. Students must learn what constitutes one’s own work and how the work of others must be acknowledged.
For more information, refer to www.stlawu.edu/acadaffairs/academic_honor_policy.pdf.
To avoid academic dishonesty, it is important that you follow all directions and collaboration rules and ask for clarification if you have any questions about what is acceptable for a particular assignment or exam. If I suspect academic dishonesty, a score of zero will be given for the entire assignment in which the academic dishonesty occurred for all individuals involved and Academic Honor Council will be notified. If a pattern of academic dishonesty is found to have occurred, a grade of 0.0 for the entire course can be given.
You are encouraged to discuss course material, including homework assignments, with your classmates. All work you turn in, however, must be your own. This includes both writing and code. Copying from other students, from books, from websites, or from solutions for previous versions of the class (1) does nothing to help you learn how to program, (2) is easy for instructors to detect, and (3) has serious negative consequences for you. If, after reading the policy, you are unclear on what is acceptable, please ask me.
The first document I want you to read is a summary of Karl Popper’s research, Science as Falsification (1963).
https://staff.washington.edu/lynnhank/Popper-1.pdf
Questions:
This is a 1990 article by Jacob Cohen, who was a psychology professor at New York University.
https://pdfs.semanticscholar.org/fa77/0a7fb7c45a59abbc4c2bc7d174fa51e5d946.pdf
Questions:
R. A. Fisher, whom I mentioned in class last time, gave us the null hypothesis and the possibility of rejecting it. Jerzy Neyman and Egon Pearson added some additional mathematics to this structure via the alternative hypothesis.
I am happy to say that the long neglect of attention to effect size seems to be coming to a close. The clumsy and fundamentally invalid box-score method of literature review based on p-values is being replaced …
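Cohen’s point about effect size is easy to demonstrate in a couple of lines of R: with a large enough sample, an effect far too small to matter still produces a tiny p-value. A minimal sketch (the effect size and sample size here are invented for illustration):
set.seed(1990)
n <- 1e6                             # a huge sample
x <- rnorm(n, mean = 0.005, sd = 1)  # true effect is negligibly small
t.test(x)$p.value                    # typically far below 0.05
mean(x) / sd(x)                      # Cohen's d, roughly 0.005: trivial
A p-value answers “could this be zero?”, not “is this big enough to care about?”, which is why the literature-review methods Cohen mentions were invalid.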
The Religion of Statistics As Practiced In Medical Journals by David S. Salsburg
https://www.jstor.org/stable/2683942
Note you need to be on the SLU campus (or connected via VPN) for this link to work
Questions:
The Earth Is Round (p < .05) by Jacob Cohen https://pdfs.semanticscholar.org/fa63/cbf9b514a9bc4991a0ef48542b689e2fa08d.pdf
Editorials by David Trafimow (2014) and by David Trafimow and Michael Marks (2015): https://doi.org/10.1080/01973533.2014.865505
https://doi.org/10.1080/01973533.2015.1012991
Here is a summary of Trafimow’s 2003 article that is cited in the above editorials:
Because the probability of obtaining an experimental finding given that the null hypothesis is true [p(F/H₀)] is not the same as the probability that the null hypothesis is true given a finding [p(H₀/F)], calculating the former probability does not justify conclusions about the latter one. As the standard null-hypothesis significance-testing procedure does just that, it is logically invalid (J. Cohen, 1994). Theoretically, Bayes’s theorem yields [p(H₀/F)], but in practice, researchers rarely know the correct values for 2 of the variables in the theorem. Nevertheless, by considering a wide range of possible values for the unknown variables, it is possible to calculate a range of theoretical values for [p(H₀/F)] and to draw conclusions about both hypothesis testing and theory evaluation.
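To make the distinction concrete, here is a small R sketch of Bayes’s theorem. The prior probability of H₀ and the power of the test are invented values; the point is only that p(F/H₀) and p(H₀/F) can differ substantially:
p_H0 <- 0.5                # invented prior probability that H0 is true
alpha <- 0.05              # p(F/H0): chance of a "finding" when H0 holds
power <- 0.5               # invented chance of a "finding" when H0 is false
p_F <- alpha * p_H0 + power * (1 - p_H0)  # total probability of a finding
alpha * p_H0 / p_F         # p(H0/F) by Bayes's theorem
## [1] 0.09090909
Here p(F/H₀) is 0.05 but p(H₀/F) is about 0.09, and the gap grows as the prior or the power changes.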
Questions:
Why is \(P(D \mid H_0) \neq P(H_0 \mid D)\)? What does that mean and why is it relevant here?
On page 997 of Cohen (1994), he says:
What’s wrong with NHST (Null Hypothesis Significance Testing)? Well, among many other things, it does not tell us what we want to know …
What is it that Cohen thinks that Psychology, and science more broadly, want to know?
Science Isn’t Broken By Christie Aschwanden (2015)
https://fivethirtyeight.com/features/science-isnt-broken/#part1
Using the ‘Hack Your Way to Scientific Glory’ app, find a combination of factors and data that yields ‘publishable’ results. Report the combination that you used. (A simulation sketch of the same phenomenon appears after these questions.)
List at least three ways, described in this article, in which peer-reviewed science is imperfect.
In her article, Christie Aschwanden says the state of our science is strong. Do you agree or disagree?
Was the journal Basic and Applied Social Psychology justified in rejecting hypothesis testing and p-values? Why or why not?
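As promised above, here is a minimal R simulation of the app’s lesson: if you try enough unrelated predictors (or model specifications), some will come out “significant” by chance alone. Everything below is pure noise:
set.seed(538)
y <- rnorm(100)                                           # outcome: pure noise
pvals <- replicate(20, cor.test(rnorm(100), y)$p.value)   # 20 noise predictors
sum(pvals < 0.05)   # with 20 tries, expect about one false positive
min(pvals)          # the "best" specification often looks publishable
Report only the smallest p-value and you have p-hacked, without ever fabricating a number.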
Our next class meeting is February 24 at 5:30 pm. Please knit and print your answers to the following before class. Although I’ve listed the readings in order, it probably makes sense to read them all once before going back and answering all of the questions.
This week we will consider two topics related to the p-values work that you did last time. I was planning to start with some issues around p-values and statistical significance and move on to something called the “reproducibility crisis”, but I found some other issues around p-values that we should look at first.
This is now a classic article from 2005 with an ominous title:
Why Most Published Research Findings Are False by John P. A. Ioannidis https://journals.plos.org/plosmedicine/article/file?id=10.1371/journal.pmed.0020124&type=printable
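The arithmetic at the heart of Ioannidis’s argument is his positive predictive value formula, PPV = (1 − β)R / (R − βR + α), where R is the prior odds that a tested relationship is true, α is the significance level, and β is the Type II error rate. A quick R version (the values of R and β below are illustrative, not from any particular field):
ppv <- function(R, alpha = 0.05, beta = 0.2) {
  (1 - beta) * R / (R - beta * R + alpha)
}
ppv(R = 1/10)    # 1 true relationship per 10 tested
## [1] 0.6153846
ppv(R = 1/100)   # a long-shot field: most "findings" are false
## [1] 0.137931
Even with decent power, a field that tests mostly false hypotheses will publish mostly false positives.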
Questions:
From the same author as above, here is a proposal to fix p-values, or at least the classic “reject the null hypothesis if p < 0.05” rule.
The Proposal to Lower P Value Thresholds to .005 by John P. A. Ioannidis http://myslu.stlawu.edu/~msch/MakeSignLessThan0005.pdf
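Using the ppv() sketch above (same illustrative R and β), we can see what the proposed threshold buys:
ppv(R = 1/10, alpha = 0.05)    # the usual threshold
## [1] 0.6153846
ppv(R = 1/10, alpha = 0.005)   # the proposed threshold
## [1] 0.9411765
Lowering α raises the credibility of published findings, at the cost of requiring larger studies to keep power up.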
Questions:
This is a long article by two economists who argue that we really should ignore statistical significance.
The Cult of Statistical Significance By Stephen T. Ziliak and Deirdre N. McCloskey https://www.deirdremccloskey.com/docs/jsm.pdf
Questions:
In 2016, the American Statistical Association (ASA), the primary professional society for statisticians, produced the following document on p-values.
Statement by the American Statistical Association on p-values https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.XGR6d2lOmyo
Questions:
The following are a series of articles on the “reproducibility crisis”:
What does research reproducibility mean? http://stm.sciencemag.org/content/8/341/341ps12/tab-pdf
An example of reproducibility problems in psychology https://www.sciencealert.com/two-more-classic-psychology-studies-just-failed-the-reproducibility-test
Is science really facing a reproducibility crisis, and do we need it to? by Daniele Fanelli https://www.pnas.org/content/115/11/2628
The reproducibility crisis in science: A statistical counterattack by Roger Peng https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1740-9713.2015.00827.x
Questions:
Explain the difference between inferential reproducibility, results reproducibility and methods reproducibility.
Based upon the above, do you think there is a reproducibility crisis in science? Support your answer with evidence from the articles above.
Appendix (for extra fun): Statistical tests, \(P\) values, confidence intervals, and power: a guide to misinterpretations https://link.springer.com/content/pdf/10.1007%2Fs10654-016-0149-3.pdf
Our next class meeting is tentatively March 10 at 5:30 pm. Please knit and print your answers to the following before class. Although I’ve listed the readings in order, it probably makes sense to read them all once before going back and answering all of the questions.
Andrew Gelman is the best statistician I’ve ever met. He writes beautiful stuff like this http://www.stat.columbia.edu/~gelman/research/published/stacking.pdf which you don’t have to read.
But he also writes the following, which you do have to read:
Gelman and Loken https://www.americanscientist.org/article/the-statistical-crisis-in-science
More Gelman and Loken http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Gelman is sooo good, perhaps we need a whole day of his work http://www.stat.columbia.edu/~gelman/research/published/jasa_signif_2.pdf
More Gelman https://statmodeling.stat.columbia.edu/2019/01/28/bullshit-asymmetry-principle/ and be sure to read the comments on this.
Still more Gelman https://statmodeling.stat.columbia.edu/2018/11/17/using-numbers-replace-judgment/ which needs the following for context http://www.stat.columbia.edu/~gelman/research/published/Gelman_maintext_proof_toAU.pdf
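A recurring theme in these readings is that, when power is low, a statistically significant estimate will exaggerate the true effect (what Gelman and Carlin call a Type M, or magnitude, error) and can even have the wrong sign (a Type S, or sign, error). Here is a simulation sketch of that phenomenon; the true effect and standard error are invented, and this is a simplified version of their design-analysis calculation:
set.seed(2014)
true_effect <- 0.1                  # invented small true effect
se <- 1                             # invented (large) standard error
est <- rnorm(1e5, true_effect, se)  # many replications of the same study
sig <- abs(est) > 1.96 * se         # which replications reach p < 0.05?
mean(sig)                           # power: barely above the alpha level
mean(est[sig] < 0)                  # Type S: wrong sign among the significant
mean(abs(est[sig])) / true_effect   # Type M: significant estimates exaggerate
In this noisy regime the estimates that clear the significance bar are exactly the ones that overstate, and sometimes flip, the truth.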
Which article above was your favorite? Why?
Which of the above ideas do you find most compelling and why?
Which of the above ideas do you find least compelling and why?
Regarding the Bullshit Asymmetry Principle, what are the incentives of the individuals involved, and how do you think those incentives play into this “principle”?
From the Gelman and Carlin article, in the final paragraph, they write
… remember that most effects can’t be zero (at least in social science and public health), and that an “effect” is usually a mean in a population (or something similar such as a regression coefficient) …
What are the consequences of this statement for a methodology like “best subsets” regression?
On to Stephen Jay Gould. In this section you will read excerpts from two of his books. He was a paleontologist and evolutionary biologist who passed away in 2002. Most of his work centered on the fossil record for evolution, but he often used probabilistic or statistical arguments in his writing.
Readings from Mismeasure of Man
Read Chapter 1 and Chapter 3 of Gould’s Mismeasure of Man
An interview on Full House (which was originally published in 1996) https://www.youtube.com/watch?v=1PJ20WXteRs
Read this excerpt from Full House: http://myslu.stlawu.edu/~msch/GouldFullHouseExcerpt.pdf
From Mismeasure of Man, are physical differences between sexes and races a valid area of scientific research? Why or why not?
George Orwell once said “To see what is right in front of one’s nose requires a constant struggle.” How can we avoid the types of errors described by Gould in Mismeasure of Man? Be as specific as possible.
Given the mental fallacies and shortcuts taken in Mismeasure of Man, why might we believe that “[t]he best hitters of today can’t be worse than 0.400 hitters of the past” (Full House, page 120)?
What does Gould do in the Full House excerpt that makes his argument convincing? Cite and explain a particularly convincing part and a particularly unconvincing part of his argument.
Appendix (for extra fun): Statistical tests, \(P\) values, confidence intervals, and power: a guide to misinterpretations https://link.springer.com/content/pdf/10.1007%2Fs10654-016-0149-3.pdf
So sometimes, life gets in the way of class. As the poet Robert Burns wrote, “The best laid schemes o’ mice an’ men / Gang aft a-gley.”
So there is a controversy simmering over the US Census and statistical methods. An editorial appeared on March 14 from the American Statistical Association on this. This is part of an ongoing debate about the role of the decennial census.
The relevant bits of the US Constitution https://www.census.gov/history/pdf/Article_1_Section_2.pdf
Relevant Law passed by Congress https://www.law.cornell.edu/uscode/text/13/141
And another relevant law https://www.law.cornell.edu/uscode/text/13/195
The 1999 Supreme Court decision https://www.law.cornell.edu/supremecourt/text/525/326
What is an “actual Enumeration”?
The Supreme Court of the United States (SCOTUS) held that “the Census Act prohibits the proposed uses of statistical sampling in calculating the population for purposes of apportionment.” In your opinion, was this the right decision?
Justice Scalia writes in his concurrence:
Justice Stevens [who wrote the dissent] reasons from the purpose of the census clause: “The census is intended to serve the constitutional goal of equal representation. . . . That goal is best served by the use of a ‘Manner’ that is most likely to be complete and accurate.” Post, at 8 (internal quotation marks and citation omitted). That is true enough, and would prove the point if either (1) every estimate is more accurate than a headcount, or (2) Congress could be relied upon to permit only those estimates that are more accurate than headcounts. It is metaphysically certain that the first proposition is false, and morally certain that the second is.
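Scalia’s point (1) is about whether an estimate can beat a headcount. To see the statistical issue in miniature, here is a toy R simulation (every number is invented, and this is far simpler than the dual-system estimation actually proposed): a headcount that misses hard-to-count people versus a headcount adjusted using a follow-up sample.
set.seed(1790)
pop <- 1e6                                  # true population size
hard <- rbinom(pop, 1, 0.2)                 # 20% are hard to count (assumed)
p_counted <- ifelse(hard == 1, 0.90, 0.99)  # differential undercount rates
counted <- rbinom(pop, 1, p_counted)
sum(counted)                                # raw headcount: short by roughly 3%
survey <- sample(pop, 10000)                # follow-up survey re-checks 10,000 people
sum(counted) / mean(counted[survey])        # coverage-adjusted estimate, near 1e6
In this toy world the adjusted estimate usually lands closer to the truth than the raw count; whether that holds for a real census, and whether the Constitution and the Census Act permit it, is exactly what the readings debate.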
Some Census history http://time.com/5217151/census-questions-citizenship-controversy/
An editorial about the 1990 Census http://articles.latimes.com/1992-08-02/news/mn-5723_1_census-bureau
About the 2000 Census http://www.washingtonpost.com/wp-srv/local/daily/oct99/pmcensus012699.htm
About the 2010 Census http://content.time.com/time/nation/article/0,8599,1879667,00.html
The latest controversy for the 2020 Census http://www.pewresearch.org/fact-tank/2018/03/30/what-to-know-about-the-citizenship-question-the-census-bureau-is-planning-to-ask-in-2020/
The recent Opinion piece by the American Statistical Association on the 2020 Census http://myslu.stlawu.edu/~msch/WassersteinOpEdMarch2019
Clearly there have been several controversies about the United States decennial census. What trends, if any, do you see across the controversies in the 1990, 2000, 2010, and 2020 censuses?
What are the pros and cons of having a question about citizenship in the 2020 census?
What do you think of Wasserstein’s argument about statistical malpractice?
This is a really long one:
To Sample or Not to Sample? The 2000 Census Controversy by Margo Anderson and Stephen E. Fienberg https://www.jstor.org/stable/pdf/206984.pdf?refreqid=excelsior%3A76811bf5116c0f6b837fde36fbd#bb4b2
Based upon this article by Anderson and Fienberg, do you think that sampling should be used in the United States census of 2030? Justify your answer.
Based upon all of these articles, do you think that sampling should be used in the United States census of 2030?
Give me two other questions from the readings that you would like the group to discuss.
So we were going to either do a meeting on financial stuff next or talk about Bayesian methods. I’ll toss a virtual coin.
library(ggplot2)                 # used for the plots later in these notes
set.seed(31119)                  # make the coin toss reproducible
a <- c("Financial", "Bayesian")
sample(a, 1)                     # pick the next topic at random
## [1] "Bayesian"
So Bayes it is. Historically, there have been two approaches to statistical analysis, a frequentist approach and a Bayesian approach. Most of what we cover in our courses at SLU broadly follows a frequentist approach.
A quick summary http://faculty.washington.edu/kenrice/BayesIntroClassEpi2018.pdf
As a quick summary, what is the difference between frequentist methods and Bayesian methods?
How does Bayes’ Rule relate to Bayesian methods?
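As a toy answer to the second question, here is Bayes’ Rule turning a prior and data into a posterior for a proportion (all numbers invented):
7 / 10                                     # 7 successes in 10 trials: frequentist estimate
## [1] 0.7
# a Beta(2, 2) prior (mild pull toward 0.5) updates to a Beta(2 + 7, 2 + 3) posterior
qbeta(c(0.025, 0.5, 0.975), 2 + 7, 2 + 3)  # posterior interval and median, near 0.65
The prior acts like a small amount of extra data; with more data its influence fades. The longer Poisson example later in these notes works the same way.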
From The Theory that Would Not Die by Sharon Bertsch McGrayne (long), read the Preface, Chapter 3 and Chapter 15.
Why did Bayesian methods fall out of favor? Was that an appropriate reason for them to fall out of favor?
On page 55, McGrayne writes the quote below. What is this quote about, and why is it a criticism of frequentist methods? (A short R illustration of Jeffreys’s complaint follows these questions.)
Jeffreys thought it very strange that a frequentist considered possible outcomes that had not occurred. He wanted to know the probability of his hypothesis about the epicenter of a particular earthquake, given information about the arrival times of tsunamis caused by the earthquake. Why should possible outcomes that had not occurred make anyone reject a hypothesis?
On page 184 we have “Oh, we’ve just lost a hydrogen bomb.” I just wanted to type that. In broad terms, how did Bayesian methods help the Navy search for lost items?
On the following page, McGrayne writes ‘Most academic statisticians would have thrown in the towel.’ Do you agree or disagree with that statement? Be sure to support your conclusions with observations from the reading and other observations you have had regarding applied statistical methods and academic statisticians you know.
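As promised above, Jeffreys’s complaint can be made concrete: a frequentist p-value depends on outcomes that did not occur, and therefore on the sampling plan. Below, the same data (3 successes in 12 trials, testing H₀: p = 0.5 against p < 0.5) yield different p-values depending on whether we fixed the number of trials in advance or sampled until the third success:
# fixed n = 12 trials (binomial sampling): P(3 or fewer successes)
pbinom(3, size = 12, prob = 0.5)
## [1] 0.07299805
# sample until the 3rd success (negative binomial): P(9 or more failures)
1 - pnbinom(8, size = 3, prob = 0.5)
## [1] 0.03271484
Same data, different conclusion at the 0.05 level; a Bayesian analysis would give the same answer either way.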
So when I was in grad school there was a good deal of back and forth between Bayesians and frequentists. Some of that is portrayed in McGrayne’s book, but it has generally subsided and Bayesian methods have been absorbed into the mainstream.
Efron’s Why Isn’t Everyone A Bayesian? https://www.jstor.org/stable/2683105
What makes the Bayesian approach so attractive to Efron? What does Efron see as the hurdles to being a Bayesian?
Why, in your opinion, do we not teach much Bayesian methodology at St. Lawrence?
You don’t have to read the following but here are some of the responses to Efron’s paper. https://www.jstor.org/stable/2683106
https://www.jstor.org/stable/2683107
https://www.jstor.org/stable/2683108
https://www.jstor.org/stable/2683109
https://www.jstor.org/stable/2683110
https://www.jstor.org/stable/2683111
The code below is meant to illustrate some of the ideas about Bayesian methodology. We have prior information suggesting that the mean of the process should be 4 errors per day (from historical information), but we also have ten days’ worth of data on a new machine suggesting a rate of 3.2 errors per day. The Bayesian approach gives a way to combine the information from both the old machine and the new machine, albeit in a subjective way. Here I’ve chosen to model these data as coming from a Poisson process and to incorporate the prior information using a Gamma prior.
Suppose we have a new machine at our factory to replace an old machine of the same type. The old machine made about 4 errors per day over the eight years of its lifetime. In its first ten days, the new machine makes the following numbers of errors, in order by day: 2, 4, 6, 1, 0, 4, 3, 2, 7, 3. This is an average of 3.2 errors per day for the new machine. If I ask you to predict the number of errors per day you expect from the new machine over its lifetime, how do you answer?
# likelihood alone: no prior information about lambda
lambda <- seq(1, 7, by = 0.01)                   # grid of candidate rates
data <- c(2, 4, 6, 1, 0, 4, 3, 2, 7, 3)          # errors per day, new machine
numer <- exp(-10 * lambda) * lambda^(sum(data))  # Poisson likelihood numerator
denom <- prod(factorial(data))                   # constant in lambda
df <- data.frame(lambda, like = numer / denom)
ggplot(df, aes(x = lambda, y = like)) +
  geom_point() +
  geom_vline(xintercept = 3.2, col = "blue") +   # sample mean of the new data
  geom_vline(xintercept = 4.0, col = "red")      # historical rate
which.max(df$like)
## [1] 221
lambda[which.max(df$like)]
## [1] 3.2
# same likelihood, now multiplied by a Gamma(shape = 8, scale = 0.5) prior,
# whose mean is 8 * 0.5 = 4 errors per day, matching the old machine
lambda <- seq(1, 7, by = 0.01)
data <- c(2, 4, 6, 1, 0, 4, 3, 2, 7, 3)
numer <- exp(-10 * lambda) * lambda^(sum(data)) * dgamma(lambda, shape = 8, scale = 0.5)
denom <- prod(factorial(data))
df <- data.frame(lambda, like = numer / denom)
ggplot(df, aes(x = lambda, y = like)) +
  geom_point() +
  geom_vline(xintercept = 3.2, col = "blue") +
  geom_vline(xintercept = 4.0, col = "red")
which.max(df$like)
## [1] 226
lambda[which.max(df$like)]
## [1] 3.25
# a much stronger prior: mean m = 4, variance v = 0.04
# (a Gamma with shape = m^2/v and scale = v/m has this mean and variance)
lambda <- seq(1, 7, by = 0.01)
data <- c(2, 4, 6, 1, 0, 4, 3, 2, 7, 3)
m <- 4; v <- 0.04
numer <- exp(-10 * lambda) * lambda^(sum(data)) * dgamma(lambda, shape = m^2 / v, scale = v / m)
denom <- prod(factorial(data))
df <- data.frame(lambda, like = numer / denom)
ggplot(df, aes(x = lambda, y = like)) +
  geom_point() +
  geom_vline(xintercept = 3.2, col = "blue") +
  geom_vline(xintercept = 4.0, col = "red")
lambda[which.max(df$like)]
## [1] 3.92
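Because the Gamma prior is conjugate to the Poisson likelihood, we can check the grid answers in closed form: the posterior is Gamma with shape = prior shape + Σxᵢ and rate = prior rate + n (rate is 1/scale), and the posterior mode is (shape − 1)/rate. All three grid maxima above match:
s <- sum(c(2, 4, 6, 1, 0, 4, 3, 2, 7, 3))  # total errors, s = 32
n <- 10                                    # days observed
s / n                        # no prior: the MLE
## [1] 3.2
(8 + s - 1) / (2 + n)        # Gamma(shape 8, scale 0.5) prior, so rate = 2
## [1] 3.25
(400 + s - 1) / (100 + n)    # strong prior: shape 400, rate 100
## [1] 3.918182
The grid, with its step of 0.01, rounds the last mode to 3.92. Note how the posterior mode moves from 3.2 toward the prior mean of 4 as the prior gets stronger.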
Questions you have for me:
From Cathy O’Neil’s Weapons of Math Destruction, read the Introduction and Chapter 6.
Based on this reading, what is the dark side of “Big Data”?
On page 112, O’Neil mentions the feedback loop that Kronos engenders. From an economic market perspective, isn’t there an argument that Kronos will be inefficient and, therefore, will not help its clients enough and will eventually go out of business? What do you think O’Neil would make of that argument?
What makes a model a WMD (weapon of math destruction) in this context? What might make for a good model in this context?
From The Black Swan by Nassim Nicholas Taleb, read the Prologue and Chapter 6.
On the second page of Chapter Six, Taleb says that the ‘thoughtful Italian fellow traveler’ “had to invent a cause”. What cause was invented here?
Why do you think that evolution has made us so bad at identifying Black Swans?
Much of the first two readings is about bad modeling. What makes a model good? What makes a model bad?
From Daniel Kahneman’s Thinking, Fast and Slow, read the Introduction and Chapter 21.
Much of the first two readings was about how bad models can be, and yet Kahneman suggests that algorithms tend to do better than humans. How should we reconcile these two perspectives? Is reconciliation possible, in your opinion? Why or why not?
So Chapter 21 is about human decisions versus algorithms, and Kahneman comes down on the side of algorithms for the most part. In the next chapter, which you don’t have, Kahneman talks about when humans do better than algorithms. (It is a great read.) Under what conditions do you think that humans might do better than algorithms?
On page 229, Kahneman writes, “Fortunately, the hostility to algorithms will probably soften as their role in everyday life continues to expand.” What would Cathy O’Neil, author of Weapons of Math Destruction, make of that statement?