WILEY SERIES IN PROBABILITY AND STATISTICS
Nonparametric
Hypothesis Testing
Rank and Permutation Methods
with Applications in R
Stefano Bonnini • Livio Corain
Marco Marozzi • Luigi Salmaso
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens,
Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith,
Ruey S. Tsay, Sanford Weisberg
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane,
Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.
Stefano Bonnini
University of Ferrara, Italy
Livio Corain
University of Padova, Italy
Marco Marozzi
University of Calabria, Italy
Luigi Salmaso
University of Padova, Italy
This edition first published 2014
© 2014 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United
Kingdom
For details of our global editorial offices, for customer services and for information about how to apply
for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the
Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of
the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand
names and product names used in this book are trade names, service marks, trademarks or registered
trademarks of their respective owners. The publisher is not associated with any product or vendor
mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not
engaged in rendering professional services and neither the publisher nor the author shall be liable for
damages arising herefrom. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Nonparametric hypothesis testing : rank and permutation methods with applications in R / Stefano
Bonnini, Livio Corain, Marco Marozzi, Luigi Salmaso.
pages cm
Includes bibliographical references and index.
ISBN 978-1-119-95237-4 (cloth)
1. Nonparametric statistics. 2. Statistical hypothesis testing. 3. R (Computer program
language) I. Bonnini, Stefano. II. Corain, Livio. III. Marozzi, Marco. IV. Salmaso, Luigi.
QA278.8.N64 2014
519.5′4–dc23
2014020574
A catalogue record for this book is available from the British Library.
Cover image: ‘A ship for discovery’ by Serio Salmaso, 1980, Venice
ISBN: 978-1-119-95237-4
Set in 10/12pt Times by Aptara Inc., New Delhi, India
1 2014
The greatest value of a picture is when it forces us to
notice what we never expected to see.
J. Tukey
Contents
Presentation of the book xi
Preface xiii
Notation and abbreviations xvii
1 One- and two-sample location problems, tests for symmetry and
tests on a single distribution 1
1.1 Introduction 1
1.2 Nonparametric tests 2
1.2.1 Rank tests 2
1.2.2 Permutation tests and combination based tests 3
1.3 Univariate one-sample tests 5
1.3.1 The Kolmogorov goodness-of-fit test 6
1.3.2 A univariate permutation test for symmetry 10
1.4 Multivariate one-sample tests 15
1.4.1 Multivariate rank test for central tendency 15
1.4.2 Multivariate permutation test for symmetry 18
1.5 Univariate two-sample tests 20
1.5.1 The Wilcoxon (Mann–Whitney) test 21
1.5.2 Permutation test on central tendency 27
1.6 Multivariate two-sample tests 29
1.6.1 Multivariate tests based on rank 29
1.6.2 Multivariate permutation test on central tendency 34
References 37
2 Comparing variability and distributions 38
2.1 Introduction 38
2.2 Comparing variability 39
2.2.1 The Ansari–Bradley test 40
2.2.2 The permutation Pan test 43
2.2.3 The permutation O’Brien test 46
2.3 Jointly comparing central tendency and variability 49
2.3.1 The Lepage test 50
2.3.2 The Cucconi test 52
2.4 Comparing distributions 56
2.4.1 The Kolmogorov–Smirnov test 56
2.4.2 The Cramér–von Mises test 59
References 61
3 Comparing more than two samples 65
3.1 Introduction 65
3.2 One-way ANOVA layout 66
3.2.1 The Kruskal–Wallis test 67
3.2.2 Permutation ANOVA in the presence of one factor 73
3.2.3 The Mack–Wolfe test for umbrella alternatives 76
3.2.4 Permutation test for umbrella alternatives 83
3.3 Two-way ANOVA layout 87
3.3.1 The Friedman rank test for unreplicated block design 87
3.3.2 Permutation test for related samples 89
3.3.3 The Page test for ordered alternatives 91
3.3.4 Permutation analysis of variance in the presence
of two factors 93
3.4 Pairwise multiple comparisons 95
3.4.1 Rank-based multiple comparisons for the
Kruskal–Wallis test 96
3.4.2 Permutation tests for multiple comparisons 98
3.5 Multivariate multisample tests 99
3.5.1 A multivariate multisample rank-based test 99
3.5.2 A multivariate multisample permutation test 103
References 105
4 Paired samples and repeated measures 107
4.1 Introduction 107
4.2 Two-sample problems with paired data 108
4.2.1 The Wilcoxon signed rank test 108
4.2.2 A permutation test for paired samples 114
4.3 Repeated measures tests 116
4.3.1 Friedman rank test for repeated measures 117
4.3.2 A permutation test for repeated measures 120
References 122
5 Tests for categorical data 124
5.1 Introduction 124
5.2 One-sample tests 125
5.2.1 Binomial test on one proportion 125
5.2.2 The McNemar test for paired data (or bivariate responses)
with binary variables 128
5.2.3 Multivariate extension of the McNemar test 131
5.3 Two-sample tests on proportions or 2 × 2 contingency tables 134
5.3.1 The Fisher exact test 135
5.3.2 A permutation test for comparing two proportions 138
5.4 Tests for R × C contingency tables 139
5.4.1 The Anderson–Darling permutation test for R × C
contingency tables 140
5.4.2 Permutation test on moments 145
5.4.3 The chi-square permutation test 148
References 151
6 Testing for correlation and concordance 153
6.1 Introduction 153
6.2 Measuring correlation 154
6.3 Tests for independence 156
6.3.1 The Spearman test 157
6.3.2 The Kendall test 160
6.4 Tests for concordance 166
6.4.1 The Kendall–Babington Smith test 167
6.4.2 A permutation test for concordance 172
References 174
7 Tests for heterogeneity 176
7.1 Introduction 176
7.2 Statistical heterogeneity 177
7.3 Dominance in heterogeneity 178
7.3.1 Geographical heterogeneity 180
7.3.2 Market segmentation 184
7.4 Two-sided and multisample test 188
7.4.1 Customer satisfaction 189
7.4.2 Heterogeneity as a measure of uncertainty 191
7.4.3 Ethnic heterogeneity 194
7.4.4 Reliability analysis 196
References 197
Appendix A Selected critical values for the null distribution of the
peak-known Mack–Wolfe statistic 201
Appendix B Selected critical values for the null distribution of the
peak-unknown Mack–Wolfe statistic 203
Appendix C Selected upper-tail probabilities for the null distribution
of the Page L statistic 206
Appendix D R functions and codes 213
Index 219
Presentation of the book
The importance and usefulness of nonparametric methods for testing statistical
hypotheses have been growing in recent years, mainly because of their flexibility, their
efficiency and their ease of application to many different types of problems, including
the most important and frequently encountered multivariate cases. Moreover, they are
much less demanding than their parametric counterparts in terms of required
assumptions. Together, these features are making nonparametric methods quite
popular and widely used, even by non-statisticians.
The growing availability of adequate hardware and software tools for their practical
application, and in particular free access to software environments for statistical
computing such as R, is one more reason for the great success of these methods.
The recognized simplicity and good power behavior of rank and permutation
tests often make them preferable to the classical parametric procedures based on the
assumption of normality or other distribution laws. In particular, permutation tests
are generally asymptotically as powerful as their parametric counterparts under the
conditions required by the latter. Moreover, when the data are exchangeable with
respect to samples under the null hypothesis, permutation tests are always exact, in the
sense that their null distributions are known for any given dataset of any sample size.
The null distributions of parametric counterparts, on the other hand, are often known
only asymptotically. Thus, for most sample sizes of practical interest, the relative lack
of efficiency of unidimensional permutation solutions may sometimes be compensated
by the fact that they do not rely on the approximations that affect their asymptotic
parametric competitors. For multivariate cases, especially when the number of
processed variables is large in comparison with sample sizes, permutation solutions
are in most situations more powerful than their parametric counterparts.
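To make the notion of exactness concrete, the following R sketch builds the full permutation null distribution of a two-sample statistic by enumerating every reassignment of the pooled observations to the two samples. It is only an illustration under stated assumptions: the function name perm_test_exact, the choice of the difference in means as test statistic, and the toy data are hypothetical and are not taken from the book's own R code.

```r
# Minimal sketch of an exact two-sample permutation test.
# Assumptions (not from the book): the statistic is the difference in means,
# and the samples are small enough to enumerate every relabeling.
perm_test_exact <- function(x, y) {
  pooled <- c(x, y)
  n1 <- length(x)
  # Observed statistic: difference between the two sample means
  t_obs <- mean(x) - mean(y)
  # Each column of idx is one way of choosing which n1 pooled
  # observations are assigned to the first sample
  idx <- combn(length(pooled), n1)
  t_perm <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  # Two-sided p-value: share of relabelings at least as extreme as observed
  # (the small tolerance guards against floating-point ties)
  p_value <- mean(abs(t_perm) >= abs(t_obs) - 1e-12)
  list(statistic = t_obs, n_perm = length(t_perm), p_value = p_value)
}

# Hypothetical toy data: two samples of four observations each,
# giving choose(8, 4) = 70 equally likely relabelings under the null
x <- c(12.1, 14.3, 13.8, 15.2)
y <- c(10.4, 11.9, 10.8, 11.2)
res <- perm_test_exact(x, y)
print(res)
```

Under exchangeability every relabeling is equally likely, so the p-value is computed from a null distribution that is fully known for this dataset, with no appeal to asymptotic theory. Full enumeration is feasible only for small samples; for larger ones a random subset of relabelings is commonly drawn instead.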
For these reasons in the specialized literature a book dedicated to rank and
pe