Adventures in Data Science: More is Better | Department of Industrial and Systems Engineering

Dr. Jeff R. Knisley
Department of Mathematics and Statistics
East Tennessee State University
January 23rd, 2015, 2:30 – 3:30 PM
410 John D. Tickle Engineering Building

Dr. Jeff Knisley received his Ph.D. in mathematics from Vanderbilt University in 1990, where his focus was on Operator Theory and Applied Mathematics. He has been at East Tennessee State University (ETSU) since 1990, where his research has focused on computational and mathematical neuroscience. The importance of neural networks in both computational neuroscience and machine learning led naturally to applications in Big Data. Over the past decade, Dr. Knisley has published several papers in computational science and has participated in several challenges, including the DREAM challenges (Dialogues for Reverse Engineering Assessments and Methods), where he and his wife have collaborated to finish as high as 2nd. Dr. Knisley has also consulted for several large companies and organizations, and since 2009 he has taught a graduate-level course on Analytics and Predictive Modeling at ETSU.

Talk Abstract: Is “Big Data” a fad? Or is it here to stay? Is data science a new field? Or is it just statistics from a new perspective? Will better experimental designs and collection techniques eliminate much of the “Big Data” phenomenon? Or is there something to data science and “Big Data” that we truly have not encountered before? The answers to these questions likely hinge on the sense in which “more is better” in a data science context. Certainly, more data is better than less data, but additionally, more variables tend to be better than fewer variables, more dimensions tend to be better than fewer dimensions, more complexity tends to be better than less complexity, and more computation tends to be better than less. That is, “Big Data” produces better results not only because there is more of it, but also because what traditional approaches tend to regard as the “noise” obscuring a statistical description is often a computationally and mathematically complex encoding of important knowledge to be gained.

There is at present no comprehensive theory of why “more is better” in the sense described above, but there are many examples illustrating this phenomenon. This presentation illustrates the sense in which “more is better” via a series of applications (i.e., the “adventures”), each of which shows, mathematically and otherwise, why a large-scale “data science” approach is to be preferred over individual approaches based on statistical methods, mathematical models, simulations, or similar techniques. As will be seen in these examples, “better” or “preferred” is relative to the knowledge that would be produced otherwise, even by powerful statistical techniques or deep mathematical models. Our final “adventure” will be an exploration of the concept of a consensus model as a possible indication of how data science might generalize statistical approaches and mathematical models into a new type of approach to science and technology.