Bookmarks for May 27th through July 28th

These are my links for May 27th through July 28th:

  • Statistical Modeling: The Two Cultures – There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
  • Brief history of data visualization – Data visualization is a pretty literal term that means, quite simply, the visual representation of quantitative data. In this course we’ll learn common techniques for visualizing data, as well as some strategies for managing information digitally. But first, a brief history.
  • S. Thompson. Motif-index of folk-literature – a classification of narrative elements in folktales, ballads, myths, fables, mediaeval romances, exempla, fabliaux, jest-books, and local legends.
  • What is data science? – O’Reilly Radar – We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data?

    In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets.
  • [1005.0437] A Unifying View of Multiple Kernel Learning – Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion’s dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments.

Bookmarks for September 14th through September 22nd

These are my links for September 14th through September 22nd:

  • Philosophy Now | Daniel Dennett: Autobiography (Part 1) – What makes a philosopher? In the first of a two-part mini-epic, Daniel C. Dennett contemplates a life of the mind – his own. Part 1: The pre-professional years.
  • Philosopher’s Annual – Our goal is to select the ten best articles published in philosophy each year—an attempt as simple to state as it is admittedly impossible to fulfill. Against a background of twenty-four volumes in hard copy, the Annual is now available entirely online.
  • Revolutions: Interactive stock visualizations with R – Jeroen Ooms, who recently completed his Masters in Statistics at Utrecht University, has created an outstanding web-based drag-and-drop application for visualizing financial data. With his “StockPlot” application, you can select any stock from a number of world exchanges (including NASDAQ, DAX, FTSE), and drag it to a worksheet to see a time-series of the stock price. You can arrange up to four charts on the same worksheet for comparison purposes, and control the timeframe and appearance of each chart.
  • Revolutions: Machine Learning in R, in a nutshell – Josh Reich has created a concise R script demonstrating various machine-learning techniques in R with simple, self-contained examples.
  • Information Processing and Thermodynamic Entropy (Stanford Encyclopedia of Philosophy) – Are principles of information processing necessary to demonstrate the consistency of statistical mechanics? Does the physical implementation of a computational operation have a fundamental thermodynamic cost, purely by virtue of its logical properties? These two questions lie at the centre of a large body of literature concerned with the Szilard engine (a variant of the Maxwell’s demon thought experiment), Landauer’s principle (supposed to embody the fundamental principle of the thermodynamics of computation) and possible connections between the two. A variety of attempts to answer these questions have illustrated many open questions in the foundations of statistical mechanics.
  • Christopher J. G. Meacham, Two Mistakes Regarding The Principal Principle | PhilPapers – This paper examines two mistakes regarding David Lewis’ Principal Principle that have appeared in the recent literature. These particular mistakes are worth looking at for several reasons: the thoughts that lead to these mistakes are natural ones, the principles that result from these mistakes are untenable, and these mistakes have led to significant misconceptions regarding the role of admissibility and time. After correcting these mistakes, the paper discusses the correct roles of time and admissibility. With these results in hand, the paper concludes by showing that one way of formulating the chance-credence relation has a distinct advantage over its rivals.
  • José Luis Bermúdez – Decision Theory and Rationality – Reviewed by Lara Buchak, UC Berkeley – Philosophical Reviews – University of Notre Dame – Decision theory is used for a variety of purposes: decision makers use it to guide their own actions, and theorists use it both normatively to assess decision makers and to predict and explain their decisions. This book investigates whether the theory can fulfill all three of these purposes. In particular, Bermúdez explores three questions that decision theory must answer under any guise: How should we understand utility and preference? How finely should we individuate the possible outcomes in a decision problem? And how should choice be constrained over time? He argues that there are no answers to these questions that allow decision theory to serve all three purposes.

Bookmarks for September 8th through September 14th

These are my links for September 8th through September 14th:

  • Jaakko Hintikka, Past, present and future of set theory | PhilPapers – What one can say about the past, present and future of set theory depends on what one expects or at least hopes set theory will accomplish…I begin with a quote from the inaugural lecture in 1903 of my mathematical grandfather, the internationally known Finnish mathematician Ernst Lindelöf. The subject of his lecture was – guess what – Cantor’s set theory. In his conclusion, Lindelöf says of Cantor’s results: For mathematics they have lent new tools and opened up new fields of research, they have thrown entirely new light on the foundations of analysis and brought clarity and order where there was only disorder and contradictions. Thus they have greatly contributed to the harmony that is the essence of mathematics, a harmony a grasp of which is the reward of mathematical research. We can all agree with the compliments Lindelöf pays to set theory as an impressive specimen of mathematical research, including the theory of infinite cardinals and ordinals.
  • An Introduction to Data Mining – Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
  • Open Source BI: A Market Overview, Steve Holub – “The difficulty lies not so much in developing new ideas as in escaping from old ones.” – John Maynard Keynes

    The following survey provides a list of open source software (OSS) tools used in business intelligence (BI) and data warehousing systems. The tool selection criteria were based on the frequency and currency of the releases and on whether the product has released a stable build which could be used in a production environment. We only present those solutions which have had updates within the past two years. Our study looked at BI tools in the following categories: i) databases; ii) extract/transform/load (ETL); iii) master data management; iv) BI reporting tools; and v) data mining. In the case of an open source software bundle that overlaps categories, we divide the software bundle into its separate parts for ease of categorization.

Bookmarks for July 8th through July 29th

These are my links for July 8th through July 29th:

  • Should Copyright Of Academic Works Be Abolished? – The conventional rationale for copyright of written works, that copyright is needed to foster their creation, is seemingly of limited applicability to the academic domain. For in a world without copyright of academic writing, academics would still benefit from publishing in the major way that they do now, namely, from gaining scholarly esteem. Yet publishers would presumably have to impose fees on authors, because publishers would not be able to profit from reader charges. If these publication fees would be borne by academics, their incentives to publish would be reduced. But if the publication fees would usually be paid by universities or grantors, the motive of academics to publish would be unlikely to decrease (and could actually increase) – suggesting that ending academic copyright would be socially desirable in view of the broad benefits of a copyright-free world…
  • BBC – Radio 4 In Our Time – Philosophy Archive – You can listen again to all the programmes online. The most recent programmes appear at the top of the page.
  • [0907.1579] The Computational Power of Minkowski Spacetime – The Lorentzian length of a timelike curve connecting both endpoints of a classical computation is a function of the path taken through Minkowski spacetime. The associated runtime difference is due to time-dilation: the phenomenon whereby an observer finds that another's physically identical ideal clock has ticked at a different rate than their own clock. Using ideas appearing in the framework of computational complexity theory, time-dilation is quantified as an algorithmic resource by relating relativistic energy to an nth order polynomial time reduction at the completion of an observer's journey. These results enable a comparison between the optimal quadratic Grover speedup from quantum computing and an n = 2 speedup using classical computers and relativistic effects. The goal is not to propose a practical model of computation, but to probe the ultimate limits physics places on computation.
  • How to choose a statistical test – This book has discussed many different statistical tests. To select the right test, ask yourself two questions: What kind of data have you collected? What is your goal? Then refer to Table 37.1.
  • NPWRC :: Statistical Significance Testing – Four basic steps constitute statistical hypothesis testing. First, one develops a null hypothesis about some phenomenon or parameter. This null hypothesis is generally the opposite of the research hypothesis, which is what the investigator truly believes and wants to demonstrate. Research hypotheses may be generated either inductively, from a study of observations already made, or deductively, deriving from theory. Next, data are collected that bear on the issue, typically by an experiment or by sampling. (Null hypotheses often are developed after the data are in hand and have been rummaged through, but that's another topic.)
  • Data Mining Techniques – Data Mining is an analytic process designed to explore data (usually large amounts of data – typically business or market related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction – and predictive data mining is the most common type of data mining and one that has the most direct business applications.

Bookmarks for July 6th through July 8th

These are my links for July 6th through July 8th:

  • An Overview of Data Mining Techniques – This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme:
    * Classical Techniques: Statistics, Neighborhoods and Clustering
    * Next Generation Techniques: Trees, Networks and Rules

    Each section will describe a number of data mining algorithms at a high level, focusing on the “big picture” so that the reader will be able to understand how each algorithm fits into the landscape of data mining techniques. Overall, six broad classes of data mining algorithms are covered. Although there are a number of other algorithms and many variations of the techniques described, one of the algorithms from this group of six is almost always used in real world deployments of data mining systems.

  • MachineLearning.pdf (application/pdf Object) – Over the past 50 years the study of Machine Learning has grown from the efforts of a handful of computer engineers exploring whether computers could learn to play games, and a field of Statistics that largely ignored computational considerations, to a broad discipline that has produced fundamental statistical-computational theories of learning processes, has designed learning algorithms that are routinely used in commercial systems for speech recognition, computer vision, and a variety of other tasks, and has spun off an industry in data mining to discover hidden regularities in the growing volumes of online data. This document provides a brief and personal view of the discipline that has emerged as Machine Learning, the fundamental questions it addresses, its relationship to other sciences and society, and where it might be headed.