Main Page
From Johnny Logic's Notebook
Welcome to my notebooks. The content of this site derives from my paper notebooks and various other documents I have collected over the years. The initial focus will be professional, but is bound to deviate.
My current focus is putatively termed data science. This discipline is subject to ongoing terminological dispute, but whether you call it data science, analytics, data shaping, or something else, it lives at the conjunction of hacking, mathematics and statistics, and domain knowledge. Here are my current projects:
Theory, Math and Statistics
Linear Algebra
Linear algebra is a branch of mathematics that studies vector spaces, also called linear spaces, along with linear functions that take one vector as input and produce another as output. Such functions are called linear maps (or linear transformations or linear operators) and can be represented by matrices once a basis is chosen. Matrix theory is often considered a part of linear algebra. Linear algebra is commonly restricted to finite-dimensional vector spaces, while the peculiarities of the infinite-dimensional case are traditionally covered in linear functional analysis.
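As a small illustration of a linear map represented by a matrix, here is a minimal Python sketch. The function name `apply` and the rotation matrix `ROTATE_90` are my own illustrative choices, not part of any library; the standard basis of R^2 is assumed.

```python
def apply(matrix, vector):
    """Apply the linear map represented by `matrix` (a list of rows) to `vector`."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical example: rotation by 90 degrees counter-clockwise,
# written in the standard basis of R^2.
ROTATE_90 = [[0, -1],
             [1, 0]]

u = [1, 2]
v = [3, -1]

# Linearity: f(2u + 3v) should equal 2 f(u) + 3 f(v).
combined = [2 * a + 3 * b for a, b in zip(u, v)]
lhs = apply(ROTATE_90, combined)
rhs = [2 * a + 3 * b for a, b in zip(apply(ROTATE_90, u), apply(ROTATE_90, v))]

print(lhs, rhs)
```

Both sides agree, which is exactly the defining property of a linear map; changing the basis would change the matrix but not the map.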
Probability Theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. If an individual coin toss or the roll of a die is considered to be a random event, then if repeated many times the sequence of random events will exhibit certain patterns, which can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem.
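The law of large numbers can be seen in a quick simulation. This is only a sketch: the function name `fraction_of_heads`, the fixed seed, and the sample sizes are assumptions chosen for illustration.

```python
import random


def fraction_of_heads(n_tosses, seed=0):
    """Simulate `n_tosses` fair coin tosses and return the fraction that are heads."""
    rng = random.Random(seed)  # private generator so results are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The observed fraction should settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, fraction_of_heads(n))
```

Each individual toss is unpredictable, but the long-run average is not, which is the kind of pattern the paragraph above describes.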
Statistics
Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis.
Important concepts and results include:
Numerical Analysis
Numerical analysis is the study of algorithms that use numerical approximation (as opposed to general symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics).
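A classic example of numerical approximation is Newton's method for square roots. The sketch below is one possible formulation; the name `newton_sqrt`, the relative tolerance, and the iteration cap are illustrative assumptions.

```python
import math


def newton_sqrt(a, tol=1e-12, max_iter=50):
    """Approximate sqrt(a) for a >= 0 by Newton's iteration x <- (x + a/x) / 2."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0  # any positive starting guess works
    for _ in range(max_iter):
        next_x = 0.5 * (x + a / x)
        # Stop once successive iterates agree to the requested relative tolerance.
        if abs(next_x - x) <= tol * next_x:
            return next_x
        x = next_x
    return x


print(newton_sqrt(2.0), math.sqrt(2.0))
```

The method never manipulates the symbol sqrt(2); it only produces a sequence of floating-point numbers that converge to it.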
Learning Theory
- Statistical learning theory
- Computational learning theory
- Formal learning theory
- Algorithmic information theory
Hacking, Computer Science and Software Engineering
Theoretical foundations of information and computation and practical techniques for their implementation and application in computer systems.
Theory of computation and formal languages
What can be (efficiently) automated? What can be computed, and what resources are required to perform those computations? Computability theory examines which computational problems are solvable on various theoretical models of computation. Computational complexity theory studies the time and space costs associated with different approaches to solving computational problems.
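One of the simplest theoretical models of computation is the deterministic finite automaton (DFA). The sketch below simulates a two-state DFA that accepts binary strings containing an even number of 1s; the language, the state names, and the function name `accepts_even_ones` are my own illustrative choices.

```python
def accepts_even_ones(word):
    """Simulate a two-state DFA over the alphabet {'0', '1'}."""
    transitions = {
        ("even", "0"): "even",
        ("even", "1"): "odd",
        ("odd", "0"): "odd",
        ("odd", "1"): "even",
    }
    state = "even"  # start state, which is also the only accepting state
    for symbol in word:
        if (state, symbol) not in transitions:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        state = transitions[(state, symbol)]
    return state == "even"


print(accepts_even_ones("1010"), accepts_even_ones("111"))
```

The automaton reads each symbol once and uses a fixed amount of memory, so it decides this language in linear time and constant space.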
Information and coding theory
Information theory concerns the quantification of information and the fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data. Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error correction, and more recently for network coding.
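Quantifying information can be made concrete with Shannon entropy, computed here from the empirical symbol frequencies of a message. This is a minimal sketch; the function name `shannon_entropy` is an illustrative assumption.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Return the empirical Shannon entropy of `message`, in bits per symbol."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aabb"))  # two equally likely symbols
print(shannon_entropy("aaaa"))  # a single repeated symbol
```

Under this simple per-symbol model, a message with lower entropy can be compressed into fewer bits per symbol.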
Algorithms and data structures
- Algorithms: an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning.
- Data structures: a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.
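Binary search shows how an algorithm and a data structure work together: keeping the items in a sorted list is what makes the fast lookup possible. The sketch below is a textbook formulation; the function name `binary_search` and the convention of returning -1 for a missing item are my own choices.

```python
def binary_search(sorted_items, target):
    """Return the index of `target` in `sorted_items`, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


print(binary_search([1, 3, 5, 7, 9, 11], 7))
```

Each step halves the remaining range, so the search takes a number of comparisons logarithmic in the length of the list.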
Programming and Software Engineering
- Programming language theory is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of programming languages and their individual features.
- Software engineering (SE) is a profession dedicated to designing, implementing, and modifying software so that it is of higher quality, more affordable, maintainable, and faster to build. It is a "systematic approach to the analysis, design, assessment, implementation, test, maintenance and reengineering of software, that is, the application of engineering to software."
Databases and information retrieval
A database is intended to organize, store, and retrieve large amounts of data easily. Digital databases are managed using database management systems to store, create, maintain, and search data, through database models and query languages.
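As a minimal sketch of storing and querying data through a database management system, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `notes` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, topic TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO notes (topic, title) VALUES (?, ?)",
    [
        ("statistics", "Law of large numbers"),
        ("statistics", "Central limit theorem"),
        ("linear algebra", "Linear maps"),
    ],
)
conn.commit()

# Retrieve the titles for one topic using a parameterized query.
rows = conn.execute(
    "SELECT title FROM notes WHERE topic = ? ORDER BY title",
    ("statistics",),
).fetchall()
conn.close()

print(rows)
```

The table definition plays the role of the database model, and the SQL statements are the query language.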
Machine Learning and Artificial Intelligence
- Machine learning: a branch of artificial intelligence and a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.
Data Wrangling and Analysis Tools
Scripting languages, query languages, DBMS, DB, platforms, APIs, etc. Composites of these may comprise analysis stacks, such as the SharePoint 2010 BI Stack, or various combinations of open source software in an Open Source Analysis Stack.
Data Collection
- APIs
- Data Sources
- Scraping
Data Storage and Retrieval
- PL/SQL
- PostgreSQL
- SQL (ANSI Standard)
- TOAD
Analysis and Data Mining
- Matlab
- Octave
- Python
- NumPy
- SciPy
- R
- RapidMiner
- Weka
Graphing and Visualization
- Python
- Matplotlib
- WebFOCUS
Scripting, Programming and Regex
- Perl
- Python
Analysis, Data Mining and Machine Learning
Topics can be arranged in many ways:
By CRISP-DM Process Stages:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
Meta
Infoboxes
I have Infoboxes working; the next step is a workable template for Template:Infobox.
Graphing
Enabled the Google Chart API by allowing HTML. Need to think about the security implications (see HTML Purifier).
Math Add-On
Helpful Links
- User's Guide: Consult for information on using the wiki software.
- Configuration settings list
- MediaWiki FAQ
- MediaWiki release mailing list
- Markup on MediaWiki