- I built a library of koans using the Julia programming language as part of a course project last semester! You can run them on Colab here
- The koans themselves are hosted as Jupyter notebooks, built from Julia source code with Literate.jl
- Flux.jl is the deep learning library I used
- The koans can also be run locally; the project GitHub repo is here
- Although I’ve finished my first run of chapters, I’m still in the process of exploring these concepts, and would appreciate feedback!

First, a **koan** is a programming language problem with three aspects:

- A text section that introduces a concept
- A short snippet of non-working code
- A “test”, or proof, that will pass once an understanding of the above concept has been used to fix the code

And **deep learning** is, well, a buzzword, but I took it here to mean a library, or API, capable of building modular neural networks with GPU acceleration.

Literate programming is the idea that your source code is both human-readable and machine-runnable. Literate.jl takes this a step further, and lets the user programmatically build Jupyter notebooks from executable scripts. For instance,

```
# # This would be a text cell
# right here is a continuation of that text
x_str = "now we are in a source code cell"
y = 1
# The first comment brings us back to text!
```

With that set up, building koans is pretty simple: we alternate explanations with koans, and the user interactively tests and changes their code.

An example koan would be:

```
# # A Demo Koan
# array indexing in Julia is 1-based
xarray = ["a", "b", "c", "solution"]
ind = 0 # Fix me !
ind = 4 #src
@assert xarray[ind] == "solution"
```

And we can see a screenshot of the notebook.

Indeed, the `#src` tag will be filtered out by Literate.jl when compiling the notebook, allowing the koan writer to test all the koans by sourcing the script, and the user to look up the solution in the source code if necessary. My script for generating the notebooks from Julia is available here, and you can find more information in the Literate.jl docs.

Julia is a great programming language, and probably the best option for building neural networks, for two reasons:

- Julia runs fast, is gradually typed so you can write it fast, and compiles down to LLVM, which means no calls out to C/C++!
- For those reasons, a neural network can be programmed entirely in Julia, making differentiable programming much simpler and building neural networks easier!

Flux is still under active development, with notable improvements in differentiable programming over the last year, and the next-generation differentiation system, Zygote, is being integrated into Flux now!

So how do we teach Flux? My approach was to write 7 chapters, first covering Julia, then covering what I believed to be the most important aspects of using a new DL library: working with data, building models, training models, and using the GPU.

My strategy was inspired by a project I did earlier this fall implementing a variational autoencoder in Flux, and I tried to create the document I would have wanted to read, given my knowledge base (math, ML, R/Python), before implementing a similar project. If I get a chance, I’ll talk about that project in another post!

Therefore, I set up 7 chapters, as collections of koans, in the following way:

1. Introduction to Julia

2. Working With Data In Julia

3. Intro to Flux

4. Convolutional Neural Networks and Layers in Flux

5. Recurrent Neural Networks and Layers in Flux

6. Flux optimization

7. Putting it all together, and more examples!
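As a taste of what the middle chapters cover, here is a minimal model-building sketch in the spirit of “Intro to Flux”. The layer sizes and names are my own illustration, not from the koans, and it assumes Flux’s positional layer constructors:

```julia
using Flux

# A tiny two-layer network: 4 inputs, 8 hidden units, 2 outputs,
# with softmax so the outputs form a probability distribution.
model = Chain(Dense(4, 8, relu), Dense(8, 2), softmax)

x = rand(Float32, 4)  # a single dummy input
y = model(x)

sum(y)  # softmax normalizes the outputs to sum to 1
```

A koan version would break the `Chain` and leave the user to restore a passing `@assert` on the output shape.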

For the content of the koans, I wrote many of them myself, and was heavily inspired by the tutorial examples in the Flux source code.

Whether through the use of koans, the Chrome inspector, or the command line, if you are going to learn a new library, you need to play with it. Although I am not sure whether koans via Jupyter notebooks are here to stay, I think there is an acute need for easy ways to play around with code when you are trying to learn something new. Adding insight to this process should be the goal of any good koan writer!

Every day this year, between 1 and 4 am, an email from arxiv.org appeared in my inbox with the title and abstract of every paper submitted the previous day in Artificial Intelligence, Computation and Language, Computers and Society, Human-Computer Interaction, Information Retrieval, Learning, Other Computer Science, Programming Languages, Software Engineering, and Social and Information Networks.

It’s a lot. Today, 156 submissions. I don’t read every abstract in full: I read the title, then read into the abstract until I have a reason not to. Maybe every other day I finish a paper, but those are the topics needed to cover my interests in data science, software engineering, and start-up technologies. For me, it’s a question of breadth and depth.

Why do it? Curiosity. But that’s not a good enough answer! The first major reason is that I enjoy reading and scrutinizing papers of all levels, and to be honest, arxiv.org is a bit of a mixed bag. The next is that I like searching for ideas that relate to what I’m doing at work, and that inspire me to develop the skills I need for a ‘great’ side project currently beyond my abilities. The reading has paid off, and I constantly find new information tangentially related to my work.

Over the course of the year I found a variety of interesting papers that have influenced my work and thinking, or that I just think are worth sharing. Here they are:

A Conceptual Introduction to Hamiltonian Monte Carlo

STAN, a probabilistic programming language used for Bayesian statistics, uses Hamiltonian Monte Carlo, and this is the guide for understanding the algorithm, differential geometry primer included. Betancourt provides a good overview, covering the concepts, the important metrics used for debugging STAN, and even the mathematics behind Hamiltonian systems and phase space. This is a fascinating paper for the application of physics to numerical problems alone, but what makes it great is the geometric intuition it provides when you need to get a STAN model to converge. In my experience, the intersection of statistical model, STAN implementation, and convergence defines the solution space for possible Bayesian models in STAN, and this guide really helps with the latter two.

The Future of Ad Blocking: An Analytical Framework and New Techniques

Ads are everywhere, and we are only becoming more conscious of the effect they have on our attention, web experience, and, importantly, privacy. Blocking ads is popular, although existing solutions are technically rudimentary in their implementation. The authors discuss how ad blocking may evolve, what technical game states will be encountered, and propose an interesting endgame consisting of user software that can both actively block ads and obfuscate ad-block detection. There is incredible demand for ad-blocking software, and this paper really spells out an interesting solution to a problem many of us face!

Developing Bug-Free Machine Learning Systems With Formal Mathematics

The authors here are trying to bridge the gap between building a machine learning system and deploying that system in production. If you are unfamiliar with how difficult this is, it’s a problem of design opposites: in model development you are searching for a solution and aren’t sure what exactly you’ll end up with, while in production you need a fast, efficient, and ultimately reliable algorithm that will work safely every time. What’s so fascinating is that they built a programming language with theorem proving that is expressive enough to run the variety of models needed for exploring model space, yet inherently “safe” enough to run in production. This idea is certainly far from finished, but it’s an early example of how programming language theory can help solve some of the more difficult problems in industrial data science and machine learning. 2018 will hopefully see more work done here, with possible integrations into the major deep learning libraries.

Sparsity information and regularization in the horseshoe and other shrinkage priors

Regularization is important in machine learning, letting us train models that efficiently use just the features needed for prediction. Known as sparsity in Bayesian statistics, this problem is difficult both in defining the proper theoretical distributions and in computationally estimating the distributions’ parameters with sampling techniques. This paper goes a long way by providing theoretically justified, computationally convergent priors that allow sparsity constraints to easily be added to Bayesian models in STAN. This is a huge breakthrough, adding a significant new practical technique for Bayesian modeling in STAN. Where only strongly uninformative priors were available before, we now have the ability to select n out of k features to be used in the final Bayesian model. This should make STAN a viable option for feature selection, potentially expanding its role in many data science projects.

Proxy Discrimination in Data-Driven Systems

What is fairness? Is our process fair? Are we? For regulated institutions and implementers with moral fiber, these are vital questions. This paper defines the use of proxy variables in discriminatory machine learning systems using information theory, then develops pseudocode algorithms for testing for proxy-variable discrimination in automated decisions. I like this paper for two reasons: one, it provides a good, testable mathematical basis for a concept many of us building algorithms are familiar with, proxy discrimination; and two, reading it was my first exposure to the field of fairness and all its corresponding and contradictory measures. Issues of fairness are only becoming more of a priority for stakeholders, and this paper gives you a jumping-off point for determining the fairness of an existing process and its inputs.

Disintegration and Bayesian Inversion, Both Abstractly and Concretely

A wonderful paper about manipulating probability distributions, including beautiful visualizations of probabilistic manipulation. The paper solidifies the notion of a probability distribution in a formal language, the basis for the EfProb library in Python. Overall, this is a really interesting application of formal semantics, mathematics, and statistics that I found extremely educational and a joy to read! For the mathematics alone, this paper is definitely worth a browse, especially if you are building a software system with probabilistic reasoning!

Stream Graphs and Link Streams for the Modeling of Interactions over Time

This paper develops a formalism for dealing with graph interactions over time that is both self-consistent and compatible with graph theory. It provides a coherent framework, and subsequently develops a set of graph theory measures, for dealing with datasets in many operational domains. What caught my attention was how well the formalism describes some of the operational and network data I see at work, and the elegance of the solution. It would be interesting to see additional work done here with causality, as the formalism already describes temporal relationships so well.

A Tutorial on Canonical Correlation Methods

Canonical Correlation Analysis is a multivariate statistical technique for comparing paired sets of variables, where each set can contain many measures. This is a very well written tutorial, from explaining the motivation and history of the technique to formulating CCA and proving its solution via Lagrange multipliers. For a reader familiar with Principal Components Analysis, or even just linear algebra, this is a surprisingly effective tutorial, and very much worth the time!

The biggest trend I saw this year was “deep learning applied to problem X” (what really is deep learning?): there are numerous papers per day that just deal with neural networks, implemented in all of the major toolkits. There is a lot of noise here, but definitely some good work, and I’m especially looking forward to what comes out next year about the role of causality and information theory in neural network representations. See the top answer for a list of deep learning publications in 2017.

The next big thing I saw was exploratory data analysis on Twitter: select some tweets, run a suite of NLP feature extraction tools, then perform a statistical analysis. There are a lot of similar papers trying to predict ‘fake news’: using paired datasets, using crowdsourcing, a lot of approaches. Something I wasn’t expecting to see was a lot of survey papers asking software developers opinion- and work-related questions, like “What are your favorite tools?” or “Why did you quit your last job?”. These are usually on the lower end of the rigor spectrum, and are often hit or miss in methodology. Nonetheless, there are often some interesting insights, however obvious the subject matter may be.

Another interesting area was the application of formal methods to existing problems: two such papers made the list, and I find myself constantly thinking about, developing, and refining notation while working on data science problems.

- My areas of interest include programming languages, time series processing models, Bayesian statistics, causality and information in deep learning, and formal methods applied to X.
- The most effective process for me for reading the daily digest of abstracts: open the email, read the title and authors, and continue through the abstract and on to the link. If interest is lost at any point, go to the next entry; otherwise, bookmark the link. Afterwards, I read all the bookmarks.
- Whatever your problem is, there is someone working on a similar problem whose approach will benefit you. This held for the majority of the data science problems I encountered, even if all they had to offer was a perspective on how to optimize something I didn’t have time to do!
- It’s important to quickly distill the essence of an idea, how the authors approach a problem, and what they hope to achieve. There are a ton of great ideas on arXiv.org, and you can quickly gain a new perspective on something!

Just tried out Python’s Pandas for analysis work as an alternative to R; so far, so good! Really impressed with the interface, speed, and resources online. For me, trying Pandas is analogous to using R’s data.table package: some slight differences, but all the same functionality when it comes to data transformation. Pandas is built on top of NumPy arrays, and between those two packages, all of R’s data frame capability is available in the Python environment.
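To show the kind of transformation work I mean, here is a small, self-contained Pandas sketch; the data and column names are made up for illustration:

```python
import pandas as pd

# Filter rows, derive a column, then aggregate by group:
# the bread-and-butter data frame work I previously did in R.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 2, 3, 4],
})

result = (
    df[df["value"] > 1]
    .assign(doubled=lambda d: d["value"] * 2)
    .groupby("group", as_index=False)["doubled"]
    .sum()
)

print(result)
```

Each step maps one-to-one onto a data.table idiom (`i` filtering, `:=` assignment, `by` grouping), which is what made the switch feel natural.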

My experience using R and its wealth of well-developed statistics, machine learning, and visualization packages gives me quick access to tools not found in Python, and the community of statisticians using R ensures that cutting-edge packages are published regularly. But let’s be honest: R has some major weirdness that makes it hard to learn, difficult to run concurrently, and downright slow for some data structure access patterns. There are ways around a lot of these issues, but it’s hard to overcome the fact that few people know R, and fewer know it well, compared to Python. Writing critical code for a start-up in R is a risky proposition when it comes to maintainability. Most likely, a start-up is using more than just R (prove me wrong!), and if Python or an alternative language can handle the analysis task, it should be used. This is where Pandas can really shine: data transformations.

Getting used to Pandas has put my foot in the door of doing data science in the Python ecosystem, although transferring all my skills from R will require learning about a dozen packages. With all the folks out there using Python for projects and companies, the advantage of using Python for analysis can only grow as the ecosystem matures. If Python can woo academia’s statisticians, R will eventually lose its superiority in package support, and the user environment where a lot of folks like me learned it. Until then, I’ll most likely use both languages for different tasks while I eagerly anticipate the day Julia becomes better than both!

Check out this cheat sheet for Pandas basics

A nice translation of R’s data frame functions to Pandas

Comparison of R and Python for data science(Post image is from them): link
