Julia: Hello World

Let’s begin with the biggest cliche in the programming world.

Installing Julia and an IDE

The official Julia distribution can be downloaded from Julia’s website. There are multiple ways to develop with the Julia language. First, we can install the IJulia package to use Julia within Jupyter Notebook. To add a package, press “]” inside the Julia REPL to enter Pkg mode and use the “add” command.

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.0 (2019-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(v1.3) pkg> add IJulia

Another way of writing Julia effectively is to use the Juno IDE, which is powered by Atom. The Atom editor can be downloaded from the official site, and the Juno IDE can be installed with Atom’s package manager.

Installing everything with snap on Ubuntu

On Ubuntu, the Atom editor can be installed with the snap utility.

sudo snap install atom --classic

In theory, Julia can also be installed with snap.

sudo snap install julia --classic

However, when I wrote this post, there was a compatibility issue: the Julia version provided by snap seemed to be too old for Juno to recognize. The Julia distribution downloaded from the official site worked fine with Juno.

Hello World

Let’s try a more unusual hello world task. The phrase “hello world” contains 8 distinct characters: “h”, “e”, “l”, “o”, “ ”, “w”, “r” and “d”. Each character can be represented by its ASCII value. We treat these values as the target variable of a linear regression task. That is, we manually synthesize a dataset whose feature vectors regress to these particular values. The phrase “hello world” can then be recovered from a set of data points by a fitted model.
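To make the encoding concrete, here is a small sketch of the character-to-target mapping. The names unique_letters, codes and predictions are chosen for illustration only, and the "+ 0.3" stands in for the error of a hypothetical model; nothing here is fitted yet.

```python
phrase = "hello world"

# Distinct characters, sorted so the order is reproducible.
unique_letters = sorted(set(phrase))

# Map each character to its ASCII code; these codes are the regression targets.
codes = {ch: ord(ch) for ch in unique_letters}
print(codes)

# A model that predicts a value close to the right code for every position
# spells the phrase: round to the nearest integer and convert back with chr().
predictions = [codes[ch] + 0.3 for ch in phrase]  # pretend model output
decoded = "".join(chr(round(p)) for p in predictions)
print(decoded)
```

As long as every prediction lands within 0.5 of its target, rounding recovers the exact character, which is why a little noise in the data is tolerable.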

First, we use the following Python program to generate the dataset and serialize it with HDF5, which is a cross-language format.

import numpy as np
import h5py


# create a random weight vector
w = np.random.uniform(1.0, 8.0, 10)
sign = np.random.randint(0, 2, w.size)
sign[sign == 0] = -1
w = w * sign

phrase = 'hello world'
phraseLetters = [ch for ch in phrase]
uniqueLetters = sorted(list(set(phrase)))
ints = [ord(ch) for ch in uniqueLetters]

# for each integer, generate some training samples
numSample = 50

def get_random_vector(targetNum):
    # generate a random vector
    randVec = np.random.uniform(-5.0, 5.0, w.size)
    # randomly select an index to replace
    indSel = np.random.randint(0, w.size)
    ip = randVec @ w - randVec[indSel] * w[indSel]
    diff = float(targetNum) - ip
    replaceVal = diff / w[indSel]
    finalVec = randVec.copy()
    finalVec[indSel] = replaceVal
    assert abs(finalVec.dot(w) - targetNum) < 1.0e-10
    return finalVec

trainVecs = dict()

for num in ints:
    randVecs = [get_random_vector(num) for _ in range(numSample)]
    randVecs = np.stack(randVecs, axis=0)
    randVecs += np.random.uniform(-0.5, 0.5, randVecs.shape)
    trainVecs[num] = randVecs

train_x = []
train_y = []

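for key, val in trainVecs.items():
    # every sample in this group has the character code `key` as its target
    y = np.full(val.shape[0], key, dtype=val.dtype)
    y += np.random.uniform(-0.2, 0.2, y.shape)
    train_x.append(val)
    train_y.append(y)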

train_x = np.concatenate(train_x, axis=0)
train_y = np.concatenate(train_y)

test_x = []
# generate the prediction set
for ch in phrase:
    ind = ord(ch)
    vec = get_random_vector(ind)
    test_x.append(vec)
test_x = np.stack(test_x, axis=0)

print('data generation finished')

# the context manager flushes and closes the file on exit
with h5py.File('hw_data.h5', 'w') as file:
    file['train_x'] = train_x
    file['train_y'] = train_y
    file['test_x'] = test_x

Then, in Julia, we read the data and run Ridge regression.

We get the “hello world” that we want!
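For readers who want to check the idea without Julia, the regression step can also be sketched in Python with NumPy. This is not the Julia code used for this post. It is a minimal stand-in that assumes a closed-form ridge solution, noise-free data generated in memory rather than read from hw_data.h5, and a small penalty lam chosen arbitrarily. The names make_vector and ridge_fit are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden weight vector with random signs, as in the generator script.
w_true = rng.uniform(1.0, 8.0, 10) * rng.choice([-1.0, 1.0], 10)


def make_vector(target):
    """Return a random vector whose dot product with w_true equals target."""
    vec = rng.uniform(-5.0, 5.0, w_true.size)
    i = rng.integers(w_true.size)
    # Adjust one coordinate so that vec @ w_true hits the target exactly.
    vec[i] += (target - vec @ w_true) / w_true[i]
    return vec


phrase = "hello world"
codes = sorted({ord(ch) for ch in phrase})

# 50 noise-free samples per distinct character (assumption: no added noise).
train_x = np.stack([make_vector(c) for c in codes for _ in range(50)])
train_y = np.repeat(codes, 50).astype(float)
test_x = np.stack([make_vector(ord(ch)) for ch in phrase])


def ridge_fit(x, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(n_features), x.T @ y)


w_hat = ridge_fit(train_x, train_y, lam=1e-6)

# Round each prediction to the nearest integer and map it back to a character.
pred = test_x @ w_hat
decoded = "".join(chr(int(round(p))) for p in pred)
print(decoded)
```

Because the targets are an exact linear function of the features here, the fitted weights match w_true up to a tiny shrinkage from lam, and the rounded predictions spell the phrase. With the noisy data from the generator script, the same decoding works as long as each prediction stays within 0.5 of its target.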