Thoughts and Data

by Alexander Svanevik

Map Your Way to Happiness


Here’s an idea. Every single activity in life can be scored along two axes:

  1. How it makes you feel in the moment
  2. How it will impact your life in the future

Let me borrow some Greek words, and abuse some philosophical terms, by calling these two axes praxis and poiesis.

I recently scored activities in my own life according to this simple system.

This is the result:

OK, I don’t do heroin. And I don’t have any parents-in-law. But this is my perceived, subjective scoring of these activities. It’s my Happy Map.

Now, let us journey through the five distinct lands of the Happy Map.

We start in the South-West (or lower left). Here we find things that are bad for me in the long run that also make me feel bad when I do them. Things like worrying, complaining, and smoking. This is not a place I want to be, so let me call this place Mordor.

Next, we move towards the North-East, away from pain and misery. We arrive in the centre of the Happy Map, in a place where I am … nothing. Not happy, not sad. Not developing, but not harming myself (significantly) either. I’m on Facebook. I’m procrastinating. I’m watching TV. I’m numb. This is Wasteland. This place has a strange gravitational pull, even if it’s devoid of any mass or content.

It’s hard to leave Wasteland, but eventually I feel like I have to do something with my life, so I travel North-West. Accomplishing things surely means sacrifice. No pain, no gain. I do interval training to get in shape. I work long hours at the office to build my career. I am investing in myself in Ambitious Masochist Land. Only the guys at Wall Street are North-West of me.

After putting in 14 hours a day at work during the week, I’ve had enough. It’s Friday, and I’m heading down to Sin City. One beer is not enough, I need ten. If it weren’t for my last drop of reason, I’d probably be at the South-Eastern edge of the world shooting heroin.

The next day is awful. I start thinking about my life. What makes me happy? I mean truly, profoundly happy – both in the moment and in the long term. Creating things. Reading. Learning about the world. Writing. Spending time with the people you love. Having good conversations with interesting people. Traveling. Laughing.

These are the things that I find in the State of Zen. That’s where I want to spend my time.
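The five lands amount to a simple rule on the two scores, praxis (how an activity feels now) and poiesis (how it affects the future). The Python sketch below is one way to express that rule. The -10 to +10 scale, the square “centre” that counts as Wasteland, and the example scores are all assumptions for illustration, not the scores on my own map.

```python
# Sketch of the Happy Map as a function. Assumptions (not from the post):
# both scores run from -10 to +10, and Wasteland is a square of
# half-width centre_radius around the origin.
#   praxis  = how an activity makes you feel in the moment (west-east axis)
#   poiesis = how it will impact your life in the future (south-north axis)

def happy_map_land(praxis: float, poiesis: float, centre_radius: float = 3.0) -> str:
    """Return which of the five lands an activity falls into."""
    # Near the origin: neither pleasant nor useful.
    if abs(praxis) <= centre_radius and abs(poiesis) <= centre_radius:
        return "Wasteland"
    if praxis < 0 and poiesis < 0:
        return "Mordor"                    # South-West: bad now, bad later
    if praxis < 0:
        return "Ambitious Masochist Land"  # North-West: bad now, good later
    if poiesis < 0:
        return "Sin City"                  # South-East: good now, bad later
    return "State of Zen"                  # North-East: good now, good later


if __name__ == "__main__":
    # Invented example scores, purely for illustration.
    examples = {
        "worrying": (-6, -5),
        "watching TV": (1, -1),
        "interval training": (-4, 7),
        "ten beers": (6, -6),
        "reading": (6, 6),
    }
    for activity, (praxis, poiesis) in examples.items():
        print(f"{activity}: {happy_map_land(praxis, poiesis)}")
```

Where exactly the borders lie is a matter of taste; widening `centre_radius` makes Wasteland (and its gravitational pull) bigger.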

Create your own Happy Map here and tell me what’s in your State of Zen in the comment field below.

How I Learned German in 30 Days


This post is my summary of an experiment I started in January 2015: Learn German in 30 days.

Before starting, I feel obliged to list a few disclaimers:

  • “I learned German” means I can have basic conversations in German, as well as understand German fairly well (both written and spoken). I have definitely not learned how to speak German perfectly.
  • I do not claim originality of everything in this text; I’ve been heavily inspired by the writings of other language learners such as Barry Farber and Benny Lewis.
  • My native language is Norwegian. This gave me something of an advantage, as Norwegian is linguistically close to German.

If you want to learn a language as quickly as possible, you should keep reading. In addition to some general tips, and hopefully inspiration, I’ll share some resources I created that will help you learn German fast.

I spent about 30-60 minutes daily during my 30-day period, so the time investment wasn’t huge.

Now, let’s get started.

Five principles of effective language-learning

I had a hypothesis that there are five important principles for effectively learning a language. One of the purposes of my experiment was to test whether these were any good. The five principles are:

  1. Set a clear goal
  2. Speak from day 1
  3. Focus on frequent words
  4. Immerse yourself
  5. Keep track

Below I’ll explain exactly how I used these principles to learn German.

1. Set a clear goal

I followed the Objectives and Key Results (OKR) approach for setting a goal. My overall objective was to learn as much German as possible in 30 days. More specifically I wanted to accomplish these key results:

  1. Learn the 1000 most frequent German words
  2. Learn 10 German songs by heart
  3. Be able to have basic conversations with my German friends

1 and 2 are nice because they’re measurable, but the most important key result to me was number 3, which was a bit too vague. In order to make it more tangible, I booked a ticket to Berlin, and decided I should be able to spend the whole weekend with an old (German) friend, speaking only German.

Also, once I had decided that I would actually go through with my plan, I basically announced to the whole world that I was going to learn German in 30 days. The purpose of this was purely psychological, as I then would have to stick to the plan in order to not look like a complete idiot. In fact, throughout my 30-day period, people kept asking me “how’s the German study going?”. To which I could always reply “Sehr gut, danke!”

(I also secretly decided I would record a video in Berlin speaking German, which I did on day 29, but I’ll spare you the awkwardness of the video I eventually posted on Facebook.)

2. Speak from day 1

I believe one of the biggest mistakes you can make when learning a language is to postpone speaking until “you’re ready”. Language learning is like going to the gym: if you want to build muscles, anything other than exercising is just procrastinating.

Specifically what I did to speak from day 1 was to find some friends who either spoke German, or wanted to learn German. I then told them that I would be online for 30 minutes on appear.in every day at 8pm and gave them my own custom URL. I told them I would love to practice German with them there. I managed to get 5 people to join in total. None of them were native German speakers, though some were already fluent (which I found to be critically important!), and some were just starting out like me.

In order to be able to stay in our target language (… German), and not revert back to English or Norwegian, I created a Cheat Sheet containing essential phrases. This sheet prevented me from getting blocked and generally proved incredibly useful in our online conversations.

Following the “language is a muscle” analogy, I also repeated out loud anything I heard or read in German when studying on my own. Compared to only passively “receiving” (listening, reading) German, I think this really makes a difference for strengthening those German synapses in your brain.

3. Focus on frequent words

If you’re not familiar with Zipf’s law, you might be surprised to hear that only the 100 most frequent words account for about 50% of all spoken words in German films. Take a moment to reflect on this astonishing fact. This basically means that every other word you hear in a German film is a word from that top-100 list. The obvious conclusion? You need to know those words!

Illustration of Zipf’s law borrowed from this presentation.
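In its simplest idealised form, Zipf’s law says the word with rank r occurs with frequency proportional to 1/r. The sketch below computes what share of all running words the top N words would cover under that idealised model. The vocabulary size of 50,000 is an assumption, and real subtitle counts deviate from the ideal curve, so this illustrates the shape of the effect rather than reproducing the 50% figure.

```python
def zipf_coverage(top_n: int, vocabulary_size: int, s: float = 1.0) -> float:
    """Share of all running words covered by the top_n most frequent words,
    assuming an idealised Zipf distribution: frequency of rank r is
    proportional to 1 / r**s."""
    if not 0 < top_n <= vocabulary_size:
        raise ValueError("need 0 < top_n <= vocabulary_size")
    weights = [1.0 / rank**s for rank in range(1, vocabulary_size + 1)]
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    # Assumed vocabulary size; pick whatever fits your corpus.
    for n in (100, 1000):
        print(n, f"{zipf_coverage(n, 50_000):.0%}")
```

With s = 1 the coverage is simply the ratio of two harmonic numbers, H(N) / H(V). Even in this idealised model the top 100 words cover just under half of all tokens.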

I found a list of the most frequent words in German subtitles here, and created my own Google Docs spreadsheet with the 1000 most frequent words. These words account for ~75% of all words in German subtitles. My simple task then was to fill out the “meaning” column for every word before my 30 days had passed. In other words, I had to learn about 30 words per day. I made heavy use of cognates from English and Norwegian in order to learn them, and I exported the words I had learned to Anki about once a week so I could make sure I didn’t forget them.

The frequency list is one of the things I did that worked best. It served as a nice anchor that I could centre my learning around.

To test my vocabulary in the real world, I sometimes tried reading German newspapers or books, highlighting words I didn’t understand. After reading a page or paragraph, I would count the words I knew vs. the words I didn’t know, then compute the ratio of known words to total words in the text. Towards the end of my study period, I would clock in at around 80-85% – cognates or context helping me push above the 75% I got from the top-1000 list.
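I did this count by hand, but the same ratio is easy to compute. A minimal Python sketch, assuming you keep the words you know as a set of lower-case strings and that any run of letters counts as one word (a crude tokenizer that keeps umlauts and ß):

```python
import re


def known_word_ratio(text: str, known_words: set[str]) -> float:
    """Fraction of running words in `text` that appear in `known_words`."""
    # Crude tokenizer: runs of letters (including umlauts and ß), lower-cased.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


if __name__ == "__main__":
    known = {"ich", "habe", "das", "buch", "gelesen"}
    ratio = known_word_ratio("Ich habe das Buch gestern gelesen.", known)
    print(f"{ratio:.2f}")  # 0.83
```

Repeated words count every time they occur, which matches the “known words / total words in text” ratio rather than a count of distinct words.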

4. Immerse yourself

I changed my Facebook language to German and watched lots of Yabla videos.

But the thing that worked best for me was my Spotify playlist with 10 German songs.

Once I had learned the lyrics of these songs, I could play them whenever I listened to music and be exposed to German that way. I even recorded myself singing and playing these songs on the guitar. Again, I’ll spare you the embarrassment.

Picking up the lyrics just from listening to the songs is hard, so I would study the lyrics separately before trying to memorize anything. I used LingQ, which meant I had the lyrics available on my iPhone and could easily track which words were new when studying a new song.

5. Keep track

My Top 1000 Words spreadsheet was great for knowing roughly how many words I knew at any given point. Since I had the frequency of each word, I could compute the total “mass” of German that I knew – not just the total number of words – which I found to be quite motivating.
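The “mass” is just a frequency-weighted count: add up the occurrence counts of the words you know and divide by the total. A minimal Python sketch, assuming the spreadsheet is exported as a word-to-count mapping; the `corpus_total` parameter is an addition of mine for measuring against the whole corpus rather than only the top-1000 list, and the counts in the usage example are made up.

```python
from typing import Optional


def known_mass(frequencies: dict[str, int], known_words: set[str],
               corpus_total: Optional[int] = None) -> float:
    """Share of word occurrences covered by the words you know.

    frequencies:  word -> number of occurrences (e.g. the top-1000 list)
    corpus_total: total number of words in the whole corpus; if omitted,
                  the sum of `frequencies` is used as the denominator.
    """
    total = corpus_total if corpus_total is not None else sum(frequencies.values())
    if total == 0:
        return 0.0
    covered = sum(count for word, count in frequencies.items() if word in known_words)
    return covered / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {"ich": 50, "das": 30, "haus": 15, "kuchen": 5}
    print(known_mass(counts, {"ich", "das"}))  # 0.8
```

Knowing two words out of four here already covers 80% of the mass, which is exactly why frequent words are worth learning first.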

I also kept a simple diary in Evernote listing my activities per day. In the end there was only one day when I didn’t do anything at all.

Other

I followed a course called Rocket Languages, but I only completed the first 3 modules.

On day 27, I started Duolingo and managed to test in at level 10. Not bad!

I did almost no grammar study, which I think was a good decision since I only had 30 days and no ambition of speaking perfectly. One thing I would have liked to try for learning the German cases, though, is a set of four sentences of the type “The man gave the book to the boy” – one sentence for each gender plus the plural. Memorizing these four sentences would probably have been a lot easier than trying to remember a case table.

Conclusion

Having concluded the experiment, I would say I reached my goal.

I learned the 1000 most frequent words (and probably some more that I didn’t track).

I learned 10 German songs by heart.

And finally, I went to Berlin and had a great weekend with my friend Daniel – in German. He even taught me how to fly a kite:

If you want to do a similar exercise yourself, here’s a table summarizing my approach:

That’s it! Either this gave you some inspiration to start your own 30-day language adventure or you think I’m insane. In any case, let me know what you think of my experiment!

Money Talks: A Cynical Approach to Language Learning


Problem: You want to learn a new language but don’t know which one.

You could pick a language that many people speak, or one that sounds nice. Or, you could be a bit more cynical and pick one that gives you access to cash.

Let’s try this rough and simple approach: sum up the GDP per language instead of country, creating a sort of “GLP – Gross Linguistic Product”.
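The aggregation is a group-by over countries. A minimal Python sketch, assuming each country comes with a mapping from language to the share of its GDP attributed to that language. Putting a whole country on one language, as I do in this post, is a share of 1.0; a share below 1.0 is how one would apply a correction like the 40% factor for Hindi mentioned below. The countries and figures in the usage example are placeholders, not real data.

```python
from collections import defaultdict


def gross_linguistic_product(countries):
    """Sum GDP per language instead of per country.

    `countries` is an iterable of (country, gdp, language_shares) tuples, where
    language_shares maps each language to the fraction of that country's GDP
    attributed to it (e.g. {"English": 1.0}).
    Returns (language, total_gdp, share_of_total_gdp) tuples, largest first.
    """
    totals = defaultdict(float)
    world = 0.0
    for _country, gdp, language_shares in countries:
        world += gdp
        for language, share in language_shares.items():
            totals[language] += gdp * share
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(language, gdp, gdp / world) for language, gdp in ranked]


if __name__ == "__main__":
    # Placeholder countries and numbers, for illustration only.
    rows = [
        ("Country A", 100.0, {"Language X": 1.0}),
        ("Country B", 50.0, {"Language Y": 0.4, "Language Z": 0.6}),
    ]
    for language, gdp, share in gross_linguistic_product(rows):
        print(f"{language}: {gdp:.1f} ({share:.1%})")
```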

Using this approach, here are the top 12 GDP Languages:

English is the unsurprising winner, but there are a few other things to note. If we compare our graph to a more careful study from 2003, we find the following differences:

  • Hindi has climbed from 11th (2.3% of total GDP) to 4th (5.83%), but this jump is due to a methodological difference (read: laziness on my part). For simplicity, I’ve lumped ALL of India’s GDP into Hindi, when in reality it would make more sense to distribute the GDP across more of India’s spoken languages. If we correct by the factor used in the study mentioned above (40%), we arrive at about 2.3% (0.4 × 5.83%) – meaning Hindi has changed little since 2002.
  • Chinese has increased its share of total GDP from 12.5% to 15.9% and is still in 2nd place. However, the study above predicted that Chinese would reach 22.8% by 2010, which seems to have been overly optimistic.
  • Spanish has surpassed Japanese by a mile.

Let’s be a bit more direct.

What if you’re a gold digger, and you’d like to speak the language of the rich? Which language should you then choose?

Here are the top 12 GDP-per-speaker languages:

Let’s ignore the fact that Norwegian is the winner for a moment, and instead observe that the top 6 languages are all Germanic languages! In other words, learning one of those might be a good idea, if you’re mainly in it for the money.

What do you think is important when choosing which language to learn?

Using Music to Learn a Foreign Language


The other day I picked up one of my favourite albums that I hadn’t listened to in years, Sim by Vanessa da Mata. I don’t really speak Portuguese, but I was surprised to find that I knew almost all of the lyrics by heart. After considering it for a while, I concluded that this wasn’t very surprising at all. Just think about how many people around the world have learned their first English words from Beatles songs. (…or from Mariah Carey songs)

Surely music is a great way to immerse yourself in a new language.

This realisation led me to an interesting question: What are the right songs to listen to when you’re learning a foreign language?

Putting musical taste aside, I compiled the lyrics of my Spanish playlist with the simple purpose of sorting the songs from “easy” to “hard”. I want to learn one new song every week – but I want to learn them in a sensible order, based on my level of Spanish.

The first ranking I came up with was fairly simple: calculate the number of (unique) words per song and sort from low to high. The result looks like this:

Of course, this isn’t perfect, but it tells us something about the effort involved in learning a song. “Vuelvo al Sur” will most likely take less time to learn than “Na en la nevera”. So I placed the former at the top of my playlist.
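The first ranking only needs a word count per song. A minimal Python sketch, assuming the lyrics are available as plain text keyed by song title; the tokenizer treats any run of letters as a word (keeping accented letters and ñ), which is cruder than what a careful linguist would use.

```python
import re


def unique_word_count(lyrics: str) -> int:
    """Number of distinct words in the lyrics (case-insensitive)."""
    # Crude tokenizer: any run of letters counts as a word; accents are kept.
    return len(set(re.findall(r"[^\W\d_]+", lyrics.lower())))


def rank_by_unique_words(songs: dict[str, str]) -> list[tuple[str, int]]:
    """Sort songs from fewest to most unique words."""
    counts = {title: unique_word_count(text) for title, text in songs.items()}
    return sorted(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy lyrics, for illustration only.
    songs = {"Song B": "uno dos tres cuatro", "Song A": "la la la sol"}
    print(rank_by_unique_words(songs))  # [('Song A', 2), ('Song B', 4)]
```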

But what about the actual content of the songs? Surely some songs have a more challenging vocabulary than others?

This is hard to accurately quantify, but we can arrive at a proxy for the “language level” of a song by considering the “word rank” of the lyrics. The way we do this is by using a frequency list: ordering words in a language from most used to least used. There is no single “golden frequency list” for a language, since this is context dependent (the words most used in South Park are different from those in the New York Times).

The frequency list I used is based on movie subtitles available here, which I believe is fairly close to casual, spoken language.

Now, the idea is to look up each word in a song in the frequency list, and get its rank (its “word rank”). As an example, in my Spanish frequency list, “que” is the most common word, which means it has word rank = 1. The word “nuevo” has word rank = 200, meaning there are 199 words that are more frequent than it, and so on.

We compute the word rank for every (unique) word in a song, and then compute the median word rank.

So what does this tell us? Let’s say the median word rank is 500 for a given song. This means that at least half of the (unique) words in that song are among the 500 most frequent words. In other words, a large part of the words are “easy”, frequent words, so we conclude that the song’s lyrics are probably not very hard to understand.
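The word-rank lookup and the median fit in a few lines of Python. A minimal sketch, assuming the frequency list is a plain list of words ordered from most to least frequent. One choice the description above leaves open is what to do with words that are not in the list at all; here they are assumed to rank just past the end of the list, since they are presumably rarer than anything on it.

```python
import re
from statistics import median


def build_rank_lookup(frequency_list):
    """Map each word to its rank (1 = most frequent) in an ordered frequency list."""
    lookup = {}
    for rank, word in enumerate(frequency_list, start=1):
        lookup.setdefault(word, rank)  # keep the first (best) rank if a word repeats
    return lookup


def median_word_rank(lyrics, rank_lookup):
    """Median rank over the unique words of a song's lyrics."""
    words = set(re.findall(r"[^\W\d_]+", lyrics.lower()))
    # Assumption: words missing from the list are ranked just past its end.
    missing_rank = len(rank_lookup) + 1
    ranks = [rank_lookup.get(word, missing_rank) for word in words]
    return median(ranks) if ranks else None


if __name__ == "__main__":
    # Tiny toy frequency list; a real one would have thousands of entries.
    lookup = build_rank_lookup(["que", "de", "no", "a", "la"])
    print(median_word_rank("Que no, la", lookup))  # 3
```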

Here are the same songs as above, now ranked by median word rank:

Again, “Vuelvo al Sur” shows up first, so I guess I should really learn that song!

Honestly though, I would have expected the lists to be more similar. It turns out, at least for my playlist, there’s not a very strong connection between number of words and the median word rank.

Here are the two metrics on one graph:

(Notice how the Alejandro Sanz songs are all clustered together!)

I’ve divided the songs into four quadrants:

  1. Hard songs (upper right corner; Q1)
  2. Niche songs with few words (upper left corner; Q2)
  3. Easy songs (lower left corner; Q3)
  4. General songs with many words (lower right corner; Q4)

I might be stretching it with my interpretations of Q2 and Q4, but in any case there’s a trade-off between them; Q4 means more effort required to learn more useful words, while Q2 means less effort required to learn less useful words.

My personal choice would be to learn Q3 (easy) songs first, Q1 (hard) songs last, and then randomly pick Q2 and Q4 in between.
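Combining the two metrics into a learning order can be sketched in Python as below. The post does not say where the dividing lines between the quadrants are drawn; this sketch assumes they sit at the playlist median of each metric, and it takes the two metrics per song as already computed.

```python
import random
from statistics import median


def quadrant(n_words, med_rank, words_cut, rank_cut):
    """Quadrant for a song given its two metrics and the two dividing lines."""
    many_words = n_words > words_cut
    rare_words = med_rank > rank_cut
    if many_words and rare_words:
        return "Q1"  # hard
    if rare_words:
        return "Q2"  # niche, few words
    if not many_words:
        return "Q3"  # easy
    return "Q4"      # general, many words


def learning_order(songs, seed=None):
    """Order songs: Q3 first, Q2 and Q4 shuffled together, Q1 last.

    songs: dict title -> (unique_word_count, median_word_rank).
    Assumption: the dividing lines are the playlist medians of each metric.
    """
    words_cut = median(n for n, _ in songs.values())
    rank_cut = median(r for _, r in songs.values())
    groups = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}
    for title in sorted(songs):
        n_words, med_rank = songs[title]
        groups[quadrant(n_words, med_rank, words_cut, rank_cut)].append(title)
    middle = groups["Q2"] + groups["Q4"]
    random.Random(seed).shuffle(middle)
    return groups["Q3"] + middle + groups["Q1"]
```

Usage: `learning_order({"easy": (20, 100), "hard": (200, 900), "niche": (20, 900), "general": (200, 100)})` always starts with "easy" and ends with "hard", while "niche" and "general" land in between in random order.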

What do you think about this approach to learning languages? And have you ever used music to learn a language?

6 Things I Use Evernote For


I use Evernote for “Personal Information Management” (PIM), aka “keeping track of things in your life”. Simply put, Evernote is an extension of my brain.

I have two basic requirements for a PIM system:

  1. It must be fast and easy to store information
  2. It must be fast and easy to retrieve information

Based on these requirements, Evernote is the best PIM tool I’ve come across. Still, many people find it difficult to get off the ground with Evernote, so I thought I’d share some ideas that I’ve built up over the last few months.

Here are 6 things I use Evernote for:

1. Physical-notebook substitute

The obvious one. I used to carry Moleskines around, mainly to jot down ideas. In some situations I still do, but now I never have to worry about remembering to bring one with me, and where to store it when I’m out of pages.

2. Storing things I find online

Articles, tables, graphs, anything! Just use the Web Clipper plugin and you’ll save it in a flash. Recently, I was reading up on classified advertising, and every article I found on the topic, I tagged with “classifieds” (in addition to other relevant tags for the article). Now I have a small research bank with highlighted articles on the topic, which I can re-visit when I need to. Again: simple to store, simple to retrieve.

Also, very often I find articles I don’t have time to read. Evernote Web Clipper solves this for me in a nice way: Clip, then tag the article with “to read” (plus other relevant tags), and change the tag to “read” when you’ve read it.

3. Digitizing paper documents

Typical scenario: Find letter in post box. Rip it open. Skim it. Take a photo. Store in Evernote. Tag it. Maybe add a reminder. Then the best part: throw the letter away. Ah. Doesn’t that feel nice?!

4. Todos

Evernote is not great for todos on its own, but here’s the method I use on my work laptop. Have one note per week, and simply copy that note every new week, moving things you didn’t get done last week over to your new note. Here’s an example:

A nice bonus feature if you have a tablet: try the presentation view of your last week’s completed tasks. Your accomplishments have never looked better!

5. Language learning

I also use Evernote to build vocabulary in foreign languages. The method is simple: again, create one note per week (e.g. “French Vocab Week 16”), and add new words as you go! Example:

It’s a nice way to situate your learning – that is, you’ll be able to register when and where you learned a given word. You can also easily keep track of your learning goals: how many new words did I learn last week, etc. Also, you can set reminders on notes so you revise the words you jot down.

6. Journal keeping

I’m not able to do one entry per day, but if I have 10 minutes to spare at the end of the day, I sometimes spend those minutes on a journal entry. There are a few Evernote journal apps around, but I’m fine with just using the title to timestamp my note, then tagging it with “journal”.

Bonus: Procrastination

Think about it: if you accept you will have some amount of procrastination in your life, why not do something useful with it? I have tags for retrieving content from my Evernote when I’m bored. I already mentioned “to read”, but “ideas” is another good one. Reading your own ideas and developing them further is obviously good fun. If you’re learning a new language, “vocab week” is another good tag to have when you’d normally be wasting time on your smartphone.

Summary

The common theme of the things listed above is that they all give me a better conscience. I worry less about ideas getting lost, not finding useful web content again, letters needing archiving, tasks being forgotten, not spending enough time building vocabulary, my journal being empty, and precious time being wasted on mindless smartphone fiddling.

I’ve mentioned some things I use Evernote for, and I hope you’ve learned something useful from reading this. I would love to hear your favourite Evernote tips, so: What do you use Evernote for?

Fun fact: this blog post was written on my iPhone while flying from Oslo to Frankfurt using … Evernote, of course.

List of Companies Who Provide Data Quality as a Service


Data Quality is important. However, most companies consider activities such as validating postal addresses and identifying duplicates to be outside of their core business. Luckily, investing in software and building up an internal DQ team isn’t the only option to improve Data Quality. In many cases, it makes sense to instead have Data Quality as a Service (DQaaS).

Below you’ll find a short (but growing) list of companies who offer Data Quality as a service. If you know of any companies who should be added to the list, feel free to tweet me your suggestion.

Do you know of any companies providing Data Quality as a service?

Disclaimer: I do not vouch for the capabilities or service level of any of the companies above. You should of course make your own investigations to see if a given provider will fit your requirements – or seek advice to make those investigations.

Word2vec: Deep Learning Language


Introducing word2vec

This month, Google open-sourced a tool called word2vec, described as a “tool for computing continuous distributed representations of words”, or in the words of Derrick Harris “prepackaged deep-learning software designed to understand the relationships between words with no human guidance”. You can read more about word2vec in Forbes and on Google’s blog. Now, you might already find those descriptions motivation enough to take word2vec for a spin, but if not, here’s a concrete example of what word2vec is capable of. Using word2vec, your computer can learn a representation of words where the following two analogical relationships are true:

king – man + woman = queen

Paris – France + Italy = Rome

…Without being explicitly told to look for them! Specifically, word2vec gives you numerical representations of words, in the form of vectors, so that every word lives in a high-dimensional space and can be compared by distance in that space (along the different dimensions). This itself isn’t new; Latent Dirichlet Allocation also does this, but the way this space is constructed is.
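Analogies like “king – man + woman = queen” are usually answered by adding and subtracting the word vectors, then returning the nearest remaining word by cosine similarity. The Python sketch below shows that lookup on hand-made three-dimensional toy vectors; real word2vec vectors have hundreds of dimensions and are learned from text rather than written by hand. Libraries such as gensim offer the same query as `most_similar(positive=[...], negative=[...])` on trained vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Solve "a - b + c = ?" by nearest neighbour, excluding the query words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Hand-made toy vectors; dimensions loosely mean (royalty, male, female).
toy = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "throne": [1.0, 0.2, 0.2],
}

if __name__ == "__main__":
    print(analogy(toy, "king", "man", "woman"))  # queen
```

Excluding the three query words matters in practice: without it, the nearest neighbour of “king – man + woman” in a real embedding is often “king” itself.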

Now, why would we want numerical representations of words in the first place?

Text is hard, numbers are easy

Firstly, if you’ve ever worked with text as a data scientist, you’ll know that it’s hard. Depending on your task, you’ll run into all sorts of problems getting useful information out of text only using a computer (and not your own brain – that’s cheating!).

Secondly, numbers are easy. You can find relationships in them, measure correlations, derive new numbers from them, visualise them, and so on. So obviously, converting words into numbers is a good idea!

In a way, it sounds ironic that text is hard, since as human beings we communicate through text every day (in fact, it’s a good example of Moravec’s paradox)! However, when you set your computer loose on a piece of text, it doesn’t really have any conceptual framework to put the text into, and thus never really gets off the ground in understanding it. Word2vec is a step in the right direction to give your computer that missing conceptual framework.

Using word2vec

If we’re doing predictive modeling, say, in a Kaggle competition, we’re generally trying to take some input data (say, your age and sex) and predict some output variable (e.g. the probability that you’ll crash your car in the next year). Now, if your input data were your diary (in some horrible world where insurance companies secretly read your diary), how could a computer use that to determine your crash probability – assuming you can’t just read them all manually?

Step one would be to deduce useful features from your diary. The simplest features would be basic statistics such as how long your entries are on average, how often you write entries, how many different words you use, etc. The more interesting ones would be based on the content of your entries. This is where word2vec comes in: with word2vec, every word in your diary becomes a vector, a point in that high-dimensional space that a model can compute with.
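A Python sketch of both kinds of features follows. The simple statistics come straight from the list above. For the content features, averaging the vectors of an entry’s words is one common way to turn a text into a fixed-length input for a model; it is an assumption here, not the only option, and the `vectors` argument stands in for trained word2vec output.

```python
import re
from datetime import date

TOKEN = re.compile(r"[^\W\d_]+")


def simple_features(entries):
    """Basic statistics for a non-empty list of (date, text) diary entries."""
    tokens = [TOKEN.findall(text.lower()) for _, text in entries]
    days = [day for day, _ in entries]
    span_days = (max(days) - min(days)).days + 1
    return {
        "avg_words_per_entry": sum(len(t) for t in tokens) / len(entries),
        "entries_per_week": 7 * len(entries) / span_days,
        "vocabulary_size": len(set().union(*tokens)),
    }


def mean_word_vector(text, vectors, dim):
    """Average the vectors of the words in `text` that have one.

    vectors: word -> list of `dim` floats (e.g. from a trained word2vec model).
    Words without a vector are skipped; an all-zero vector is returned if none remain.
    """
    known = [w for w in TOKEN.findall(text.lower()) if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    diary = [
        (date(2024, 1, 1), "I drove fast"),
        (date(2024, 1, 7), "Drove home slowly today"),
    ]
    print(simple_features(diary))
```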

What use cases for word2vec do you see?

Startup Engineering


I just enrolled in Coursera’s course Startup Engineering, which looks really good. The course is taught by Balaji S. Srinivasan and Vijay S. Pande at Stanford University.

My main reasons for signing up for the course were to learn more about JavaScript-based technologies like Node.js, and more generally to fill gaps in my own knowledge related to best practices in web development.

I’ve only seen the videos for the first lecture, but so far it looks good. I was positively surprised by Srinivasan’s notes on the history of startups, connecting the decline of the Soviet Union to the internet startups of the 1990s. In other words, it seems like the course will have a fairly broad horizon – it’s not just about technology. Srinivasan himself stated it will be 50% technology and 50% “philosophy”.

I encourage you to check out the course, which will be running until August 2013. You can find my public Google Drive notes for the course here.