21 Nov 2012

When I started working as a professional web developer, I wasn't using any versioning system. When I needed to add a new feature to an existing website, I just coded it and then uploaded it by FTP.
Sometimes it turned out that the feature was a failure and that I needed to get back to a previous version. Most of the time, I remembered the old feature and could just type the old code back in. Other times I couldn't, and getting the website back to its previous state was a nightmare.
So whenever I felt a feature could go wrong, I started creating manual backups of files before editing them, suffixing them with -version2 and such. It worked OK at first, but after some time it became impossible, even for me, to tell the different versions apart.
I also started to notice a trend on the web, reading from other developers, that there was something called a version-control system. I thought I needed to use one of them, to keep track of the different versions of my files.
I wasn't exactly sure which one was right for me, so I asked a few friends. Most of them told me they were using Subversion at work, that it had a few shortcomings, but that it was still better than nothing. Another friend told me that he was using Mercurial, both for personal projects and at work, and that his whole team was actually versioning everything with it and was very happy about it.
First steps with Mercurial
So I thought I should start using Mercurial. If I had to learn a new tool, I'd rather pick one that worked well and where I had a friend able to help me with my first steps.
Back in those days I was still using Windows, so I downloaded and installed Mercurial and its GUI, TortoiseHG. I was quite happy with it the instant I installed it. I could just "commit" (which was a new word for me) my changes and know for sure that they were safe and that I could get back to previous versions of my files easily.
I was committing all my changes at the end of each day, writing a long commit message listing everything I had done in that given day. Then, the next morning I could just re-read the previous commit message and remember what I was doing. The commit message worked a bit like a todo list for what needed to be done.
I was mostly using Mercurial as a backup system. I even opened an account on BitBucket where I uploaded my changes every day, so that if my laptop crashed, I would still have backups of everything online.
A little note about FTP
Using Mercurial had another beneficial side-effect on my productivity. It made me discover how backward FTP uploads were. Before using Mercurial, when I needed to push changes to a website, I connected to the server with FTP and transferred every changed file from my local machine to the server.
Of course, I had to remember which files I changed, and upload them one by one. This was tedious. I had to move from directory to directory before copying the files. I can't tell you how many times I had a bug in production that I couldn't replicate on my local machine, just because I either forgot to upload a file or uploaded it in the wrong place.
Sometimes, just to be sure all the files ended up in their correct places, I simply uploaded the full website directory and let the FTP software decide whether each file needed to be updated or not. But this took ages, as the software had to send a request to compare the modification timestamp of every file.
And if only two files had changed, I had to wait for the FTP client to scan every file between the first and the second one before the second got uploaded. This too resulted in bug reports from customers that I couldn't replicate, even in production, because by the time I got to check the website, the missing file that was causing the error had been uploaded and everything worked perfectly for me. In the end, it created a list of "ghost bugs" that I was never sure were real bugs or just artifacts of the FTP upload lag.
Mercurial got rid of all this crap because it was clever enough to know which files had changed since the last deployment, and even which parts of the files had changed, sending only the difference. This resulted in much faster uploads and no more missing files. Also, I got bonus security points for transferring over SSL instead of cleartext FTP.
Sorry for this little FTP digression. Anyway, as you remember, I was using Mercurial more as a backup system than a version-control system. Committing once a day became cumbersome as I wanted more fine-grained control over what I committed. I ended up committing twice a day, or even every time I took a break, but then I realized it made much more sense to commit based on features and not on time.
No more Mercurial
And that's when Mercurial started to get in the way. I often worked on a feature and, while doing that, spotted tiny bugs or typos that I fixed along the way. I wanted to commit only related changes together, and didn't manage to do that well with Mercurial.
I guess this is because I did not really understand how Mercurial worked back then, and to be honest I do not understand it much better now. Whenever a file is modified in Mercurial, it is ready to be added to the next commit. But when you create a new file or delete one, you have to tell Mercurial that you want this add/delete to be registered in the next commit.
As I said, I sometimes just fixed small bugs or typos in files that had little to do with the feature I was working on at that moment. I would have liked to be able to commit only those files, and not the others. I never really managed to do it properly with Mercurial, so in the end I was still committing more files than I wanted to.
That's when I considered trying git instead of Mercurial. I did some googling on "git vs Mercurial" to see how they differ, and the general consensus seemed to be that git is more low level than Mercurial. Git has a plethora of commands, most of which you'll never use, while Mercurial is focused on a workflow for the end user that works well. Also, git allows the user to rewrite history.
After reading all this I was convinced Mercurial was the right tool for me. I was still struggling with my current version-control system; I didn't want to try one that was even harder to understand. Plus, rewriting history ? I don't want that to happen in real life, and I sure don't want that to happen to my files either.
So I dug deeper into Mercurial, trying to understand it better, to grok it, but no, really, I still couldn't make it behave the way I wanted to work with my files, so I finally decided to give git a try.
First steps with git
Because there was this thing named GitHub, and all the cool kids were on GitHub, and I too wanted to participate in those big open source projects, and I felt like I was held back by the tool I was using.
In my first hours of using git I managed to do what I had struggled to do with Mercurial for so long. I could easily choose which files to commit or not and split my work into several commits. Of course, I could never have done it without Google. Git man pages are gibberish, and the list of commands and options is so vast I felt lost. Even when I found what I needed, I wondered why the git developers chose to give commands such abstract names, with no apparent cohesion across the whole git API.
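To give an idea of what I mean, here is the kind of workflow that suddenly became possible (the file names are made up for the example):

# Stage only the files that belong to the feature
git add lib/feature.js templates/feature.html
# Or go hunk by hunk and pick exactly what goes into the commit
git add -p
git commit -m "Add the new feature"
# The unrelated typo fixes are still unstaged, ready for a separate commit
git status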
I had to create aliases for my day-to-day tasks with git so I wouldn't have to remember and type all these crazy commands, but in the end it turned out OK. I keep running into problems, but none that the help of Google, StackOverflow and the incredibly rich git API can't solve.
I'm now versioning every new project with git and have even converted some old projects from Mercurial to git. I know I'm still in the early stages of learning git (I've only started using branches extensively in the last few weeks) but I really do enjoy it a lot.
What's next ?
I've done all my git training by myself, while in New Zealand for a year, away from work and just for fun. I know that when I get back to Paris and start looking for a new job, I'd like to find a place where the day-to-day workflow involves git, because I really want to see how branching and merging help people work together to build bigger things.
06 Sep 2012

Here is a little script I hacked together to display in your terminal all the colors available in the 256 color spectrum, as well as a visual representation of them.

The oh-my-zsh project actually ships with a method named spectrum that does pretty much the same thing, but I found it hard to get a real feel of what the colors were with the original output. So I coded this one.
It does not display the colors in the range {000..015} because those depend on your terminal configuration. Also, the output is split into blocks of 6 lines, with 2 blocks displayed side by side. You can modify those values in the script if you want to make it easier to read on your screen. I use 2 and 6 because that is what is most readable on my tiny netbook screen.
Ok, enough talk, here is the script
#!/usr/bin/zsh
# Display the terminal 256 colors by blocks

# Long space is long
local space=" "
for i in {0..22}; do space=$space" "; done

# Number of color lines per block
local lines=6
# Number of blocks per terminal line
local blocks=2

# Tmp var to hold the current line in a block
local m=0
# Tmp var to hold how many blocks are filled
local b=0

typeset -A grid

# We want to display the blocks side by side, so it means we'll have to create
# each line one by one, then display all of them, and after that jump to the
# next set of blocks
for color in {016..255}; do
  # Current line in a block
  m=$((($color-16)%$lines))
  # Appending the displayed color to the line
  # (\e is the escape character; zsh's echo interprets it when printing)
  grid[$m]=$grid[$m]"\e[01m\e[38;5;${color}m#${color} \e[48;5;${color}m${space}\e[00m "
  # Counting how many blocks are filled
  [[ $m = 5 ]] && b=$(($b+1))
  # Enough blocks for this line, display them
  if [[ $b = $blocks ]]; then
    # Reset block counter
    b=0
    # Display each line
    for j in {0..5}; do
      echo $grid[$j]
      grid[$j]=""
    done
    echo ""
  fi
done

# Display each remaining block
for j in {0..5}; do
  echo $grid[$j]
  grid[$j]=""
done
Oh, and one last word. You might want to pipe the output to less -R (the -R flag keeps the colors) to make it easier to read if you're on a small screen like mine.
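For example, assuming you saved the script as spectrum.zsh (the name is entirely up to you), you could run it like this:

# Make the script executable, then page through the output while keeping the colors
chmod +x spectrum.zsh
./spectrum.zsh | less -R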
30 Aug 2012

I recently switched to git as my main version control system (I used to use Mercurial before that). I quickly grasped the concept of the staging area and used the git add command extensively to add files before committing.
And zsh ships with nice git autocompletion features, suggesting files to add with git add when you press Tab.
Git unstage
What is missing from the basic git commands is a way to unstage a file from the staging area. Well, I can do it with git reset HEAD, but this command is a bit tedious to type.
So I created an alias named git unstage that does just that. I just added the following lines to my ~/.gitconfig, under the [alias] header :
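With the header included for context, the relevant section of the file boils down to this (the alias is just a shorthand for the git reset HEAD command mentioned above):

[alias]
    unstage = reset HEAD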
I can now easily add files to the staging area with git add and remove them with git unstage.
Autocompletion
At this point, git unstage autocompletion does not work. It simply suggests all files in the current directory, while I'd like it to suggest only files in the staging area.
When you create a git alias, you cannot simply add a zsh autocomplete method for that alias (meaning, you cannot create a _git-unstage method); you have to hook your custom autocomplete logic to the underlying git command your alias refers to. In this case, that is the git reset command.
So, I created my own _git-reset autocompletion function. Actually, I borrowed the one already defined in /usr/share/zsh/functions/Completion/Unix/_git and tweaked it a little bit.
I created a file named _git-reset and put it in my $FPATH so zsh will load it when asked for autocompletion, and it will overwrite the _git-reset method already defined in _git.
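In case it helps, this is roughly how a personal completion directory can be added to $FPATH in ~/.zshrc (the directory name is only an example):

# Add a personal directory to the completion function path, before compinit runs
fpath=(~/.zsh/completions $fpath)
autoload -Uz compinit && compinit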
Here is the full content of the file :
#compdef git-reset

_git-reset () {
  local curcontext=$curcontext state line
  typeset -A opt_args

  _arguments -C -S -A '-*' \
    '(-q --quiet)'{-q,--quiet}'[be quiet, only report errors]' \
    '::commit:__git_revisions' \
    - reset-head \
    '( --soft --hard --merge --keep)--mixed[reset the index but not the working tree (default)]' \
    '(--mixed --hard --merge --keep)--soft[do not touch the index file nor the working tree]' \
    '(--mixed --soft --merge --keep)--hard[match the working tree and index to the given tree]' \
    '(--mixed --soft --hard --keep)--merge[reset out of a conflicted merge]' \
    '(--mixed --soft --hard --merge )--keep[like --hard, but keep local working tree changes]' \
    - reset-paths \
    '(-p --patch)'{-p,--patch}'[select diff hunks to remove from the index]' \
    '*::file:->files' && ret=0

  case $state in
    (files)
      local commit
      if [[ -n $line[1] ]] && __git_is_committish $line[1]; then
        commit=$line[1]
      else
        commit=HEAD
      fi
      # Suggest files in index if `git reset HEAD`
      if [[ $line[1] = HEAD ]]; then
        __git_changed_files
      else
        __git_tree_files . $commit
      fi
      ret=0
      ;;
  esac
}

_git-reset "$@"
As you may have noticed, this script is an almost exact copy/paste of the original _git-reset script. The only modification I've made is in these lines :
# Suggest files in index if `git reset HEAD`
if [[ $line[1] = HEAD ]]; then
  __git_changed_files
else
  __git_tree_files . $commit
fi
What it does is check the first argument of git reset, and if it's HEAD, suggest files in the staging area (__git_changed_files) instead of files in the current repo (__git_tree_files).
It took me quite a bit of time to figure out which method to use to get the files in the staging area: I had been looking for a __git_staged_files function for quite a while before finally discovering that __git_changed_files was actually what I was looking for.
Conclusion
The git unstage alias is quite common; you can find it in a lot of books and websites teaching git. But it becomes much more usable once zsh does the autocompletion for you and you can easily select files to unstage that way.
20 Aug 2012

When you start using the terminal as your main file explorer instead of a GUI one, you soon discover two important things.
First, it is much faster to browse across your filesystem, copying and moving files, in the terminal than it is with your mouse. This is hard to believe at first (how can typing text on a dull black screen be faster than drag'n'dropping ?) but it is nonetheless true (after a bit of practice, sure).
The second thing is that it is also much, much easier to permanently delete very important files, as deleting a file through the terminal has no trash bin nor any other safeguard mechanism.
Scripting an rm replacement
At first, I scripted my own rm replacement that manually moved files to ~/.local/share/Trash/files (the common Trash directory) instead of deleting them. But it was a bit naive and couldn't really work on removable drives nor provide a "restore" mechanism.
Fortunately, the trash-cli package on Ubuntu provides a set of commands to deal with the trash from the command line. They have very explicit names such as trash, list-trash, restore-trash or empty-trash.
ZSH aliases
I had to resort to quite a bit of ZSH tweaking to make it a perfect rm replacement. First, I added a simple alias for the rm command.
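It is a single line (mine lives in ~/.zshrc, but any sourced zsh config file works):

# Send files to the trash instead of deleting them
alias rm='trash'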
Then I also wanted to change the default rmdir command. I could have used the same type of alias (alias rmdir='trash') but I would have lost the builtin autocompletion of directories that zsh provides for rmdir.
When you define aliases in ZSH, you can choose whether you want them to autocomplete based on the right hand side of the alias (NO_COMPLETE_ALIASES) or the left hand side (COMPLETE_ALIASES). Yes, the names of the options seem wrong to me too, but this is actually how it works.
I prefer setting NO_COMPLETE_ALIASES so my aliases get the autocompletion of the commands they point to, but for the rmdir case this was proving to be an issue.
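For reference, this is the line that sets it (in ~/.zshrc, or wherever your zsh options live):

# Complete aliases based on the command they expand to (the right hand side)
setopt NO_COMPLETE_ALIASES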
rmdir autocompletion
So, I started writing my own rmdir implementation in a custom script. This was merely a wrapper around trash, but putting it in its own script allowed me to change its name and thus change its autocompletion method.
I named it better-rmdir, and put it in my $PATH. Here is the code :
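It is only a couple of lines (a minimal sketch, since it does nothing more than forward its arguments):

#!/usr/bin/zsh
# better-rmdir: same behaviour as the rm alias, but under a name that can get its own completion
trash "$@"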
As you can see, this is just a wrapper, taking the initial arguments and passing them to trash.
But I also created a file named _better-rmdir and put it in my $FPATH (this is where ZSH goes looking for autocompletion functions). I just copied the code of the original _directories method (which you can probably find in /usr/share/zsh/functions/Completion/Unix/), and adapted it to fit my newly created better-rmdir :
#compdef better-rmdir
local expl
_wanted directories expl directory _files -/ "$@" -
And finally, I added an alias (alias rmdir='better-rmdir'). Now every time I ask for an autocompletion on rmdir, it actually looks up the autocompletion of better-rmdir, which is the code contained in _better-rmdir, and which in turn returns only directories.
Now I have complete rm and rmdir replacements in my terminal that move files to the trash.
09 Aug 2012

Following my previous note on audio files, I did some research on video files. I have a huge amount of video on my hard drive, so much that I actually needed to do a bit of cleaning.
That's when I discovered that I had a plethora of different file formats (avi, mp4, mkv) in various resolutions, qualities and filesizes. Also, filesize was not always related to quality: I had some really heavy files that did not look better than smaller ones.
So I started some research on Wikipedia and Google to understand more about all those file formats, and here are my findings.
Containers
First of all, the avi, mp4 or mkv file extensions denote a container format. A container is just a box that holds video and audio streams. Not all boxes are created equal, though.
avi is a Microsoft container that gained (in)famous popularity in the first days of p2p filesharing. It is a very simple container, able to hold one video and one audio stream. Then came the mp4 container. This one is an ISO standard, so already much better than the Microsoft one. It actually is a good container, allowing the use of several audio and video streams as well as subtitles.
Then came the mkv container. It is as good as mp4, and even allows more customization, but the important point is that it is patent free. The only downside to mkv is that it is not as widely supported as the other two formats, but the specs being public and open, this is slowly changing.
I also had some other extensions, like wmv, mov or divx. divx is only an avi in disguise, for legal reasons. wmv is another Microsoft container, this one being an improved avi with DRM. mov is an Apple container, aimed at media creation. A mov is actually quite a powerful container that can be used for much more than simple media playback.
If I had to choose, I'd pick mkv.
Video codecs
Now, what do we put in those boxes ? First, a video stream. We could put an uncompressed video stream in there, but that would result in a huge filesize, so we practically never do that.
Instead, we use a codec (short for coder/decoder) to compress that stream into something with a more manageable size. The most famous is DivX. DivX started as an encoder hacked from an early Microsoft codec and was heavily used to encode movies before sharing them online. Because of a legal dispute with Microsoft, the people who created DivX had to recode it from scratch. In the meantime, an open source fork of DivX, named Xvid, was created and after a few versions became even better than the original DivX.
MPEG, MPEG2 and MPEG4 are successive iterations of another codec family that compresses a video stream by looking at its pixels and checking whether they changed from one frame to the next, to avoid redrawing them. With each successive version it also started to track the movement, color and lighting of those pixels. In the end, it gave birth to the h264 codec, which is used on Blu-ray (while MPEG2 was used on DVD). h264 is the de facto standard for HD video today. It requires more processing power than its MPEG predecessors but delivers much better quality for an equal filesize. Recent hardware is optimized to handle h264 natively.
Just as with mkv and mp4, there is a similar difference between h264 and Theora. Theora has roughly equivalent quality to h264, but is patent free. On the other hand, h264 is very widely supported while Theora is not (as can be seen in the HTML5 video debate).
Microsoft also created a closed source wmv codec to go with its wmv container. It is supposed to be based on MPEG4, but little is known about it, so it is of absolutely no use to me.
So, as you can see, there is almost no relation between a codec and its container. You can have a divx-encoded file in an mkv, or a Theora one in an avi. But, in the real world, some codecs are most often found in certain containers, like divx in avi or h264 in mp4.
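If you want to check what a given file actually contains, ffprobe (shipped with ffmpeg) will print the container and the codec of each stream; the file name here is just an example:

# List the container format and every audio/video stream with its codec
ffprobe -show_format -show_streams movie.mkv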
Here, I'll pick h264 because of the hardware support.
Audio codecs
As with the video stream, the audio stream is also compressed using an audio codec. The most common audio codec is mp3. This is a lossy codec, meaning it discards information to make the filesize smaller. It is based on a method that discards sounds the human ear would not be able to hear anyway.
mp3 has its drawbacks, like being bloated from successive MPEG versions and still being patented.
A newer codec, AAC, succeeded mp3. It also discards sounds the human ear cannot perceive, but encodes redundancy in a better way. Even if mp3 was the de facto standard for a long time, AAC is gaining huge popularity because it is backed by all the big companies.
Then, the story goes on and on. There is an open source, patent free equivalent to AAC, namely ogg. Once again, it's as good as its competitor (even better at low bitrates), but there aren't as many devices able to read ogg as there are able to read AAC. But once again, things are slowly changing.
Here, I'll choose ogg over the alternatives.
Conclusion
I have a bunch of files, in different containers, encoded with different encoders, and I'm going to try to clean all this up a bit. First, I'll get rid of all the "bad" containers (mov and wmv) and use mkv instead.
For files that are too big, I'll try to convert them to a better format: h264 video with an ogg audio stream in an mkv container.
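With ffmpeg, such a conversion could look roughly like this (the file names and quality settings are only examples, to be tuned per file):

# h264 video (libx264) + ogg/vorbis audio (libvorbis) inside an mkv container
ffmpeg -i input.avi -c:v libx264 -crf 23 -c:a libvorbis -q:a 5 output.mkv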
Well, there is also the matter of video resolution, data rate, fps and audio frequency, but that might be the subject of another post.