GoFredX Convention

I'm currently attending the GoFredX Convention in Fredericton. The talks so far have been great. I think the talks after lunch are going to be even better.

Maintaining Your System from the Command line

Many Linux distributions use some form of packaging system to organize the applications installed on a system. Using a formal packaging system lets you install, remove and, in general, maintain the software on your system in a controlled and coherent way. The three main packaging systems are the Debian deb package, the Red Hat rpm package, and the Slackware pkg package. The vast majority of distributions today use one of these three packaging systems. They all have graphical utilities to interact with the packaging system. But what if you want to deal with the system on the command line? Say you are running a server, or are accessing a remote machine through ssh, and don't want to deal with the overhead of X11? This month we'll take a look at how to do this for Debian-based systems.

The first thing you will probably want to do is install some software on your system. The preferred way to do this is through the utility apt-get. apt-get is aware of the chain of dependencies between packages. Let's say you want to do some star gazing and want to install stellarium on your system. You would run
apt-get install stellarium
This would download the relevant package file, and all of its dependencies, from a repository. What if you don't know exactly what the package is named? You can query the package management system with the utility dpkg-query. If you know that the package name has "kde" in it, you can list all of the matching packages with the command
dpkg-query -l "*kde*"
Remember to quote any search strings that have a "*" in them so that you don't inadvertently have the shell try to expand them.

This works great for software available in the given repository. But what if you want something not available there? If you have a ".deb" file available for download, you can download it and install it manually. After downloading the file, you would install it by running
dpkg -i file_to_install.deb
The utility dpkg works with the deb packaging system at a lower level than apt-get. With it, you can install, remove and maintain individual packages. If you have a whole group of packages you would like to install, you might want to add the relevant repository to your list so that apt-get will know about it. The list of repositories is stored in the configuration file /etc/apt/sources.list. Each line has the form
deb http://us.archive.ubuntu.com/ubuntu/ karmic main restricted
The first field tells apt-get what is available at this repository: deb is for binary packages, deb-src is for source packages. The second field is the URL to the repository; in this example, we're looking at the Ubuntu repository. The third field is the repository name, in this case the repository for the karmic version of Ubuntu. The last fields are the sections we want to look at when installing packages. This example will look at the sections main and restricted when trying to install applications or resolve dependencies.
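
If you also wanted access to the corresponding source packages from this repository, you could add a matching deb-src line (a hypothetical example built from the line above):
deb-src http://us.archive.ubuntu.com/ubuntu/ karmic main restricted
After editing sources.list, run "apt-get update" (covered below) so that apt-get picks up the change.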

Now that you have some applications installed, you will probably want to maintain them and keep them updated. Every piece of software will have bugs, or security issues, that come to light over time. Software developers are always releasing new versions to fix these issues, and updating the relevant packages in the repositories. To update the list of software and their versions on your system, you would run
apt-get update
Once you've updated the list, you can tell apt-get to install these updates by running
apt-get upgrade
If you want a list of what is about to be upgraded, add the command line option "-u"
apt-get upgrade -u
Sometimes, when a new version of a package comes out (for example, when your distribution releases a new version), the dependencies for that package might change, too. A straight upgrade can get confused in cases like this, so instead you can use the command
apt-get dist-upgrade
This command tries to intelligently deal with these changes in dependencies, adding and removing packages as necessary.

What do you do if you've installed a package to try it out and don't want it anymore? You can remove a package with the command
apt-get remove stellarium
This removes all of the files that were installed as part of the package stellarium, but leaves any configuration files intact and also doesn't deal with any extra packages installed because stellarium depended on them. If you wish to completely remove a package, including all configuration files, you would run
apt-get purge stellarium

All of this software installation and removal could result in cruft accumulating on your system. You may end up with unnecessary packages wasting disk space. To start to recover some of that space, you can run the command
apt-get autoclean
This command will remove the package ".deb" files from the local cache for packages that can no longer be downloaded. These would be mostly useless packages. If you want to completely clean out the local cache and recover more space, you can run
apt-get clean
While "remove" and "purge" will remove a package, what can you do about any dependencies installed for this package. If you run the command
apt-get autoremove
you can uninstall all packages that were installed as dependencies for other packages and aren't needed anymore.

Another way of finding packages that aren't needed anymore is through the utility deborphan. The first thing you'll need to do is install it, using
apt-get install deborphan
since most distributions don't install it by default. Once it is installed, running it with no command line options will give you a list of all packages in the libs and oldlibs sections that have no dependents. Since no other package depends on these packages, you can safely use apt-get to remove or purge them. If you want to look in all sections, you can use the option "-a". If you're trying to save space, you can ask deborphan to print out the installed sizes of these orphan packages by using the command line option "-z". You can then sort that list by running
deborphan -z -a | sort -n
This will give you a list of packages you can safely uninstall, sorted by installed size from smallest to largest. You can then free up space on your system by getting rid of the biggest space wasters.
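
If, after reviewing that list, you decide you want everything deborphan reports gone, you can hand its output straight to apt-get (a sketch; double-check the list first, since this assumes none of the reported packages are ones you actually want to keep):
deborphan | xargs apt-get remove --purge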

Hopefully this gives you a good starting point for dealing with the Debian package management system. Each of the tools discussed above has lots of other options that you should research in the relevant man pages. Also, if you use a Red Hat based system, there are equivalent commands to help you manage the rpm files used there.
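
For example, on a Red Hat based system that uses yum, the rough equivalents of the commands covered above would look something like this (just illustrative; check your distribution's documentation for the details):
yum install stellarium
yum update
yum remove stellarium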

Using sed to clean up files

You can use sed to delete particular lines, based on some unique criteria. This would look like

sed -e "/search criteria/d" -i file1.txt

This happens "in place", replacing the file's original contents. If you want to process a number of files in the same way, you can use

ls *.txt | xargs -I{} sed -e "/search criteria/d" -i {}

This takes the results from ls and hands them to xargs. xargs then runs sed on each file name, one at a time.
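
As a concrete (made-up) example, if you wanted to strip every line containing the word DEBUG from all of the ".txt" files in the current directory, the same pattern would look like

ls *.txt | xargs -I{} sed -e "/DEBUG/d" -i {}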

Stupid afio Tricks

We've already looked at tar, and all of the wonderful ways that it can be used. But it is not the only tool at our disposal. Another tool that gets used quite a bit for doing backups is afio. Depending on your distribution, it may or may not be already installed. In Ubuntu, for example, you would have to run

sudo apt-get install afio

to get it installed on your system. Once you do, you have a fairly powerful tool at your disposal for archiving files and making backups.

By default, afio reads and writes the files being archived on standard input and standard output. This means that you can create your list of files to archive with another program, like find, and pipe it to afio to do the actual archiving. Once you have your list of files, you can apply five basic commands to those files:

-o create an archive
-i install (or unpack) an archive
-t test (or list) the files stored in an archive
-r verify the files stored in an archive against the file system
-p copy the files to a given directory location

If you want to create a simple archive of all of your C source code files, you would execute

find . -name "*.c" -print | afio -o -Z source_code

When you want to extract these files again, you would execute

afio -i -Z source_code
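
You can use the other basic commands from the list above in the same way. For example, to list the files stored in the archive, or to verify them against what is currently on disk, you would run something like

afio -t -Z source_code
afio -r -Z source_code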

When you run afio as a regular user, all file paths are stored with no leading "/". This means that when you unpack an archive, it will unpack in the current directory. The idea is to hopefully avoid accidentally overwriting system files. To keep the leading "/", you need to use the command line option "-x". If you run afio as the superuser, then this behavior is reversed. Any leading "/" is maintained, and you need to use the command line option "-X" to get the usual behavior of stripping leading "/".

If space is at a premium, afio can also compress your archive, just like tar can. This is done by using the command line option "-Z". There is one very big difference, however. When you compress a tar archive, the entire archive file gets compressed. This means that if you have a corruption in one part of the file, you could potentially lose all of the files in the archive. When you compress an afio archive, the archived files are compressed individually. This means that if one file becomes corrupted, by whatever means, you won't lose any of the other files in the archive.

When you do compress an archive, afio uses gzip by default. You can tell gzip what compression factor to use with the command line option "-G num", where num is the amount of compression gzip is to apply to the archived files. This is a number between 0 (for no compression) and 9 (for maximum compression), with a default of 6. You may need to balance how much CPU time and how much IO time is being used during the compression phase. If so, you can put limits on when compression is to be used. The command line option "-T threshold" tells afio not to try to compress a file unless it is at least threshold bytes in size. The default setting is "-T 0k", so afio tries to compress all files, no matter how small. At the other end of the spectrum, you may want to limit how large a file can be before afio tries to compress it. You can do this with the command line option "-2 max", where max is the maximum file size. The default in this case is "-2 200m", so afio won't try to compress files larger than 200MB.
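
Putting a few of these options together, you might create the source code archive from earlier with maximum compression, while skipping files smaller than 1KB (a sketch; tune the numbers for your own balance of CPU time and space):

find . -name "*.c" -print | afio -o -Z -G 9 -T 1k source_code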

What if you don't want to use gzip as your compression method? You can change this by using the command line option "-P progname", where progname is the name of the executable to use to do the compression. If you need to hand options in to this alternate program, you can do this with the option "-Q opt". You need to use separate "-Q" options for each option you need to hand in to the alternate program. Because afio simply executes this alternate program, you can run anything at this stage. This could include an encryption program, allowing you to encrypt your archive. To encrypt your archive using PGP, you could execute

export PGPPASSFD=3
find . -name "*.c" -print | afio -ovz -Z -U -P pgp -Q -fc -Q +verbose=0 -3 3 archive 3<passphrase_file
This would run PGP on each file as it is added to the archive, with PGP reading the passphrase from file descriptor 3 (passphrase_file here is just a stand-in for wherever you keep your passphrase).

The last small trick with afio is that you also have the ability to interact with archives on external systems. The way you do this is similar to how you do it with tar. The format looks like

[user@]host[%rsh][=afio]:file

The option "user@" is the user name you would use to access the external system. The default communications mechanism is rsh, but you could change that to ssh by using the option "%ssh". You can define the command to use on the external system by using the option "=afio". You can use this if the executable is named something else, or in an odd location. So, if you wanted to archive all of your source code files onto an external server over ssh, you could execute

find . -name "*.c" -print | afio -o -Z user@server%ssh:archive
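
Restoring from that remote archive works the same way; you would just switch to the install command:

afio -i -Z user@server%ssh:archive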

Using afio, you can now go forth and ensure that you have proper backups of all of your important information. So now you don't have any excuses anymore.

Controlling Your Processes

I believe that it was the bard who said

All the CPU's a stage,
And all the processes and threads merely players;

or something like that. In any case, it is true. All of the processes that you want to run on your machine are like players, and you are the director. You control when they run and how they run. But, how can you do this? Well, let us look at the possibilities.

The first step is to run the executable. Normally, when you run a program, all of the input and output is connected to the console, so you see the output from the program and can type input at the keyboard. If you add an '&' to the end of the command, this connection to the console is severed; your program will now run in the background, and you can continue working on the command line. When you run an executable, the shell actually creates a child process and runs your executable within it. Sometimes, though, you don't want that. Let's say you have decided that no shell out there is good enough, so you have decided to write your own. When you're doing testing, you want to run it as your shell, but you probably don't want to have it as your login shell until all of the bugs have been hammered out. You can run your new shell from the command line with the 'exec' function
exec myshell
This tells the shell to replace itself with your new shell program. To your new shell, it will look like it is your login shell. Very cool. You can also use this to load menu programs on restricted systems. That way, if your users kill off the menu program, they get logged out, just as if they had killed off their login shell. This might be useful in some cases.
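
For example, a restricted account's login script could end with something like (menu_program being a stand-in name for whatever menu application you use)
exec menu_program
so that when the menu exits, there is no shell left behind to drop back into.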

So, now that your program is running, what can we do with it? If you need to pause your program temporarily (you may need to look up some other information, or run some other program), you can do this by typing ctrl-z (Control and z at the same time). This pauses your program and places it in the background. You can do this over and over again, collecting a list of paused and backgrounded jobs. To find out what jobs are sitting in the background, you can use the shell function 'jobs'. This will print out a list of all background jobs, with output looking like

[1]+ Stopped man bash

If you also want the process IDs for these jobs, you can use the option '-l', which gives output like

[1]+ 26711 Stopped man bash

By default, jobs will give you both paused and running background processes. If you only want to see the paused jobs, use the option '-s'. If you only want to see the running background jobs, use the option '-r'. Once you've finished your sidebar of work, how do you get back to your paused and backgrounded program? The shell has a function called 'fg' that lets you put a program back into the foreground. If you simply execute 'fg', the last process backgrounded is pulled back into the foreground. If you want to pick a particular job to foreground, you would use the '%' option. So if you wanted to foreground job number 1, you would execute 'fg %1'. What if you want your backgrounded jobs to continue working? When you use ctrl-z to put a job in the background, it is also paused. To get it to continue running in the background, you can use the shell function 'bg'. This is equivalent to having run your program with a '&' at the end of it; it will stay disconnected from the console but continue running in the background.
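
Putting these together: suppose you start a long-running program, call it my_program (just a stand-in name), and pause it with ctrl-z. You could then run
jobs -l
to confirm it is sitting there as job number 1, then
bg %1
to let it continue running in the background, and finally
fg %1
to pull it back to the foreground when you are ready to interact with it again.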

Once a program is backgrounded and continues running, is there any way to communicate with it? Yes there is: the signal system. You can send signals to your program with the command 'kill procid', where procid is the process ID of the program you are sending the signal to. Your program can be written to intercept these signals and do things, depending on what signals have been sent. You can specify a signal either by its number or by its symbolic name. Some of the signals available on Linux are

1 SIGHUP terminal line hangup
3 SIGQUIT quit program
9 SIGKILL kill program
10 SIGUSR1 user defined signal 1
12 SIGUSR2 user defined signal 2
15 SIGTERM software termination signal

If you simply execute kill, the default signal sent is SIGTERM. This signal tells the program to shut down, as if you had quit the program. Sometimes, your program may not want to quit; you sometimes have programs that simply will not go away. In these cases, you can use 'kill -9 procid', or 'kill -s SIGKILL procid', to send a kill signal. This will usually kill the offending process with extreme prejudice.
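
Signals aren't only for killing things, either. Many daemons are written to reread their configuration files when they receive a SIGHUP, so you could trigger that with either of (26711 here is just a stand-in process ID)
kill -1 26711
kill -s SIGHUP 26711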

Now that you can control when and where your program runs, what's next? You may want to control the use of resources by your program. The shell has a function called 'ulimit' that can be used to do this. This function changes the limits on certain resources available to the shell, as well as any programs started from the shell. The command 'ulimit -a' will print out all of the resources and their current limits. The resource limits that you can change will depend on your particular system. As an example, which crops up when trying to run larger Java programs, let's say you need to increase the stack size for your program to 10000KB. You would do this with the command 'ulimit -s 10000'. You can also set limits for other resources, like the amount of CPU time in seconds (-t), the maximum amount of virtual memory in KB (-v), or the maximum size of a core file (-c), which is counted in blocks (512-byte or 1024-byte, depending on your shell).
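
Similarly, before starting a potentially runaway batch job, you might cap its CPU time at ten minutes and turn off core dumps entirely with
ulimit -t 600
ulimit -c 0
Both limits then apply to the shell and to anything started from it afterwards.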

The last resource that you may want to control is what proportion of the system your program uses. By default, all of your programs are treated equally when it comes to deciding how often they get scheduled to run on the CPU. You can change this with the command 'nice'. A regular user can use nice to lower the priority of their program, from the default of 0 down to 19. So, if you are going to run some process in the background, but you don't want it to interfere with what you are running in the foreground, you can run it by executing
nice -n 10 my_program
and this will run your program with a priority of 10, rather than the default of 0. You can also change the priority of an already running process with the program 'renice'. If you have a background process that seems to be taking a lot of your CPU, you can change it with
renice -n 19 -p 27666
This will lower the priority of process 27666 all the way down to 19. Regular users can only use nice or renice to lower the priority of processes. The root user can also increase the priority, all the way up to -20. This is handy when you have processes that really need as much CPU time as possible. If you look at the output from top, you can see that something like pulseaudio might have a negative niceness value. You don't want your audio skipping while you're watching your movies.

The other part of the system that needs to be scheduled is access to IO, especially the hard drives. You can do this with the command 'ionice'. By default, programs are scheduled using the best effort scheduling algorithm, with a priority equal to (niceness + 20) / 5. This priority for the best effort is a value between 0 and 7. If you are running some program in the background and don't want it to interfere with your foreground programs, you can set the scheduling algorithm to 'idle' with
ionice -c 3 my_program
If you want to change the IO niceness for a program that is already running, you simply have to use the option '-p procid'. The highest possible scheduling class is called realtime, and its priority can be between 0 and 7. So if you have a process that needs to have first dibs on IO, you can run it with the command
ionice -c 1 -n 0 my_command
Just like the negative values for the nice command, the realtime scheduling class is only available to the root user. The best a regular user will be able to do is
ionice -c 2 -n 0 my_command
which is the best effort scheduling algorithm with a priority of 0.

Now that you know how to control how your programs use the resources on your machine, you can change how interactive your system feels.

Open Circuits

Open Circuits is a site full of ideas and circuits shared with the open source community. In their own words:

Open Circuits is a wiki for sharing open source electronics knowledge, schematics, board layouts, ports and parts libraries. This includes open hardware Music Players, atomic microscopes, PC, PDA and mobile phones, and batteries.

Visit, and add what you can.

New Acer Iconia tablet

I just got a new Acer Iconia tablet, and I am in love. The only issue I had was connecting to the wireless network at UNB. I finally learned that I had to actually go through the add network option and add all of the details (including the SSID) manually. Once I did this, everything came up wonderfully. Ahhhh.