Using sed to clean up files

You can use sed to delete particular lines, based on some unique criteria. This would look like

sed -e "/search criteria/d" -i file1.txt

This happens "in place", replacing the file's original contents. If you want to process a number of files in the same way, you can use

ls *.txt | xargs -I{} sed -e "/search criteria/d" -i {}

This takes the filenames produced by ls and hands them to xargs, which then runs sed on each file, one at a time.
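Note that parsing the output of ls breaks on file names containing spaces. A more robust sketch (assuming GNU find, xargs and sed; the demo files here are made up for illustration) uses NUL-separated names instead:

```shell
#!/bin/sh
# Sketch: delete matching lines from every .txt file under a directory.
# NUL-separated names make spaces in filenames safe (GNU find/xargs/sed assumed).
mkdir -p demo
printf 'keep me\nsearch criteria here\nkeep me too\n' > demo/a.txt
printf 'search criteria\nonly this survives\n' > "demo/b c.txt"

find demo -name '*.txt' -print0 | xargs -0 sed -i -e '/search criteria/d'

cat demo/a.txt "demo/b c.txt"
```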

Stupid afio Tricks

We've already looked at tar and all of the wonderful ways it can be used. But it is not the only tool at our disposal. Another tool that gets used quite a bit for backups is afio. Depending on your distribution, it may or may not already be installed. In Ubuntu, for example, you would have to run

sudo apt-get install afio

to get it installed on your system. Once you do, you have a fairly powerful tool at your disposal for archiving files and making backups.

By default, afio reads the list of files to archive from standard input. This means that you can build the list with another program, like find, and pipe it to afio to do the actual archiving. Once you have your list of files, you can apply five basic commands to them:

-o create an archive
-i install (or unpack) an archive
-t test (or list) the files stored in an archive
-r verify the files stored in an archive against the file system
-p copy the files to a given directory location

If you want to create a simple archive of all of your C source code files, you would execute

find . -name '*.c' -print | afio -o -Z source_code

When you want to extract these files again, you would execute

afio -i -Z source_code

When you run afio as a regular user, all file paths are stored with no leading "/". This means that when you unpack an archive, it will unpack into the current directory, which helps you avoid accidentally overwriting system files. To keep the leading "/", you need to use the command line option "-x". If you run afio as the superuser, this behavior is reversed: any leading "/" is maintained, and you need the command line option "-X" to get the usual behavior of stripping leading "/".

If space is at a premium, afio can also compress your archive, just like tar can, using the command line option "-Z". There is one very big difference, however. When you compress a tar archive, the entire archive file gets compressed, so corruption in one part of the file could potentially cost you all of the files in the archive. When you compress an afio archive, the archived files are compressed individually, so if one file becomes corrupted, by whatever means, you won't lose any of the other files in the archive.

When you do compress an archive, afio uses gzip by default. You can tell gzip what compression factor to use with the command line option "-G num", where num is the amount of compression gzip is to apply to the archived files. This is a number between 0 (for no compression) and 9 (for maximum compression), with a default of 6.

You may need to balance how much CPU time and how much IO time are spent during the compression phase. If so, you can put limits on when compression is used. The command line option "-T threshold" tells afio not to try to compress a file unless it is at least threshold bytes in size. The default setting is "-T 0k", so afio tries to compress all files, no matter how small. At the other end of the spectrum, you may want to limit how large a file can be before afio tries to compress it. You can do this with the command line option "-2 max", where max is the maximum file size. The default in this case is "-2 200m", so afio won't try to compress files larger than 200MB.

What if you don't want to use gzip as your compression method? You can change this with the command line option "-P progname", where progname is the name of the executable to use for compression. If you need to pass options to this alternate program, you can do so with the option "-Q opt", using a separate "-Q" for each option you need to pass. Because afio simply executes this alternate program, you can run anything at this stage, including an encryption program, allowing you to encrypt your archive. To encrypt your archive using PGP, you could execute

export PGPPASSFD=3
find . -name '*.c' -print | afio -ovz -Z -U -P pgp -Q -fc -Q +verbose=0 -3 3 archive 3
This would run PGP on each file in the archive as they are added.

The last small trick with afio is that you also have the ability to interact with archives on external systems. The way you do this is similar to how you do it with tar. The format looks like

[user@]host[%rsh][=afio]:file

The option "user@" is the user name you would use to access the external system. The default communications mechanism is rsh, but you could change that to ssh by using the option "%ssh". You can define the command to use on the external system by using the option "=afio". You can use this if the executable is named something else, or in an odd location. So, if you wanted to archive all of your source code files onto an external server over ssh, you could execute

find . -name '*.c' -print | afio -o -Z user@server%ssh:archive

Using afio, you can now go forth and ensure that you have proper backups of all of your important information. So now you don't have any excuses anymore.

Controlling Your Processes

I believe that it was the bard who said

All the CPU's a stage,
And all the processes and threads merely players;

or something like that. In any case, it is true. All of the processes that you want to run on your machine are like players, and you are the director. You control when they run and how they run. But, how can you do this? Well, let us look at the possibilities.

The first step is to run the executable. Normally, when you run a program, all of the input and output is connected to the console, so you see the program's output and can type input at the keyboard. If you add an '&' to the end of the command, the program instead runs in the background, and you can continue working on the command line. When you run an executable, the shell actually creates a child process and runs your executable in it. But sometimes you don't want that. Let's say you have decided that no shell out there is good enough, so you have decided to write your own. While you're testing it, you want to run it as your shell, but you probably don't want it as your login shell until all of the bugs have been hammered out. You can run your new shell from the command line with the shell builtin 'exec'
exec myshell
This tells the shell to replace itself with your new shell program. To your new shell, it will look like it is your login shell. Very cool. You can also use this to load menu programs on restricted systems. That way, if your users kill off the menu program, they will get logged out, just like killing off a login shell. This might be useful in some cases.
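A tiny sketch of what exec does: because no child process is created, the process ID stays the same across the call (the nested sh commands here are just for illustration):

```shell
#!/bin/sh
# The inner shell prints its own PID, then execs a new program;
# the exec'd program inherits that same PID because no child is created.
sh -c 'echo "before exec: $$"; exec sh -c "echo after exec: \$\$"'
```

Both lines print the same PID, confirming the process was replaced rather than forked.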

So, now that your program is running, what can we do with it? If you need to pause your program temporarily (you may need to look up some other information, or run some other program), you can do this by typing ctrl-z (Control and z at the same time). This pauses your program and places it in the background. You can do this over and over again, collecting a list of paused and backgrounded jobs. To find out what jobs are sitting in the background, you can use the shell function 'jobs'. This will print out a list of all background jobs, with output looking like

[1]+ Stopped man bash

If you wanted to also get the process IDs for these jobs, you can use the option '-l'

[1]+ 26711 Stopped man bash

By default, jobs will show you both paused and running background processes. If you only want to see the paused jobs, use the option '-s'. If you only want to see the running background jobs, use the option '-r'. Once you have finished your sidebar of work, how do you get back to your paused, backgrounded program? The shell has a function called 'fg' which puts a program back into the foreground. If you simply execute 'fg', the last process backgrounded is pulled back into the foreground. If you want to pick a particular job to foreground, use the '%' notation; to foreground job number 1, you would execute 'fg %1'. What if you want your backgrounded jobs to continue working? When you use ctrl-z to put a job in the background, it is also paused. To get it running again in the background, use the shell function 'bg'. This is equivalent to having run your program with an '&' at the end: it stays disconnected from the console but continues running in the background.
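The whole cycle can be sketched in a script (assuming bash; 'set -m' turns on job control, which interactive shells already have enabled):

```shell
#!/bin/bash
# Sketch of job control in a non-interactive bash session.
set -m                 # enable job control so fg works in a script

sleep 2 &              # start a job in the background, as with '&'
jobs                   # list background jobs, e.g. "[1]+  Running  sleep 2 &"
jobs -l                # the same list, with the process ID included
fg %1                  # pull job 1 back into the foreground and wait for it
```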

Once a program is backgrounded and continues running, is there any way to communicate with it? Yes there is: the signal system. You can send signals to your program with the command 'kill procid', where procid is the process ID of the program you are sending the signal to. Your program can be written to intercept these signals and act on them, depending on which signals have been sent. You can specify a signal either by its number or by its symbolic name. Some of the signals available are

1 SIGHUP terminal line hangup
3 SIGQUIT quit program
9 SIGKILL kill program
10 SIGUSR1 user defined signal 1
12 SIGUSR2 user defined signal 2
15 SIGTERM software termination signal

(Signal numbers can vary between architectures; run 'kill -l' to see the full list on your system.)

If you simply execute kill, the default signal sent is SIGTERM. This signal tells the program to shut down, as if you had quit it. Sometimes, though, a program simply will not go away. In these cases you can use 'kill -9 procid', or 'kill -s SIGKILL procid', to send a kill signal. This will usually kill the offending process with extreme prejudice.
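On the receiving end, a shell script can intercept these signals with the 'trap' builtin; a minimal sketch that installs a SIGUSR1 handler and then signals itself:

```shell
#!/bin/sh
# Install a handler for SIGUSR1, then send that signal to our own PID ($$).
trap 'echo "caught SIGUSR1, cleaning up"' USR1

kill -s USR1 $$        # same as: kill -USR1 $$
echo "still running"   # SIGUSR1 was intercepted, so the script was not killed
```

SIGKILL, by contrast, cannot be trapped, which is why 'kill -9' always works.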

Now that you can control when and where your program runs, what's next? You may want to control its use of resources. The shell has a function called 'ulimit' which can be used to do this. It changes the limits on certain resources available to the shell, as well as to any programs started from the shell. The command 'ulimit -a' will print out all of the resources and their current limits. The resource limits that you can change depend on your particular system. As an example, which crops up when trying to run larger Java programs, let's say you need to increase the stack size for your program to 10000KB. You would do this with the command 'ulimit -s 10000'. You can also set limits for other resources like the amount of CPU time in seconds (-t), the maximum amount of virtual memory in KB (-v), or the maximum size of a core file in 512-byte blocks (-c).
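Because ulimit affects the current shell and everything started from it, a common trick is to change the limit inside a subshell so the rest of your session is untouched; a small sketch (the 4096KB figure is arbitrary):

```shell
#!/bin/sh
# Lower the stack-size soft limit in a subshell only; the parent is untouched.
(
    ulimit -s 4096     # stack size in KB, for this subshell and its children
    echo "inside:  $(ulimit -s)"
)
echo "outside: $(ulimit -s)"
```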

The last resource that you may want to control is what proportion of the system your program uses. By default, all of your programs are treated equally when it comes to deciding how often they get scheduled to run on the CPU. You can change this with the command 'nice'. A regular user can use nice to lower the priority of their programs, using niceness values from 0 (the default) down to 19 (the lowest priority). So, if you are going to run some process in the background, but you don't want it to interfere with what you are running in the foreground, you can run it by executing
nice -n 10 my_program
and this will run your program with a niceness of 10, rather than the default of 0. You can also change the priority of an already running process with the program 'renice'. If you have a background process that seems to be taking a lot of your CPU, you can change it with
renice -n 19 -p 27666
This will lower the priority of process 27666 all the way down to a niceness of 19. Regular users can only use nice or renice to lower the priority of their processes. The root user can also raise priorities, all the way up to a niceness of -20. This is handy for processes that really need as much CPU time as possible. If you look at the output from top, you can see that something like pulseaudio might have a negative niceness value. You don't want your audio skipping while you're watching your movies.
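A quick way to see niceness in action (assuming a ps that supports '-o ni', as on Linux; the sleep job stands in for a real workload):

```shell
#!/bin/sh
# Start a low-priority background job, read its niceness back with ps,
# then push it even lower with renice (a regular user can only go down).
nice -n 10 sleep 3 &
pid=$!

ps -o ni= -p "$pid"              # prints the nice value: 10
renice -n 15 -p "$pid" >/dev/null
ps -o ni= -p "$pid"              # now prints: 15
wait "$pid"
```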

The other part of the system that needs to be scheduled is access to IO, especially the hard drives. You can do this with the command 'ionice'. By default, programs are scheduled using the best effort scheduling algorithm, with a priority equal to (niceness + 20) / 5. This priority for the best effort is a value between 0 and 7. If you are running some program in the background and don't want it to interfere with your foreground programs, you can set the scheduling algorithm to 'idle' with
ionice -c 3 my_program
If you want to change the IO niceness for a program that is already running, you simply have to use the option '-p procid'. The highest possible priority is called realtime, and can be between 0 and 7. So if you have a process that needs to have first dibs on IO, you can run it with the command
ionice -c 1 -n 0 my_command
Just like the negative values for the nice command, using this realtime scheduling algorithm is only available to the root user. The best a regular user will be able to do is
ionice -c 2 -n 0 my_command
which is the best effort scheduling algorithm with a priority of 0.

Now that you know how to control how your programs use the resources on your machine, you can change how interactive your system feels.

Open Circuits

Open Circuits provides a site full of ideas and circuits put into the open source community. Using their description:

Open Circuits is a wiki for sharing open source electronics knowledge, schematics, board layouts, ports and parts libraries. This includes open hardware Music Players, atomic microscopes, PC, PDA and mobile phones, and batteries.

Visit, and add what you can.

New Acer Iconia tablet

I just got a new Acer Iconia tablet, and I am in love. The only issue I had was connecting to the wireless network at UNB. I finally learned that I had to actually go through the add network option and add all of the details (including the SSID) manually. Once I did this, everything came up wonderfully. Ahhhh.