Thursday, May 6, 2010

CVS: Using Perl to Restrict Commits

This is one of the most useful tricks I have used in terms of SCM controlling of a CVS repository, so listen closely.

CVS is a great tool, and a horrible tool all in once Open Source package. It can do a lot, but often it can do too much as well. In order to stand a chance of having a well controlled environment, it is often important to have CVS do some of the enforcement work for you. Let it be the bad guy instead of you.

First, check out the CVSROOT module from CVS:
$ cvs co CVSROOT
The most important file here is verifymsg. This file tells CVS which files to pay attention to when someone commits a file. In this file, we will tell CVS to restrict one file to a set of rules, and let another file slide no matter what. Take for example 2 subdirectories, one called GUI and one called TEST. Let's say we don't care about TEST code and can let developers change this stuff at will, but GUI is our production code that we want to keep tight control on. How do we control it?

We can either require them to submit some sort of change document like a bug id, problem report (PR), software change request (SCR), etc. OR we can put certain requirements on the content of the commit comment. The method I employed was to tie them to using SCR's that they had to create with a 3rd party tool called Serena ChangeMan Dimensions. It is a tool for Change Tracking (a critical SCM concept and we were able to do it by doing this hacked integration of CVS and Dimensions. A tighter, more seamless integration already exists with Bugzilla and CVS, but we are not always given flexibility to use the tools we want to use, so we figure out how to use the tools we have to use.

In any case, in order to require them to create an SCR in Dimesnions for each check in, we will need to do a few steps:
1)    Tell CVS which files to NOT-enforce on in verifymsg
2)    Write a perl script to check the commit message for an SCR number. This perl script is the key--if we wanted to make sure they wrote a full sentence in their message instead of checking for an SCR number, we could do that here instead too.
3)    Tell CVS which files we DO want to enforce checking on and point it to the perl script created above to verify the commit message.
4)    Commit the verifymsg file to CVS
5)    Ensure the script is executable by everyone.
Now to work on the perl script. A good place for it would be /usr/local/bin, but you can put it anywhere you want. This is going to be a perl script. Don't know what Perl is? It is not something precious made in clams and strung around a pretty girl's neck. It is a powerful scripting language. That is all I will say about it here. If you want to know details, visit http://perldoc.perl.org/index-tutorials.html or http://www.perl.com. My goal here is to just show the shortcuts to get the job done, without delving into the guts of any of this.

So we'll start with an empty file, call it verify-scr.pl, and place the following in the very first line of the file to declare it as a perl file:
#!/usr/bin/perl -w
Next we will place a BEGIN command to tell perl that this block of code should be run as soon as possible (even before the rest gets compiled). In this block we will set $valid=0, and at the end of the file, we will return this variable. This is what our code looks like so far:
#/usr/bin/perl -w
BEGIN
{
  $valid = 0;
}
The purpose of this block is so that, even if there is a problem interpreting the rest of the perl script, we can still give something back for CVS to understand. Namely, we will immediately set the check to fail, so as a default if the script fails right now, the whole CVS commit will be denied. This is a safety precaution since we don't want developers to commit code that doesn't meet our requirements just because there is a typo in the perl script.

On to the meat of the perl script. Each line is being interpreted individually in this script. What we are analyzing is the text content of the CVS commit comment. We will encapsulate the check within a while loop and want to read the input from STDIN (which is the CVS commit comment that is being sent in). Type while (<>), then we will put the rest in curly braces. The next part is where perl shines--using a regular expression to find the SCR number and capturing it to a variable. In this example, I required users to type "SCR: " followed by the SCR number. The perl script will find that token SCR: and then look at the SCR id right after it.
if (/^SCR:\s*([A-Za-z0-9_]+[SCR|scr][_0-9]+)/)
Goodness that looks complicated, and I wont lie to you, it is! But let's simplify it. Obviously the if statement is just a simple common programming construct, so we'll skip it. The ^ indicates that we are looking for something at the beginning of the line. Without the ^ it will try to match the SCR: anywhere in the line. This is preference, but I believe they should have this SCR number declared in the beginning of a line, not embedded in the middle of a sentence. Much cleaner that way. We already answered the next part...the SCR: after the ^ is just an exact text string we are looking for. The \s is vital magic here; it tells perl that everything within the that matches the regular expression within the proceeding parenthesis will be stored in predefined variable $1. Ex. if the CVS commit comment is "SCR: PROJECT_SCR_22", it will parse out only the PROJECT_SCR_22 into $1. Now we tell perl how to find that PROJECT_SCR_22 string. The [A-Za-z0-9_] means find any combination of UPPERCASE or lowercase letters and numbers, all followed by an underscore. PROJECT_ matches, as well as ProJect_, or even MAcH5_. The + that follows actually doesn't mean "plus the following" as intuition leads you to believe--it actually means "at least one or more of the previous".

I will diverge for just a slight moment. What we are using in the above "if" statement is what is called a regular expression and is found in much more than just perl. Often the syntax is very similar, or seemingly exact, but keep in mind we are still doing perl here.

We left off looking for any string that contains at least one UPPERCASE, lowercase, number, or underscore. In my implementation, I have multiple SCRs for multiple projects, so there is ABC_SCR_1, XYZ_SCR_1, and JoeSmith_123_SCR_1. All are valid SCRs, so our perl regex (short way of saying regular expression) says to look for basically anything, followed by the string "SCR" in either UPPERCASE or lowercase. I didn't allow mixed case in this example. So the next part, right after the first + we saw, is the [SCR|scr]. As you may have guessed by my previous statements, this means find the term SCR or scr. The pipe operator "|" means or in this case. Again, only all UPPERCASE or all lowercase is matched, not mixed case. Notice that I did not put a + sign after this square bracketed part. The reason is I don't want to find SCR "one or more times" nor do I want to find it "zero or more times" (which is what * would mean)--I want to find it one time and exactly one time...no more, no less. At this point, we now have some string indicating the project, then we have the SCR keyword that we have to find, and finally we'll look for some number. That is the [_0-9]+ part; we look for any number, but again it has a + sign so we have to least have SCR followed by some number. It could be 3, 10, 42, or even 2395678542.

Once we close the parentheses, this tells the "s" which was placed before the first parenthesis that the string we just matched will be placed into $1. Note, if you had placed multiple instances of s(some_regex), they would incrementally be matched and placed into $1, $2, $3, etc. Place another close parenthesis to close off the "if".

Now that the hardest part is done, we'll breeze through the rest. Create an open curly brace to say that we are going to put a block of code that will be executed if the if statement above succeeds. In here we will do a bunch of things at once:
{
   my $scr = $1;
   print "We matched $scr from your commit comment\n";
   $valid = 1;
}
First line puts the matched string into a variable called $scr. The print line is just a message we are printing to screen to let the user know what we are doing. Since we used $1 in the print line, it will show them the string we matched. The \n just passes in a new line. We could get fancy at this point and wait for a user input to verify if we matched the right thing...but not in this post. Finally, we set the $valid variable that we used in the BEGIN statement. Remember that we assumed it is immediately false back in the BEGIN, but now we are setting it to TRUE. So if we get to this line and set this variable, when we exit the script, the CVS commit will succeed.

Close up the Curly braces now for the if statement and while statement, then put an "exit !($valid)" and we are all done. The final product looks like:


#/usr/bin/perl -w
BEGIN
{
  $valid = 0;
}
while (<>)
{
 if (/^SCR:\s*([A-Za-z0-9_]+[SCR|scr][_0-9]+)/)
 {
   my $scr = $1;
   print "We matched $scr from your commit comment\n";
   $valid = 1;
 }
}
exit !($valid)
Our Perl script is done, so now to tell CVS when to use it. Edit the verifymsg file from within the CVSROOT module we checked out. At the end of the file, we are going to add another regex to catch commits and route them to the perl script. Going back to the example started at the beginning of this post, we need to put a line in this verifymsg file to tell CVS that all code in GUI needs to have an SCR, but all code in TEST does not. This is how:
^TEST     /bin/true
^GUI       /usr/local/bin/verify-scr.pl
Remember, we saved our perl script from above in the /usr/local/bin folder, so now we are calling it when the CVS commit matches the ^GUI line. The carot (^) is a regex notation for "the beginning of the line" so it is looking for GUI as the first word in the line. If GUI and TEST were part of a larger module, say CODE, then it would be ^CODE/GUI.

Security warning: The file /usr/local/bin/verify-scr.pl needs to be executable to all users (or at least those in the cvs group), but should not be editable to anyone. Make sure to change permissions accordingly.

Save and commit the verifymsg file. Keep in mind that CVS will match only the first instance in the verifymsg file, so if you have something that matches 2 regex lines in verifymsg, only the first one will be used.

No comments:

Post a Comment