
SLU Fellowship Progress Log

Shelley Kandola, Summer 2011


At-a-Glance

Week 1: May 25 · June 1 · June 2 · June 3 · Week 2: June 6 · June 7 · June 8 · June 9 · June 10
Week 3: June 13 · June 14 · June 15 · June 16 · Week 4: June 20 · June 21 · June 22 · June 23 · June 24
Week 5: June 27 · June 28 · June 29 · June 30 · July 1 · Week 6: July 5 · July 6 · July 7 · July 8
Week 7: July 11 · July 12 · July 13 · July 14 · July 15 · Week 8: July 18 · July 19 · July 20 · July 21 · July 22
Week 9: July 25 · July 26 · July 28 · July 29 · Conference: Oct 7 · Oct 8
Bibliography · Current To-Do List

October 8th, 2011

If anyone's curious, we arrived safely in Minneapolis! It seems our hotel was booked for one less day than the conference lasts, so we'll see what happens there...



October 7th, 2011

So, tomorrow morning I'm leaving for Minneapolis, MN to present my research at the GSA Annual Meeting & Exposition with Prof. Kratzmann and some additional Geology folk. For those of you who won't be able to attend the conference, you can view the poster I'm presenting here. Looking at what I planned to do as outlined in my last post, I feel like I've let the program down a bit. The graphs are still lacking labels, and the command-line argument is still just perl gpvplot.pl. I have however included the PDF::API2 module in the program download and the program no longer breaks if the user doesn't have any of the utilities required. In fact, in the case of evince (PDF viewer), if the user doesn't have it, the program won't display the PDF -- just save it to the current directory. Speaking of which, I should work on the filesave destination.

I'm leaving for the conference tomorrow morning, but I won't be presenting until Wed, Oct 12th. Once I'm done with the conference, my next (and final?) step of the Fellowship will be to write (and hopefully publish!) a short note article for Computers and Geosciences. I'm pretty nervous for the conference, actually. I'm afraid I don't know enough about geology to be presenting at a conference as big as this. The Minneapolis Convention Center has 480,000 ft² of conference space!!! This is actually the first conference at which I've ever presented. Last year I went to HRUMC, but that was just for fun.



July 29th, 2011

So today's my last day in Canton, NY until August 19th. It's been fun :)

To summarize some recent email correspondence, here are some more things I can do to improve my program and prepare for the conference in October:

  • Have the dataset names be written in the command line
    • Utilizes terminal's auto-complete
    • Example: perl gpvplot.pl dataset1 dataset2
  • Change the name from Grapher to GPVPlot and merge with the helper file
  • Put the PDF::API2 module in the .zip for download
  • Label the sets with a key



July 28th, 2011

If it wasn't obvious, the Fellowship is coming to a close this week. I've spent each day doing a variety of things to wrap up my research. For the past two days I've been working on putting together a poster (although I was also home sick and working from my laptop) and today I plan to start working on:

  1. The 500-word summary of my research that's due to SLU ~August 12th
  2. The 1500-word short note article for Computers and Geosciences



July 26th, 2011

Today I'm working on my poster!



July 25th, 2011

Today's goal undoubtedly is to a) finish the abstract for the geology conference and b) come up with a name for my program. Also, Professor Sharp recommended I read Don't Make Me Think by Steve Krug. It's about web usability, but I figure I could apply it to programming as well. Take 2 of the Abstract:

Geologists collect a plethora of geochemical data from a variety of analytical sources. In petrological and geochemical studies it is routine practice to plot these data on discrimination or "Harker" diagrams to facilitate the identification and interpretation of geochemical trends. While there exist many graphing programs, some of which are intended specifically for geologists, none allows the user to quickly graph large or many data sets in such a way that the user can view all the data side by side on one page. VGPPlot is a command-line based program that will enable geologists and geochemists to rapidly graph large quantities of geochemical data from multiple data sets. The program provides a fast and efficient method of sending large quantities of data to multiple graphs, then saves and displays them as a PDF.

Through the computer’s terminal, the program prompts the user with a short series of questions and from that generates graphical output as a PDF. The user can plot up to eight oxides from as many as ten data sets against a single x-axis variable. Once the user has specified the names of the data sets, the variables, and the range of data to plot (optional), the entire process from graphing the data to saving the graphs as images to embedding them on and displaying a PDF takes between 4 and 6 seconds. The program input is flexible and will accept filenames with or without extensions. VGPPlot is designed to take .CSV files (as generated by a program such as Excel™) and .TXT files (of whitespace-separated values). The program is very low-impact, consisting of only two files requiring ~11 KB of storage space. It was written in the open-source languages Perl and gnuplot to maximize availability and customization. Because Perl is a Unix-based language, this program is optimized for use on Unix-based machines such as those running Linux and Mac OS X. The result is a compact, efficient, open-source program that rapidly graphs large volumes of geochemical data.

So... something about having a plus sign in one of the variables is breaking the string index. When I search for "Na2O+K2O" in an array of strings, I get infinitely: Use of uninitialized value in pattern match (m//) at matching.pl line 13. Line 13 is this: $strindex++ until $_[1][$strindex] =~ $_[0] or $strindex>$_[1]; What I conclude from that is that =~ doesn't work with non-letters? Definitely. This turns up false: print $want =~ "Na2O+K2O"; There I have two strings that are exactly the same. One as a scalar, one as a raw string. Nonetheless, it's false. Even worse, this comes out false!

    print "Na2O+K2O" =~ "Na2O+K2O";

So frustrating. Meanwhile, here are improvements I can make in the long run:

  • Fill up whitespace between graphs without labels
  • Fix &stringindex to recognize "+"
  • (Auto-complete file names)
  • Rotate one graph/page
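On the "+" item above: a likely cause (an assumption, since matching.pl isn't shown in full) is that =~ treats its right-hand side as a regular expression, where + means "one or more of the preceding character" rather than a literal plus sign. Here is a minimal sketch of two ways around that: escaping the pattern with \Q...\E, or comparing with eq when an exact match is all that's needed. The string_index helper is hypothetical and is not the log's &stringindex.

```perl
use strict;
use warnings;

my $want = "Na2O+K2O";

# As a regex, "O+" means "one or more O", so the literal "+" never matches.
my $plain   = ($want =~ /Na2O+K2O/)    ? 1 : 0;
# \Q...\E escapes regex metacharacters (the same effect as quotemeta).
my $escaped = ($want =~ /^\Q$want\E$/) ? 1 : 0;

# Hypothetical helper: index of the first exact match, or -1 if absent.
sub string_index {
    my ($target, $list) = @_;
    for my $i (0 .. $#{$list}) {
        return $i if $list->[$i] eq $target;
    }
    return -1;
}

my @headers = ("SiO2", "Na2O+K2O", "MgO");
print "plain=$plain escaped=$escaped index=", string_index($want, \@headers), "\n";
```

Returning -1 for a missing name also gives the caller a way to stop, instead of looping past the end of the array.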

This is incredibly trivial, but awesome. I wrote a line of code that will print "second" or "seconds" depending on how long the program took. Speaking of which, I got rid of Time::HiRes, so the time is less accurate/in seconds (not microseconds).

    printf "Time elapsed: %d second%s\n", $time_taken, $time_taken == 1 ? "":"s";



July 22nd, 2011

I've spent most of today thinking up names for my program, reviewing my abstract, and working on a project I have to do for my other job at IT. What I'm looking for in a program name is something that's easy to say and captures a few of the key features of my program. My mentor suggested something along the lines of "FastPlot" or "EasyAnalyser" which got me thinking about how my program is a tool for analyzing trends in geochemical data. Since my program isn't exclusively for geologists though, I figured I'd leave something to do with that out of the name. So what about "QuickAnlys"? (Anlys as the abbreviation for analysis.) It rolls off the tongue (think: quick'n lis) and highlights the speed and purpose of the program. Ah well, it's just a thought. Plus: it has absolutely no Google hits :)

Courtesy of my mentor:

"Perl – The only language that looks the same before and after RSA encryption."
-Keith Bostic

Another little project I've been working on is writing the perfect readme.

README.txt



July 21st, 2011

Today I'm going to work on writing an abstract for a geology conference so I can present my program.

Idea: Since I have a lot of extra room now that I've removed the x-tics and x-labels, I might include margins in the HoHoA so the graphs can better fill their space. Also, I found this about autocomplete in Perl, but it would mean requiring another module.

Potential abstract draft (also, I'm open to suggestions for a name for the program):

Grapher is a command-line based program that rapidly graphs large quantities of geochemical data. A practical use of this program is for plotting major element matrix glass data against a compound that can be used to measure the degree of differentiation in a rock, such as silicon dioxide (SiO2). The user can plot up to eight compounds from as many as ten data sets against a single x-axis. To facilitate visualizing trends in the rock data, Grapher will scale and rotate the graphs as necessary such that they all fit on one page. Once the user has specified the names of the data sets, the variables, and the (optional) range of data to plot, the entire process from graphing the data to saving the graphs as images to embedding them on a PDF and opening the PDF for viewing takes between 4 and 6 seconds. The program input is flexible and takes filenames with or without extensions. The program was originally designed to take CSV files as generated by a program such as Excel, but it will also take TXT files of whitespace-separated values. The program is very low-impact as well, consisting of only two files totalling 11 KB of storage space. All images and reformatted .txt/.csv files made during the execution of the program are stored in an invisible temp folder which is then deleted upon completion of the program. Because Perl is a Unix-based language, this program is optimized for use on Unix-based machines such as those running Linux and OS X. The result is a compact, efficient program that rapidly graphs large volumes of geochemical data.



July 20th, 2011

Two people tested my program yesterday: one on OSX 10.6 and one on Ubuntu 10.04. Both ran into the same problem: missing packages. I think I found five things that didn't come preinstalled on one of the machines that was tried. Here's a list of requirements.

  • General
    • Perl 5.10.1
    • gnuplot
  • Perl Packages
    • PDF::API2
    • Time::HiRes (not critical; just for measuring efficiency)
  • Terminal
    • convert
    • evince (not critical; displays PDF automatically after creation)

To my project this adds the need to see if all the above things exist, and if they don't, install them. Also, I think I fixed the zipping problem: I can't have periods in the filename (should've been obvious, hah). I've just updated the program to take filenames with extensions as well. I think I've figured out how to test for the Perl modules I need, but I don't know how to test if the terminal packages evince and convert are installed.
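One way to test for both kinds of requirement, as a rough sketch rather than what the program ended up doing: look for each terminal utility in the directories listed in PATH, and try to load each Perl module inside an eval so a missing one doesn't kill the script. The helper names find_in_path and have_module are made up for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of an executable found on PATH, or undef if missing.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

# True if a Perl module can be loaded, without dying when it cannot.
sub have_module {
    my ($module) = @_;
    return eval("require $module; 1") ? 1 : 0;
}

for my $tool (qw(gnuplot convert evince)) {
    my $path = find_in_path($tool);
    print defined $path ? "$tool: $path\n" : "$tool: not found\n";
}
for my $module (qw(PDF::API2 Time::HiRes)) {
    print "$module: ", (have_module($module) ? "ok" : "missing"), "\n";
}
```

With checks like these, the non-critical pieces (evince, Time::HiRes) could simply be skipped when absent, while a missing gnuplot would be worth a clear message to the user.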

Here's something peculiar I've noticed. According to PDF::API2, a PDF is 612 x 792 units. When I opened a PDF in Gimp (Image Editor), it says the dimensions are 850 x 1100, which I'm assuming is in pixels. So there are a few things that could mean:

  • I'm making each image smaller than it needs to be and there's a lot of whitespace
  • Something is automatically converting from whatever units to pixels
  • The images are being stretched/shrunk somewhere
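A quick arithmetic check on those numbers: PDF page units are points, and a point is 1/72 of an inch, so 612 x 792 is 8.5 x 11 inches (US Letter). 850 / 8.5 = 100, so the image editor's dimensions would fit a page rasterized at 100 pixels per inch. That resolution is inferred from the arithmetic and isn't confirmed anywhere in this log.

```perl
use strict;
use warnings;

# PDF page sizes are in points; one point is 1/72 of an inch.
my $POINTS_PER_INCH = 72;

# Convert a length in points to pixels at a given resolution (pixels per inch).
sub points_to_pixels {
    my ($points, $ppi) = @_;
    return $points / $POINTS_PER_INCH * $ppi;
}

# 100 ppi is an assumed import resolution that happens to fit the numbers.
my $ppi = 100;
printf "%d x %d px\n", points_to_pixels(612, $ppi), points_to_pixels(792, $ppi);
```

If that's what is happening, the second bullet is the closest explanation: the units are being converted on import, and nothing is being stretched.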



July 19th, 2011

Well this has been a successful start to the morning! After a few minor changes, I modified the subroutine &csvarray to take an array of setnames instead of just one. That ensures that no matter how disjoint the datasets are, all the data will be graphed!

Something else that turned out to be much easier than expected: removing the xtics from all but the bottom graphs. I was worried at first because the images are mapped using a foreach statement, which usually doesn't follow any order. What I didn't take into consideration, however, was that although the images are in an orderless hash, each hash entry still maps to the HoHoA that maps to specific locations and indices. All I needed was a quick little if-statement.

    # Erasing the x-axes
    my $unset_xaxis = "set xtics nomirror rotate by 90";
    if($imgindex < ($num_imgs-2)){
	$unset_xaxis = "unset xtics";
    }

So basically, if the image is not one of the last two of the set, erase the xtics. Tada! The only minor kink is for 3-4 graphs/page. Those are mapped sideways. Well, that was an easy fix. Just another if-statement specific to having exactly 3 or 4 images to embed. This is a minor kink as well, but now that all but the bottom two graphs don't have x-tics, there's more room. Should I make the images bigger?

According to my todo list, all that's left is to fix the zipping and to make the program take a filepath instead of just a filename. I've updated the download page to have the new version, 0.82, and a screenshot of a more recent interaction experience with the program. Here's a known bug though: my stringindex subroutine can't seem to process Na2O+K2O. I'll be sure to work on that this week.



July 18th, 2011

I had an idea as to why the capturing wasn't working when I was trying to swap out all the single spaces en masse: it was within backtick operators. Right now I'm going to try putting that into a string, and then into the syscall. Hm. When I do that, it tries to do the swap before I've given it the string/file to do the swap on, giving me the "use of uninitialized value $_" error. Ugh.

Well, I wrote a subroutine to correctly swap out the spaces for dashes in the .txt file. I thought that would solve the problem with the number ranges, but it hasn't. The weird thing is, it only includes the first number of the next column sometimes. Hm. I looked at the range of values for scaling the x-axis for each variable in the graph, and found that it was completely random which values from the next column it chose. I'm thinking it might have something to do with the length of the row name. Or maybe I'm passing in the wrong file to be read from? I'm almost positive now it has something to do with the spacing. Before, I was splitting the file along \t, but some row names were longer than others, therefore the space between columns might be 3 or 4 spaces instead of an actual tab character. So I changed it to split along \s (whitespace), but I'm still not getting the correct ranges for each variable.

Interesting... I wrote a subroutine that would print each element in an array surrounded by [], so I could determine exactly what the program thinks is in the array. When I was splitting by \s, I was getting array values of [] [] [] every time there were three spaces instead of a tab. I thought that splitting a string by whitespace would have taken all consecutive chunks of whitespace and treated it as one; I didn't think it would differentiate between spaces and tabs... well, now I know. But how to fix it? When editing the TXT file, I need to make sure that there is a tab between each row title and the first value of each row. Aha! Here's an idea: what if I do a swap like s/ +/\t/g that replaces any sequence of spaces with a tab? Let's give it a go!
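To make the [] [] [] behavior concrete, here is a small sketch comparing three ways of splitting the same made-up row. The pattern \s matches exactly one whitespace character, so every extra space produces an empty field; \s+ (or the special ' ' form) treats a whole run of whitespace as one separator.

```perl
use strict;
use warnings;

# A made-up row: three spaces after the label, then tab-separated values.
my $row = "HUD207B-transgl-1   5.931\t2.112";

my @single = split /\s/,  $row;   # splits at every single whitespace character
my @runs   = split /\s+/, $row;   # treats each run of whitespace as one separator
my @awk    = split ' ',   $row;   # like /\s+/, and also ignores leading whitespace

print scalar(@single), " fields: ", join("", map { "[$_]" } @single), "\n";
print scalar(@runs),   " fields: ", join("", map { "[$_]" } @runs),   "\n";
```

This only helps once the row labels themselves contain no spaces, which is why the space-to-dash step still has to run first.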

Fascinating! You can destroy an array by telling it that its length is -1. Anyway, I'm getting much closer to figuring it out, but there are still stray spaces here and there that throw off the alignment of the columns. Well, I finally got it, but I'm not happy about my solution. I use a lot of temp arrays to sort things out, and I'm afraid of what that will do to the efficiency.

Here's something I hadn't considered: sometimes the range of x-values between files are disjoint. This means that gnuplot only plots the x-range from the first data set, leaving most of the other values unseen. See how the red and green data sets rarely overlap both horizontally and vertically? I'm going to need to update my subroutine to take @setnames as an argument instead of just $set1. That way, I can get an average across all the data sets being used.

Notice that where there is an overlap, as with MnO, the y-range is nicely spaced! No more cramped numbers on small graphs :)

Tomorrow's goal: rewrite the subroutine to take @setnames



July 15th, 2011

Today I decided to do all my programming from home, and I've learned some interesting things. First, my installation of Ubuntu (11.04, Natty Narwhal) doesn't come with gnuplot installed. What I think I'm going to add then is a bit of code that checks to see if gnuplot is installed. If not, it will prompt the user to install it, and possibly give them the sudo instruction to do so. Another thing I noticed is that when the xtics are rotated 90 degrees in gnuplot, they're rotated about the center as opposed to the left-most side, so the x-tics end up overlapping with the x-axis. I'm not going to fix that for now, because I'd like to see how it runs on different computers. On the plus side, my computer seems to be graphing things way faster than bewkes144-10. The last plot I did had two data sets and 7 variables to graph, and the timer says it took 6320 microseconds. I almost don't believe that.

I think I've got the y-tic problem mostly solved though! Here's the bit of code that did the trick:

    # Determining tic spacing
    my @ticarray = &csvarray($set1, $index-1);
    my $range = sprintf("%.3f", ((&max(@ticarray) - &min(@ticarray))/4));
    .
    .
    .
    set ytics $range
					

The only slight problem there is that it makes the range based on the data from the first data set. Let's say the range of values in the first data set for a particular variable is 4-8. The code above will put a tic every one (1) unit. Let's say the range of the second data set is 8-12 though. Because gnuplot plots all of the data points, it will have the range of the y-axis go up to 12, and it will still have tics every one unit. Then you end up with a graph that's physically still the same size as the first, but it will look much more cramped because the y-axis will have 8 tics instead of 4.
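The 4-8 versus 8-12 example can be worked through in a few lines. This sketch pools the values from every data set before dividing the span by four; tic_spacing is a hypothetical helper, and the numbers are made up to match the example in the paragraph above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Tic spacing that yields about $divisions intervals across every value given.
sub tic_spacing {
    my ($divisions, @values) = @_;
    my $span = max(@values) - min(@values);
    return sprintf("%.3f", $span / $divisions);
}

# Made-up y-values for one variable in two data sets.
my @set1 = (4, 5.5, 8);
my @set2 = (8, 10, 12);

print "set ytics ", tic_spacing(4, @set1), "\n";          # first set only
print "set ytics ", tic_spacing(4, @set1, @set2), "\n";   # both sets pooled
```

Pooling the sets doubles the spacing here, so the axis keeps roughly four tics no matter how far the second data set extends the range.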

Actually, there's something I've noticed that's a bit of a bug. Whenever I say I want to scale the x-axis, the low value it gives me is correct, but the high value is always the first value of the column to the right. That doesn't make much sense, but it does explain some of the irregularities in the y-tics.

csvarray.pl



July 14th, 2011

Yesterday I had an idea for making the x-tics look less cramped on the smaller graphs. Why couldn't I make the images at full size in gnuplot, and then resize them with the command line? There's a simple convert -resize that does it (I just have to work out the arguments it takes). Alright so it takes the argument in the form of convert input.jpg -resize widthxheight output.jpg. So naturally, I put $width in, but the backtick operator in Perl reads that variable as $widthx. So somehow I need to fix that... Well, I figured it out, but unfortunately, the quality disappears, however nice the layout looks.
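The log doesn't say how the $widthx problem was solved, but one standard fix is to wrap the variable name in braces so Perl knows where it ends. A minimal sketch, with made-up dimensions and placeholder file names, that builds the command as a string without running it:

```perl
use strict;
use warnings;

my ($width, $height) = (400, 300);   # made-up dimensions

# "$widthx$height" would look for a variable named $widthx.
# Braces mark where the variable name ends.
my $geometry = "${width}x${height}";

# Built as a string here rather than executed; the file names are placeholders.
my $command = "convert input.jpg -resize $geometry output.jpg";
print "$command\n";
```

The same brace syntax works inside backticks, since they interpolate variables the way double-quoted strings do.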

Layout vs. Readability

That's really a shame too, since the layout looks so nice! For comparison, here's the exact same data with the resizing done in gnuplot. Gnuplot adds so much padding to the graphs, they look very squished and unreadable. I wonder if the terminal command convert can adjust the quality of images? Eh, I'm a little angry. The convert -quality option doesn't seem to do anything. I really love the layout of the terminal-resized images: they fill up more space, the tics are more cleanly separated, the data points don't overwhelm each other... If only the quality were better; everything else is perfect!

Well, I've discovered one thing: I can change the margins of the graph. I think it'll end up being pretty helpful. What I'm envisioning now is adding another piece of data to the HoHoA that controls the margins. With some tedious rearranging I think I've shifted the labels and increased the margins such that I have the biggest graph possible for each size while still using gnuplot to size the images. What's left is making the tics look less cramped.

The hard part about controlling the spacing of the tics is that everything I've found that could help control the spacing is a constant, not a variable. I could specify things like set a tic every 2 units, but a lot of the data sets don't span more than 2 units. I could have a tic every 0.5, but some data ranges are smaller than that still. Ideally, I'd like to put tics say, every 25%. But gnuplot doesn't have an option for that. I could get the max and min of the data, divide the difference by 4, round it to the nearest hundredth, and use that as the spacing for the tics. Hm. I do already have a method that does that... I'll make it tomorrow's project.



July 13th, 2011

So this adds a little more clutter in the background, but I've resolved yesterday's problem by copying the files into the invisible .temp folder and then deleting them afterward. The program makes so many tiny formatting changes to the user's files that I thought it'd be best to leave the originals untouched.

Right now I'm trying to tackle displaying subscripts in the axis labels. The annoying bit there is that gnuplot does subscripts in the form of _#. So I'm going to have to insert an underscore in front of every [0-9] character before passing it into gnuplot. I run into the same problem here though as I did with replacing all the single spaces with dashes. Ideally, I'd do a swap like s/[0-9]/_[0-9]/g, but the right side of a swap only takes strings. Interesting... I wrote this bit of code to correct the spelling of my name, and it worked!

    my $name = "Shelly";
    $name =~ s/(l)(y)/$1e$2/;
    print $name;

Why didn't it work earlier? Hm. Apparently the underscores only work for titles, not labels. If I want to use subscripts in the labels... You might try using the LATEX terminal type and putting text like "\\alpha_{3}" or '\alpha_{3}' . If I set the terminal type to LaTeX though, then the type won't be PNG... Grr. Aha! I got the subscripts working. It was all in the order that I had the commands in the gnuplot script.
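For the digit swap itself, the replacement side can reuse whatever the pattern captured, so there's no need to repeat [0-9] on the right. A minimal sketch, assuming gnuplot's enhanced text mode where _{...} marks a subscript; subscript_digits is a made-up helper name, and whether labels honor the markup still depends on the terminal settings and command order, as noted above.

```perl
use strict;
use warnings;

# Wrap each run of digits in _{...} so "Al2O3" becomes "Al_{2}O_{3}".
sub subscript_digits {
    my ($formula) = @_;
    (my $marked = $formula) =~ s/(\d+)/_{$1}/g;
    return $marked;
}

print subscript_digits($_), "\n" for qw(SiO2 Al2O3 Na2O+K2O);
```

Grouping the digits with \d+ and braces keeps a multi-digit count together as one subscript.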

I had an idea while eating lunch today. So remember how I need to eventually get rid of all the x-axis labels except for the bottom two? Well, the axes are stored in a hash, which is never iterated through in a predictable or consistent way. So here's what I thought -- what if I embedded the images in such a way that they overlapped one another? That'd require a lot of math, but it'd be an easy way to cover up the axes, and it would free up some vertical space for the graphs. It's going to involve a lot of math though.

capture.pl



July 12th, 2011

In an email Dr. Kratzmann sent me back in May, I found some .txt files I could use as sample data to test my program on. The thing is though, they have the carriage returns ^M instead of newlines. I'm thinking I'll have to rewrite a substantial bit of my program. I'm going to write it to take both CSVs and TXTs.

Wow, so I just learned an AWESOME trick in Perl. The internet calls it Perl Pie. The pattern is this:

    $ perl -p -i -e '<some replacement or regular expression>' <file or files>

And that's it. It's... so easy. Wow. I can use that one line of code to make sure the format of any file passed into the program is correct.
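As an example of the kind of cleanup this makes possible, here is a sketch of the carriage-return fix written as an ordinary subroutine. The same substitution should also work as a one-liner, perl -pi -e 's/\r\n?/\n/g' followed by the file name, though that form is an assumption about how it might be used and isn't taken from the program.

```perl
use strict;
use warnings;

# Turn Windows (\r\n) and old Mac (\r) line endings into Unix newlines.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $raw = "Descrip.\tNa2O\rsample-1\t4.050\r";   # made-up CR-only data
my @lines = split /\n/, normalize_newlines($raw);
print scalar(@lines), " lines\n";
```

Without this step, a CR-only file looks like one very long line to anything that reads it line by line.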

To figure out exactly how fast my program is, I started implementing the Time::HiRes module. I tested it with near-maximum conditions. I used two datasets that had 55 points each, and with those two data sets I generated 8 graphs. All in all, 880 data points to be graphed. It took 5.450448 seconds to graph 880 data points onto 8 PNGs and then embed all the PNGs into a PDF. I hope that's fast enough. If not, I've got quite a bit of efficiency things to work on...

Alright, I've got a pretty significant bug to work out. My program crashes if there are any spaces in the left-column of the file. It treats each space as a vertical column break, and then everything's all messed up. The hard part is going to be removing the spaces from exactly that first column. Here's a little snapshot to give you an idea of what I need to do.

    Descrip.	Na2O  	MgO   	P2O5  	SiO2  	Al2O3 	K2O   	CaO   	 ...
    HUD110-2 gl trans 4     4.050	5.318	0.952	49.827	12.842	0.939	...
    HUD110-2 gl tachy 3     4.369	4.773	0.807	50.471	13.861	1.080	...
    HUD110-2 trans 1a	3.711	5.024	1.086	50.935	13.569	1.552	8.142	...
    HUD110-2 trans 2a	3.832	4.823	0.989	51.151	14.708	1.403	8.170	...
    HUD110-2 gl trans 3b    3.783	5.216	0.880	51.197	13.582	1.224	...
    HUD110-2 gl trans-1	4.409	4.780	0.850	51.255	14.659	1.315	8.069	...

What it should look like (from another file):

    Descrip.	Na2O  	MgO   	P2O5  	SiO2  	Al2O3 	K2O   	CaO   	...
    HUD207B-transgl-1	5.931	2.112	0.467	61.255	16.129	2.294	4.337	...
    HUD207B-transgl-2	6.114	2.256	0.451	60.455	16.091	2.189	4.645	...
    HUD207B-transgl-3	6.249	1.998	0.438	61.741	16.077	2.497	3.974	...
    HUD207B-transgl-4	7.116	1.625	0.371	62.300	15.858	2.399	3.440	...
    HUD207B-transgl-5	6.582	1.714	0.435	62.693	16.241	2.387	3.397	...
    HUD207B-transgl-6	6.038	1.932	0.448	63.007	16.135	2.381	3.401	...

So I've got something that sort of works. I used the pattern s/ /-/g; and that replaced all spaces with -, but it also interpreted tabs as multiple spaces. I need to come up with a pattern that's like, "character-space-character" and turn it into "character-hyphen-character". Possible? I hope so. I've come across this documentation article, which could pose a bit of trouble.

...because in PerlThink, the righthand side of an s/// is a double-quoted string. \1 in the usual double-quoted string means a control-A...

Well, I can get rid of the space and the two characters surrounding it... here's a thought: what if I make copies of the TXTs into a temp folder, and then delete them afterward. I'm already making so many changes to the original file, I might as well leave them with it intact and keep the modified copies hidden. My mentor was right in that $1 can be used for "capturing" parts of an expression, but since the swap function expects a string on the right side, it doesn't work.
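A note on the capture question: in plain Perl code, $1 does interpolate on the right-hand side of s///, so s/(\S) (\S)/$1-$2/ is valid. The trouble reported here may come from running the substitution inside backticks, as the July 18th entry suspects, since Perl fills in $1 before the command ever reaches the shell. Below is a sketch of the "character-space-character" swap using lookarounds instead of captures; the helper name is made up, and the row is taken from the sample data above.

```perl
use strict;
use warnings;

# Replace a single space only when a non-space character sits on both sides.
# Lookarounds check the neighbors without consuming them, so "a b c" works too.
sub hyphenate_single_spaces {
    my ($line) = @_;
    $line =~ s/(?<=\S) (?=\S)/-/g;
    return $line;
}

my $row = "HUD110-2 gl trans 4     4.050\t5.318";
print hyphenate_single_spaces($row), "\n";
```

Runs of several spaces and tabs are left alone, so the column separators survive while the label becomes a single token.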



July 11th, 2011

Before I do anything with grapher.pl today, I'm going to finish writing a program that converts a CSV file into an array of arrays. Alright, I've got the CSV into an array of arrays, but indexing it is tricky. I'm trying to grab the nth element of each array within the array, but the way a foreach loop works, the variable used to hold the current element has to be a scalar. To google! Hmm... it'd be so convenient if there were an easy way to get a column out of a 2D array. Aha! I thought of a much easier way. What I'd been doing so far is reading each line of a file into an array. Then I looped through that array, and turned each string into an array, and I pushed that into an array of arrays. Then I was trying to loop through that array of arrays to get a specific index from each row. The silly thing is I've already accessed the elements I want! When I split the string into an array, I can access the nth element without the hassle of indexing a 2D array.

    foreach my $line(@lines){
	my @temp = split(/\t/, $line);
	push(@numbers, $temp[$n]);
    }					

Sooo much easier! Now to figure out how to get the max and min of the numbers array it creates, and then integrate that into grapher.pl.

The problem I'm encountering right now is that Perl tries too hard to make things work. I'm able to read a column into an array (such that every value from that column is in the array), but when I try to find the max and min values of the array, it treats the column header (a string) as a number! I'm thinking pattern matching will have to come into play.

Now that I've integrated that into the Perl side of grapher.pl, I need to integrate it into the gnuplot side. I'm hoping there will be a way to restrict the x-axis with variables such that if the variables are null, gnuplot will control the x-axis and if the variables are not null, the x-axis will be restricted to those variables.

It seems that the grapher isn't working for some variables. Not sure why yet. I can say, however, that I can plot graphs with SiO2 as the x-axis with restricted x-tics and it works. Something's wrong with Al2O3. Hmm... now it looks like the problem is with graphing multiple data sets (in that I can only graph more than one set). I tried out my July 7th version of the program, and that couldn't graph just one data set either. Hm.

Ah, so I had two tiny bugs. First, I was missing a bracket somewhere (how Komodo didn't pick that up, I don't know). Second, when I was trying to use a pattern to look for decimal numbers, I was using . which means all characters; \. refers to a period/decimal point. I think I can finally cross that off my todo list!
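Putting the pieces from this entry together, here is a rough sketch (not the actual csvarray code) of pulling one column out of tab-separated lines while skipping anything that isn't a decimal number, such as the column header. The helper name and sample rows are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Pull column $n out of tab-separated lines, keeping only decimal numbers.
sub numeric_column {
    my ($n, @lines) = @_;
    my @numbers;
    for my $line (@lines) {
        my @fields = split /\t/, $line;
        my $value = $fields[$n];
        next unless defined $value;
        $value =~ s/^\s+|\s+$//g;                    # trim padding around the field
        # \. is a literal decimal point; a bare . would match any character.
        push @numbers, $value if $value =~ /^-?\d+(?:\.\d+)?$/;
    }
    return @numbers;
}

my @lines = ("Descrip.\tNa2O  \tMgO", "s-1\t4.050\t5.318", "s-2\t3.711\t5.024");
my @na2o = numeric_column(1, @lines);
print "min ", min(@na2o), " max ", max(@na2o), "\n";
```

Filtering before calling min and max keeps Perl from quietly treating a header like "Na2O" as the number 0.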



July 8th, 2011

What I'm working on now is a bit complex. I've spent the whole morning so far figuring out how to read every piece of data from a CSV into a two-dimensional array. What I plan to do from there is use it to find the max and min of each variable's values, allowing the user to see them and then scale the x-axis if he/she so chooses. It's more tedious than difficult, I suppose; there's a lot of indexing.



July 7th, 2011

My next step is actually going to be pretty challenging. I need to modify the program to allow it to graph multiple data sets on top of each other. It's not so much that that's hard to do, but storing and organizing all the data involved will be fairly difficult. I'm going to have to wrap everything I have so far in a giant data structure, I'm imagining.

For simplicity's sake, I'm going to assume that each possible data set will have the same variables in the same order (for now). That takes care of the data organization. What's next is modifying the gnuplot script to plot multiple data sets. It's very easy to plot multiple data sets using only one line of code in gnuplot, but since it's only one line of code, it's hard to keep that ambiguous. I need to figure out how to pass in a different line for that plot command depending on how many data sets there are to graph. What I'm hoping I don't have to do is something super specific for each possible number of data sets, in pseudo code:

    if num_sets==1:
	plot set1 x:y
    if num_sets==2:
	plot set1 x:y, set2 x:y
    etc.

At the moment, I can't see any clean way to do that. I just made a test program to see if I could break the plot command onto multiple lines, but it only graphs the last line. ... Alright, here's an idea: when plotting multiple sets, each command is separated by a comma. I could start with plot set1 x:y, but I could add strings to the end that start with commas. An example of a string I might add would be , set2 x:y such that the comma keeps them separated as if it were part of the original plot command. I really like this idea; I hope it works! I'm just uncertain about mixing the two languages to do it. If I'm not careful, gnuplot will think the string on the end is a variable. I have to create the string entirely in Perl first, then pass it to gnuplot.

YES! It's about time :)

Yields...

Excuse the resolution, PDFs don't convert to PNGs very well. Here's the snippet that ended up doing the magic:

    # Start with the first data set, then append one ", ..." clause per extra set
    my $plot = "plot ";
    $plot .= "\"$set1.txt\" u $xindex:$index ti \"Set 0\"";
    for(my $i=1; $i<scalar(@setnames); $i++){
        my $temp = $setnames[$i];
        $plot .= ", \"$temp.txt\" u $xindex:$index ti \"Set $i\"";
    }

And that should work no matter how many data sets the user is graphing!
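The same idea can also be written by collecting one clause per data set and joining them with commas, which avoids treating the first set as a special case. This is only an alternative sketch; plot_command and the file name "glassdata" are made up for the example.

```perl
use strict;
use warnings;

# Build one gnuplot "plot" command covering every data set.
sub plot_command {
    my ($xindex, $yindex, @setnames) = @_;
    my @clauses;
    for my $i (0 .. $#setnames) {
        push @clauses, qq{"$setnames[$i].txt" u $xindex:$yindex ti "Set $i"};
    }
    return "plot " . join(", ", @clauses);
}

print plot_command(5, 2, "meltdata", "glassdata"), "\n";
```

Either way, the whole command is assembled in Perl first, so gnuplot only ever sees a finished string.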



July 6th, 2011

Today I meet with Dr. Kratzmann to show him the progress I've made so far! Speaking of what I've done so far, I'd like to briefly go over how I come up with my version numbers, since there doesn't seem to be a standard anywhere on the internet. The current format, #.##, represents how many times it's been "officially" approved, the week number of my Fellowship, and then the day number of that week. The current version is 0.63 because I haven't shown my professors/mentors yet, and it's day 3 of week 6.

Alright, so I've just had my meeting with both of my mentors. As a result, I have a to-do list for the next 1.5-2 weeks:

  • Primary Objectives
    • Plotting multiple data sets on one graph
      • Have axes come from different files
      • Same variable, different data set
      • Looks at a range of rocks
    • Shrinking x-axis (Gnuplot, p. 115)
    • Font size for 5+ graphs/page
  • Secondary Objectives (mostly aesthetics)
    • Displaying x-tics only on bottom graph of each page
    • Fix zipping
    • Use subscript, i.e., Al₂O₃ instead of Al2O3
    • Constant amount of significant figures on axes
    • Add email request link to downloads page
    • Allow filename to take path names as well

Also, behold my dabbling in PHP!

Top


July 5th, 2011

Over the holiday weekend (Happy 4th of July!) I did some research on "contains" algorithms that I could use to fix my code. The one that looks the best so far is mapping the array to a hash, which has a really easy exists subroutine that runs in constant time. Time to implement it!

So I started out writing some test code that worked really well. I wrote a short program that has a list of names stored in it and asks the user for a name. The program keeps prompting the user until it gets a name that exists in the list. Worked fine. For some reason, it's not working in my graphing program. It says that every variable entered is not found. I'm thinking it has something to do with the formatting of the array? SUCCESS! It was in the formatting! When I split the first line of meltdata.txt (a string) into an array, I split using /\t/, which only splits along tab separations in the string. I could use \s instead (which splits along any whitespace), or just leave the field blank (I blame the forum that taught me how to split strings into arrays). Anyway, it's working for the first prompt; time to apply it to the others in the program!
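For reference, here is a minimal sketch of the two pieces together: splitting on any whitespace and then mapping the array to a hash. The header line is an assumed sample, not the real first line of meltdata.txt, and %column is just an illustrative name.

```perl
use strict;
use warnings;

# Assumed sample header line; the real one comes from meltdata.txt.
my $line = "SiO2\tTiO2 Al2O3  FeO\n";

# split ' ' (a single-space string) skips leading whitespace and splits
# on any run of whitespace, so tabs, spaces and the trailing newline all go.
my @header = split ' ', $line;

# Map each name to its position so "exists" is a constant-time lookup.
my %column;
@column{@header} = (0 .. $#header);

for my $name ('FeO', 'H2O') {
    if (exists $column{$name}) {
        print "$name is column $column{$name}\n";
    } else {
        print "$name not found\n";
    }
}
```

Storing the position as the hash value means the same structure answers both "is it there?" and "which column is it?".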

Well now that that's finally working I can continue error-catching.

exists.pl · Grapher v0.62 Download

Top


July 1st, 2011

Now that I've got my &contains subroutine working, I'm going to try to make it more efficient. It shouldn't take more than half a second to search an array of 10 items, let alone 5 seconds. Something's seriously wrong there, but I can't quite figure it out. I feel like it's searching every possible index (even the ones that don't exist) until it gets a stack overflow or something like that.

For now I'm using the stringindex subroutine and returning &stringindex($_[0], $_[1]) < scalar($_[1]);, which basically finds the index of the string in the array, and if the index is between 0 and the length of the array, then return true. It works instantly for strings that are in the array, but it takes (I timed it) 12 seconds to figure out that a string isn't in the array. No idea what's going on there. I also got an error from gnuplot that I've never seen before: Too many ticks requested.

So this is interesting. I couldn't figure out what was causing that error, so I reverted to the Jun 28 version of my code, which I know was working perfectly at the time. I did have a little computer trouble today though... when I logged on, my csfile2 folder was completely empty, so rather than restart the machine, I SSH-ed into my own computer. Maybe that has something to do with it? Turns out it does. No idea why. I'll upload the terminal text to show just how weird it is. I run a program under SSH, it fails. I log out, run the same program from the same directory, it works fine. Weird.

Now that I've reverted, I'm going to have to make all the little improvements again... Modifications:

  • Check for directories and files before creating/deleting them
  • Storing images in .temp instead of temp to keep them invisible

Hm, I'm having the same problem. For some reason, it's not finding any variable in the array. The approach I've taken most recently is a foreach statement that breaks when it finds the string using last. Ah. Well, hm. Using some print statements, I found out that the array being passed into the subroutine is actually empty. I'm pretty sure it has something to do with references. Well, I got the references squared away, but now the logic seems to be a bit off.


    while($i < scalar(@array)){
        print "Checking: $array[$i]";
        if($array[$i] eq $want){
            $found = 1;
            exit;
        }
        $i++;
    }
    return $found;

The print statement proves that the subroutine is indeed checking every element in the array, and yet I see Checking: FeO followed by FeO not found. Something is wrong with how I'm returning 1 and how/when I'm breaking out of the loop. (Also, $found is initialized at 0.)
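One thing worth knowing here: exit ends the whole program, last leaves only the loop, and return leaves only the subroutine. Below is a minimal sketch of a search that returns as soon as it finds a match. It is a hypothetical rewrite, not the code from the program, and it also trims whitespace on the assumption that a stray newline could be what makes a visible match fail.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the search loop: takes the wanted string and an
# array reference, and returns as soon as a match is found.
sub contains {
    my ($want, $array_ref) = @_;
    $want =~ s/^\s+|\s+$//g;          # trim stray whitespace/newlines
    for my $item (@{$array_ref}) {
        return 1 if $item eq $want;   # leaves the sub, not the program
    }
    return 0;
}

my @header = ('SiO2', 'FeO', 'MgO');
print contains("FeO\n", \@header) ? "found\n" : "not found\n";
print contains('H2O',   \@header) ? "found\n" : "not found\n";
```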

Top


June 30th, 2011

Here's something pretty cool I've spent my morning doing: ssh-ing into the lab computer I'm working on from my laptop. Unfortunately, since it's from my laptop it's really slow, but this kind of gives me 3 monitors and two keyboards to work from, all for the same computer! Since I used ssh -X, I can actually open windows on my Bewkes machine and send them to my laptop! Also, so I don't forget, the exact command I used was ssh -X sbkand09@bewkes144-10.

Moving on, today's goal is to prevent the user from entering a wrong variable name. I noticed a huge bug in the version I put on the downloads page: it doesn't recognize any of the variables. Whoops. That wasn't too hard; I just had my if's, unless's and else's mixed around. Here's a weird error I got though: I tried to graph H2O, which isn't on the list, and after the program ran for a few moments, I got the gnuplot error plot "meltdata.txt" u 5:23097162 ti "H2O" ... Skipping data file with no valid points. Why the program thought meltdata.txt had more than 23 million columns, I don't know. As cool as the smart match operator ( ~~ ) is, I might just have to write my own contains subroutine. Alright, so I've got my contains method working, but it's taking way longer than it should to search a list of 10 items. It's almost exactly the same as my stringindex subroutine.


    sub contains{
	$_[0] =~ s/\s//;
	my $strindex = 0;
	$strindex++ until $_[1][$strindex] =~ $_[0] or $strindex>$_[1];
	if(0<=$strindex and $strindex<=$_[1]){
	    return 1;
	}else{
	    return 0;
	}
    }

I don't understand why it's taking so long if I have limits on how far it can go before it stops and returns 0.
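One possible explanation, assuming $_[1] is an array reference: a reference used as a number evaluates to its memory address, so a test like $strindex>$_[1] compares against a value in the millions instead of the array length. The sketch below shows the difference and a bounded search; index_of is a hypothetical name.

```perl
use strict;
use warnings;

my @header = ('SiO2', 'FeO', 'MgO');
my $ref    = \@header;

# A reference used as a number is its memory address, not the array length.
my $as_number = $ref + 0;
my $length    = scalar @{$ref};
print "reference as a number: $as_number\n";
print "actual length:         $length\n";

# Hypothetical bounded search: stops at the real end of the array.
sub index_of {
    my ($want, $array_ref) = @_;
    for my $i (0 .. $#{$array_ref}) {
        return $i if $array_ref->[$i] eq $want;
    }
    return -1;    # not found
}

print index_of('MgO', $ref), "\n";
print index_of('H2O', $ref), "\n";
```

If that is what is happening, a failed search would count through millions of nonexistent indices before giving up, which would fit both the long delay and an absurdly large column number.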

Top


June 29th, 2011

Today I'm determined to figure out that while loop. If not for the program, then for a better understanding of the Perl language. I think I'll start by making a slightly more complex average program that will average as many numbers as the user enters until ctrl + d is hit.

Wow, that was a pain. As is custom with Perl, there's more than one way to do everything. Here's the while-loop that finally got things to print the right way:


    # get the numbers
    print "Next: ";
    chomp;
    push(@numbers, $_);
    while(<>){
        print "Next: ";
        chomp;
        push(@numbers, $_);
    }

I'll try to implement that in my program now. Also, on a side note, I've been thinking a lot about how I'm going to distribute this program once it's complete. Should I just give everyone all the files? Or should I package them somehow in something like an .exe? Anyway, here's another comic about Perl.

So I've encountered something interesting! I got the while loop to require pressing Enter only once, but when I press ctrl + d to exit the loop, it adds the blank line at the end of the most recent prompt to the array, and then throws an error trying to graph it. It seems that maybe I was better off having A) to press Enter twice or B) to tell the program how many variables to ask for. I guess now it's a matter of deciding which is better. At least I know how to prompt the user one line at a time now! I don't quite understand why though... it looks like it has something to do with using the input ($_) without assigning it to a scalar first. Another difference I noticed is that the example that works takes <> instead of <STDIN>, although I thought those were the exact same thing.
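For what it's worth, <> and <STDIN> only behave the same when the program is run without file arguments: <> reads from any files named on the command line first and falls back to standard input otherwise. Here is a minimal sketch of a prompt loop that reads one line per prompt and stops cleanly at end of input. read_values is a hypothetical helper, and the in-memory handle stands in for the keyboard so the sketch can run unattended.

```perl
use strict;
use warnings;

# Hypothetical helper: prompt once per value and stop cleanly at end of input.
# Takes input and output handles so it can be tried without a keyboard.
sub read_values {
    my ($in, $out) = @_;
    my @values;
    while (1) {
        print {$out} "Next: ";
        my $line = <$in>;
        last unless defined $line;   # Ctrl + d gives undef, not a blank line
        chomp $line;
        next if $line eq '';         # ignore empty entries
        push @values, $line;
    }
    return @values;
}

# Try it with an in-memory "keyboard" instead of STDIN.
open my $fake_in, '<', \"FeO\n\nMgO\n" or die $!;
my @got = read_values($fake_in, \*STDOUT);
print "\nGot: @got\n";
```

Checking defined before doing anything else with the line is what keeps the end-of-input marker from being pushed onto the array.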

What I've been trying to do now is make use of the redo statement to prevent the program from crashing if the user accidentally enters a variable that isn't in the data set. So far all that's been doing though is redoing the block that informs the user that a variable doesn't exist, making every variable not exist. Here's the basic syntax:


    unless ($yaxis ~~ @header){
	print "$yaxis not found.\n";
	redo;
    }

Also, a download page is up!

limitless-average.pl

Top


June 28th, 2011

On my walk over here something occurred to me. This might be "cheating," but I was thinking that maybe I could get around the double-Enter problem by asking the user how many variables they would like to graph, and then going into a for-loop however many times. This might be a benefit as well: what would the ctrl + d equivalent be on an Apple keyboard to break the loop? Perhaps ⌘ + d. This doesn't look like much of a hassle, does it?

I'll check with my mentor and Dr. Kratzmann about it, but I think that's a fine solution. I just implemented a new feature: checking if the whitespace-values file already exists. I'll trim it down, but just for purposes of understanding the functionality of it, here are two snippets that do the same thing:


    if (-e "$file.txt"){
        print "$file.txt exists.\n";
    }else{
        &csvwsv("$file.csv");
    }

    unless (-e "$file.txt"){
	&csvwsv("$file.csv");
    }

I think it's kind of amazing how efficient Perl can be sometimes (specifically the unless functionality!). I guess for now I'll work on cleaning up my code and creating subroutines for things where I can. The tricky part about doing that though is I don't quite understand how variables are shared between files or how to properly use my, our and local. I read some more gnuplot documentation and I decided to put gnuplot parameters to use. In grapher.pl, I load the following line to gnuplot: call 'graph-specs.gp' $file $xindex $index $axis $xaxis $font $width $height. This loads the gnuplot script graph-specs.gp and passes in the 8 following scalars as the gnuplot variables $0 through $7. Another bit of functionality I added was checking to see if the directory temp exists before the program creates it. I'd noticed that when I kill the program in the middle and then run it again, I get an error that says mkdir: cannot create directory `temp': File exists. Checking for the directory first fixes that! Although it doesn't save any lines of code, I made a subroutine that rotates images. I'd say I've made a significant number of changes since I last posted my code, so here's today's.
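The directory check can be sketched with Perl's -d file test, the directory counterpart of the -e test used above. This is a minimal sketch rather than the code in grapher.pl: ensure_dir is a hypothetical helper, and the temporary base directory is only there so the example cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create a directory only if it is not already there.
sub ensure_dir {
    my ($dir) = @_;
    return 0 if -d $dir;                        # already exists
    mkdir $dir or die "Cannot create $dir: $!";
    return 1;                                   # newly created
}

# Work inside a throwaway directory so the sketch leaves nothing behind.
my $base = tempdir(CLEANUP => 1);
my $temp = "$base/temp";
my $first  = ensure_dir($temp);
my $second = ensure_dir($temp);
print "first call created it: $first, second call created it: $second\n";
```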

grapher.pl · grapher-helper.pl · graph-specs.gp

Top


June 27th, 2011

Well, today will be interesting; the internet seems to be down. We'll see how much can get done without the internet... Anyway, since Dr. Kratzmann has not yet returned, today's goal will be to fix the bug that the user needs to press Enter twice to make the prompts appear. I just checked my proposal and the time line I made at the start of the summer, and on both accounts I'm on track. My official proposal says weeks 4-5 are to be spent "writing a program that combines many graphs into a one-page .pdf," and my time line says weeks 5-8 are to be spent debugging, fine-tuning and user-testing. Here's the chunk of code that's requiring me to hit Enter twice:


    # Ask for the y-axes
    print "One at a time, enter a y-axis variable
	  from the list above exactly as it appears.\n";
    print "Once you have finished selecting the y-axis variables,
	  type Ctrl + d.\n";

    # Gathering the data from the user
    while(<STDIN>){
        print "Enter a y-axis variable: ";
        chomp;
        chomp(our $yaxis = <STDIN>);
        # get the column number from the whitespace values
        our $yindex = &stringindex($yaxis, \@header);
        # save all axes and indices into an array
        $yvars{$yaxis} = $yindex++;
        push(@images, $yaxis);
    }

I guess for now I'll play with things like chomp and \n to see if that fixes anything. Here's a bug that had evaded me! Somehow I had never graphed something using the first variable in the .csv (SiO2, in this case), and it turns out all my column indices were off by one! When I went to graph the first column, I got a series of error messages, the most coherent of which was warning: Skipping data file with no valid points. Ah, that was easy! I was missing an $xindex++; that changed the x-axis variable index from its index according to Perl to its index according to gnuplot (columns in gnuplot start at 1, not 0). Note to self: I should add a bit of code that checks to see if meltdata.txt already exists before I make one from the CSV file. Anyway, back to the problem of too many line breaks.

I think it has something to do with while(<STDIN>). Pressing Enter lets the program know that there's still input to come. Since it takes Ctrl + d to exit the loop though, I don't see why it wouldn't just start the loop over until the user intentionally exits it. Well, the internet's back and I've been reading around some forums. One possibility I've found is that chomp eats up the newline at the end of any input, so I have to press Enter once to send the input and again to replace the newline character that chomp takes away. If I don't use chomp though, I'm never prompted for anything no matter how many times I press Enter, because my &stringindex subroutine is looking for $yaxis\n instead of $yaxis. Hmm... I removed chomp and added s/\n// to the subroutine to get rid of any newline characters, but that still didn't work. Putting a next statement at the end of the while loop doesn't work either. Neither do goto statements or fencepost loops. There has to be an easier way to do this...
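One possible reading of the loop above: every <STDIN> consumes a whole line, and the loop contains two of them, one in the while condition and one in the assignment to $yaxis. If that is the cause, the first Enter only satisfies the while test and the second supplies the variable name. A minimal sketch of the effect, using an in-memory handle with made-up input in place of the keyboard:

```perl
use strict;
use warnings;

# Stand-in for three lines typed at the keyboard.
open my $in, '<', \"FeO\nMgO\nCaO\n" or die $!;

my @kept;
while (<$in>) {                  # first read: consumes one line into $_
    my $second = <$in>;          # second read: consumes the next line
    last unless defined $second;
    chomp $second;
    push @kept, $second;
}
print "Kept: @kept\n";
```

Only every other line survives, which would match having to press Enter twice for each variable.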

Top


June 24th, 2011

Today is the day I hopefully end with a program that has all the basic functions of the final product. I tried wrapping the entire image-generation loop with the PDF code, and I got the following error: Can't call method "val" on an undefined value at /usr/share/perl5/PDF/API2/Resource/XObject/Image.pm line 105. Not quite sure what to make of that yet. Next I'll try it in a disjoint loop. Ah, it ended up being a very subtle error. I was using single quotes instead of double quotes when I was giving Perl the name of the image to be embedded. Double quotes allow for interpolation of variables, such as "$xaxis-$axis.png". That allows me to pass in the correct file names. The problem I'm having now though is that the PDF file is corrupted when I try to open it. Well, that's solved. I think it was something silly like I hadn't saved the file before trying it again. The axes are still looking a little squished though. I'm thinking I could rotate them? That seems to work for now. I don't want to do too much specific fiddling until I get some larger data sets from Dr. Kratzmann.
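The quoting difference is easy to see side by side. A minimal sketch, with assumed example values for the two axis names:

```perl
use strict;
use warnings;

# Assumed example values for the two axis names.
my ($xaxis, $axis) = ('FeO', 'MgO');

my $single = '$xaxis-$axis.png';   # taken literally, no interpolation
my $double = "$xaxis-$axis.png";   # variables are filled in

print "$single\n";
print "$double\n";
```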

I've just finished implementing deleting temporary files! Using the backtick operators again, I used mkdir to make a temp folder into which all the images would be saved until after I embedded them in the PDF. Turns out rmdir won't delete directories that aren't empty though, so I ended up having to use rm -rf temp to delete the folder.

Now that I've got the skeleton of the program down, I'd like to spend time making it more efficient until I get some more complex data sets to work with. The headline of Programming Perl that I've been using says "There's More Than One Way To Do It." I'd like to think that means that my code will eventually get really concise as I continue to learn the language. Right now the entire program is only one file that's 174 lines long with a helper program that's 54 lines long. I'm not positive, but I'm pretty sure that's a lot less code than it'd take in Java. And actually, the helper file isn't necessary; I just thought it'd make the main file cleaner/easier to read; all that's inside are two methods that could just as easily be in the main file.

Anyway, I'd say I'm definitely on schedule now. I've got a program that does everything the final program should do. It's not perfect yet, of course, but I'm definitely getting close. I'd really like to see how this handles larger data sets; sometimes I get a little worried I'll have to rewrite large portions of my program. Anyway, here are the two main files and some sample PDFs titled by the x-axis of each graph.

grapher.pl · grapher-helper.pl · Al2O3.pdf · FeO.pdf · K2O.pdf (best example) · TiO2.pdf

Top


June 23rd, 2011

This morning I finished implementing the HoHoA data structure that contains all the information necessary for creating, rotating and embedding all the graphs. I've also added to the while loop that creates the graphs a little section of code that parses through the data structure to find the right variables it needs. I've since used those variables in the section of code that saves the graph as a PNG, but now I'm trying to use it for rotating the image by $deg degrees. I thought it would be easier to find out how to call the terminal from a Perl script. Maybe it's one of those things that's so easy no documentation exists on it... ah well. That's what I'm working on right now.

So I've gotten as far as open(TERMINAL, "|gnome-terminal");, and I know it's kind of working because a new terminal window opens up when it gets to that part of the program, however, when I go back to the folder to check the images, all of them are still in their original orientation. Another downside of this I see is that not all computers have the gnome-terminal, so for now, it seems this will only work on Linux machines. So the bugs so far are this: the images aren't being rotated (even though that part of the code is being hit), it will only work on machines with a Gnome Terminal, and the terminal windows do not close once they've been opened (even though I close the file handle). Here's the chunk of code:


    if($deg==90){
	open(TERMINAL, "|gnome-terminal");
	print TERMINAL <<EOPLOT;
	convert -rotate $deg $xaxis-$axis.png $xaxis-$axis-rotated.png
    EOPLOT
	close(TERMINAL);
	print "Image rotated.";
    }

Success! I was right in that it's a lot easier than I was making it out to be, hah. All I needed to do was use the backtick operator and I was able to do it in one line: my $cmd = `convert -rotate $deg $xaxis-$axis.png $xaxis-$axis.png`;. That command rotates an image using the terminal and then overwrites it. This might have solved all my problems, actually; it goes straight to the system instead of the specific Gnome Terminal, the images are actually rotated, and no terminal windows are left open afterward!
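Backticks capture a command's output, which isn't needed for a rotation. When only success or failure matters, system with a list of arguments is an alternative that also skips the shell, so unusual characters in a file name can't be misread. The sketch below only builds and prints the command; rotate_command is a hypothetical helper and the file name is an example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for ImageMagick's convert.
sub rotate_command {
    my ($deg, $file) = @_;
    return ('convert', '-rotate', $deg, $file, $file);
}

my @cmd = rotate_command(90, 'FeO-MgO.png');
print "Would run: @cmd\n";

# To actually run it (needs ImageMagick installed and the file present):
# system(@cmd) == 0 or warn "convert failed: $?";
```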

The next tricky part is embedding all the images in a PDF. As I have it now, my program handles all the graphing/rotating one at a time. I can't embed the images one at a time though. If I did, it would just save over the PDF however many times there are pictures to embed and I'll be left with a PDF with the last picture embedded on it. So I have a few options:

  • Open the PDF for rewriting and keep adding images to it
  • Wrap the entire image creation/rotation block with code for the PDF maker
  • Make a separate block that embeds the images
    • The only problem with this is that it would be in a different scope than %graphdata and @images

I'm pretty happy with what I learned today, so I guess I'll post it :) And check it out, a sideways image!

grapher.pl

Top


June 22nd, 2011

I've been thinking about it, and my data structure will need to be able to store the name of each image as well, so it can iterate through all the files created by the previous step of the program. My first thought was a hash of hashes of arrays: the keys of the outer-most hash would be the total number of images to be embedded. The values of each of these keys would be an array as long as however many images are to be embedded. Each element in the array would serve as a key, and its values would be the array I mentioned yesterday that stores the height, width and x-y coordinates. The only problem with that approach though is that I couldn't put the names of the PNGs in ahead of time like I can the height, width and coordinates. I thought about using an array filled with the correct number of " " elements, but there's no clean way to rename parts of a hash, especially when they're layered that deep.

So I finally got a hold of pen and paper and came up with the drawing above. Here's how I figure it'll work: as the program runs, it will fill an array with the names of all the images it creates. Meanwhile, there will be a hash table that will accommodate embedding up to 9 PNGs on a single-page PDF. When it comes time to embed the images, the program will go to the key that equals the length of the array of image names. From there, each key will map to a hash that's as many keys long as the name of its key. Heh, still following? So like you can see in the drawing, the array at the top is 5 elements long, so it accesses key 5 in the hash. Key 5 will map to a hash whose keys are the indices of a 5-element data structure, i.e., 0-4. Then for each image in the original array, it will map to the similarly indexed key in the second hash, and voila! (I hope.) Time to try and implement it...

Well, I've hit yet another little hiccup. Gnuplot's proportions aren't the most desirable, per se, so when I got to embedding 4 landscape images to a portrait page, there was a lot of wasted space. I didn't find anything in gnuplot about rotating an image when saving it (although one can rotate the axes, labels, text, etc), but PDF::API2 has a simple $page->rotate(90); command. The only problem with that is it seems to rotate the PDF after I've embedded the images, even though it's at the start of the code. So what I get isn't a landscape PDF with nicely aligned horizontal PNGs embedded on it; I get the same awkward PDF I started with, just sideways. Grr... I've gotten to the point where I've read so much online documentation that all the pages left are just copied and pasted from other documentation sites... I'm even finding the same API2 rotation queries posted in different forums, hah. There's also a way to rotate content, $gfx->rotate(90), but no matter what coordinates I embed the image at, nothing shows up on the PDF. Anyway, to show you what I'm eventually getting at once I figure out how to rotate the images (or the PDF):


    # A hash to store all the data for gnuplot
    my %graphdata = (
    # img => [height, width, x, y, font size];
    # (square brackets make array references; parentheses would flatten)
        1 => {
            0   => [600, 450, 6, 156, 32],
        },
        2 => {
            0   => [520, 390, 46, 398, 30],
            1   => [520, 390, 46, 4, 30],
        },
    );

And I'll continue that up through having 9 images to a page; I just need to figure out the layout first. Aha! Well, I've found one way around it that's quite easy! The Linux terminal has a command convert -rotate 90 figure_in.png figure_out.png. Here's what I have to do tomorrow:

  • Finish implementing the HoHoA data structure
    • Include "degrees rotated"
  • Implement the rotating through the command line
  • Get the program running to the point where it will embed multiple images on a page using the information in the data structure

The first step will probably be the hardest, because of all the math involved in figuring out where all the images will be embedded and what sizes they will be. That's a total of 45 possible locations! Well, I'm ending today having calculated the locations for all the images up through embedding 4 on a page. Check it out! Only 35 more to go...


    # A hash to store all the data for gnuplot
    my %graphdata = (
    # img => [height, width, x, y, font size];
    # (square brackets make array references; parentheses would flatten)
        1 => {
            0   => [600, 450, 6, 156, 32],
        },
        2 => {
            0   => [520, 390, 46, 398, 30],
            1   => [520, 390, 46, 4, 30],
        },
        3 => {
            0   => [396, 297, 6, 396, 24],
            1   => [396, 297, 309, 396, 24],
            2   => [396, 297, 6, 0, 24],
        },
        4 => {
            0   => [396, 297, 6, 396, 24],
            1   => [396, 297, 309, 396, 24],
            2   => [396, 297, 6, 0, 24],
            3   => [396, 297, 309, 0, 24],
        },
    );
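Here is a minimal sketch of how the lookup described earlier in this entry could walk such a structure. It assumes each layout is stored as an array reference (square brackets), since a parenthesized list inside a hash constructor flattens into the surrounding list. The image names are examples only.

```perl
use strict;
use warnings;

# Assumed two-entry excerpt; each layout is an array reference [ ... ]
# holding height, width, x, y and font size.
my %graphdata = (
    1 => { 0 => [600, 450, 6, 156, 32] },
    2 => {
        0 => [520, 390, 46, 398, 30],
        1 => [520, 390, 46, 4,   30],
    },
);

my @images = ('FeO-MgO.png', 'FeO-CaO.png');   # example file names
my $count  = scalar @images;

for my $i (0 .. $#images) {
    my ($height, $width, $x, $y, $font) = @{ $graphdata{$count}{$i} };
    print "$images[$i]: ${width}x$height at ($x, $y), font $font\n";
}
```

The number of images picks the outer key, and each image's position in the array picks the inner one, so nothing about the file names has to be stored in the hash ahead of time.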

Top


June 21st, 2011

It's taking a lot of trial and error, but I'm slowly figuring out the dimensions of a PDF according to the API2 package. So far, all I've been able to conclude is that the placement of images goes by the bottom left-hand corner of an image, and (0,0) on the PDF is the bottom left-hand corner as well. In addition, there is no margin; images can be placed right up to the edges of a PDF. With a y-value of 800, the image isn't visible, so it's at most 800 "units" (pixels?) tall. Alright, well, I've figured it out, but I can't make sense of it. When I place an image at (0, 792) it disappears off the top of the page. The bottom row of pixels in the image is still visible when placed at (0, 791). Weird. What does that number have to do with anything? Now time to find the x-width, I guess. And the width is 612. Huh. A quick google search reveals that one point is ~1.33 pixels. Well, that's my PDF size then: 612 x 792. Units? Not sure yet, but I think points. Edit: the units behave like pixels for my images. I placed two images that were 300x300 px, 300 units apart, and their edges lined up perfectly. That fits the units being points (1/72 inch) with each image pixel drawn one point wide.
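The page size can be cross-checked with a little arithmetic. A minimal sketch, assuming the page is US Letter (8.5 x 11 inches) and using the PDF default of 72 units per inch:

```perl
use strict;
use warnings;

# PDF user space defaults to 72 units ("points") per inch.
my $points_per_inch = 72;
my ($width_in, $height_in) = (8.5, 11);   # US Letter, assumed page size

my $width_pt  = $width_in  * $points_per_inch;
my $height_pt = $height_in * $points_per_inch;
print "Page: $width_pt x $height_pt points\n";
```

That would explain where 612 and 792 come from.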

Now I have to alter grapher.pl such that the graph size will change depending on how many I need to fit on one page. I kind of wish I had some scrap paper and a pencil, hah. This is quickly becoming very complicated, but I think what I'm going to have to do is create a very deep hash whose keys are the total number of images to be graphed and whose values contain the new width and height of the image (to accommodate fitting multiple images to a page), the font size for the titles and axes of the graphs, maybe the location of where each image will be... At least I'll get better at navigating hashes!

For the past hour or so I've been trying out different settings in gnuplot for different sized plots. Once I figure out a setting, I integrate it into the hash of gnuplot data I've been working on. The layout would look much nicer, I think, if I didn't need to label the tick marks on the axes; I haven't found a way to resize those yet, so when I make a smaller graph, the tick labels stay the same and end up overlapping each other like this:

On a different note, I don't think I'll be posting any code today because I've mostly been doing a lot of math to help me figure out how the graphs should be generated. I'll post the start of my hash though.

    # A hash to store all the data for gnuplot
    my %graphdata = (
    #   num-imgs => [width, height, font size]
        1 => [640, 480, 32],
        2 => [500, 375, 30],
        3 => [300, 225, 24],
        4 => [300, 225, 24],
    );

The key for each entry in the hash is how many images I have to embed. What's going to be trickier though is figuring out where to place all the images on the pdf. I suppose I could extend the hash even further, i.e., each key would also map to the x-y coordinates of all the images. That means each key would map to a different amount of data. Then I'll have to decide if I want to just list the x and y values, or put them into the value as ordered pairs. Ordered pairs aren't a datatype in Perl as they are in Scheme though, although I could put them in as two-element arrays. In that case, I'd be working with a hash of arrays of arrays. Sounds intense!

Top


June 20th, 2011

Well, according to my proposal, this is the week I start writing the final program! I think I've got most of the pieces together so far. The next tricky thing I have to tackle is accessing the images that csv-grapher.pl creates and embedding them on a pdf. That program automatically saves the images under the name of the variables that each graph compares. What I'm going to have to do then is save all of those [geochemical] variables as Perl variables that I can plug in as file names when it comes time to round up all the images. I'm also going to have to figure out how many pixels high and wide a pdf is according to the API2 package, so I can calculate where to place the images based on how many there are. Even further, I'll have to initially create the images to be of a specific size depending on how many images there are.

Starting to actually write my final program is really exciting! What I'm trying to do right now is 1) Store many variables to serve as the y-axis of each graph and 2) Loop through all those variables to make graphs. I'm having a few little problems with that now. First, the user has to hit Enter twice for the y-axis variable to be added to the hash. I'm figuring that has something to do with the until(<STDIN> =~ /stop/) loop I'm using. Something about that syntax is also preventing me from getting out of the loop, so I haven't really been able to test the graphing part of today's program yet.

So, the I/O isn't very pretty yet (the user still has to hit Enter twice for the next prompt to appear), but I made some good progress. As is now, the program asks for a variable to be used as the x-axis of each graph, and then it asks for y-variables until the user enters Ctrl + d. It stores all of those y-variables and their positions in the .txt file in a hash. After that, it loops through all the values in the hash and generates PNG graphs with the original x-axis for all of them, each having a different y-axis. Check it out!

It takes a second or two to graph them all, but it does indeed graph them very quickly! Now that I've got my program generating multiple graphs with the same x-axis and user-input y-axes, I think my next big steps are the following:

  • Change size of PNGs depending on how many y-axes are used
  • Iterate through all PNGs somehow and embed them in a PDF
  • Use Perl to execute terminal commands
    • mkdir to save the PNGs in a specific place
    • rmdir to delete all files once the PDF has been made
  • Touch up the terminal interface

I'll try one a day for the rest of the week? At the start of Week 5, Dr. Kratzmann will look over what I've done so far and offer suggestions such that it can best fit his data. After all, the data I've been using so far is just a slice of the actual data sets this program is meant to work with. For fun, I added the four images that were generated by the program captured in the screenshot above.

grapher.pl · grapher-helper.pl · FeO-MnO.png · FeO-K2O.png · FeO-MgO.png · FeO-CaO.png

Top


June 16th, 2011

I can't believe I've gone an hour today without updating my blog since I've started this morning! I guess I've been really focused. I found a very subtle bug that was preventing me from plotting Dr. Kratzmann's data, and it took a while to realize it. So I started by generating a .txt file containing whitespace-separated values from Dr. Kratzmann's .csv file. If I plotted that, I'd get an error. If I copied and pasted that same exact text file into a new one, however, and plotted that, it would work! It wasn't until I was viewing the files in the Ubuntu File Browser that I noticed what was going wrong. The thumbnails were different! The thumbnail of the .txt generated by my Perl script only showed one line of text; the thumbnail of the file I made by copying and pasting was formatted correctly (each set of data starting on a new line).

See? Somehow, my Perl script is removing the commas (like it should), but writing the file all onto one line. It just took me a while to notice it because Komodo Edit (the text editor I'm using this summer) is too smart, hah. I found that if I typed less geodata.txt into the terminal though, I would in fact see one line of data. Interesting... upon viewing geodata.csv in the terminal, I see that it as well is displayed on one line, but with ^M where the line breaks should be... perhaps this is a job for pattern matching!

Aha! Turns out it's all in how the .csv file is created. I wonder if that will end up being a problem for whoever uses my program... how is someone who isn't viewing their files in a terminal window going to see that the line breaks aren't really there? That's a feature I'll have to add to my program. It will check to make sure the file is more than one line long, and if it's not, it will tell the user that the .csv is not formatted correctly.
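A minimal sketch of that check, assuming the ^M characters are carriage returns (\r) left behind by a .csv saved with old Mac-style or Windows-style line endings. The subroutine names normalize_newlines and looks_single_line and the sample data are made up for illustration; they are not part of the grapher.

```perl
use strict;
use warnings;

# Hypothetical helper: turn \r\n (Windows) and lone \r (old Mac) into \n.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Hypothetical helper: true if splitting on \n yields at most one line,
# i.e. the whole file appears to sit on a single line.
sub looks_single_line {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    return @lines <= 1;
}

my $raw   = "SiO2,FeO\r50.1,9.8\r52.3,8.7\r";   # made-up sample data
my $fixed = normalize_newlines($raw);

print "Before: ", (looks_single_line($raw)   ? "one line" : "several lines"), "\n";
print "After:  ", (looks_single_line($fixed) ? "one line" : "several lines"), "\n";
```

Normalizing the line endings up front, rather than only warning the user, is one possible design choice; either way the same pattern does the detection.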

So here's what I have so far: a program that asks for the name of a .csv and then turns it into a .txt with whitespace-separated values. Then it saves a graph of the first two data columns plotted against each other as a .png image! What I'm working on now is asking the user which variables they'd like to see plotted. To do that, however, I need to write a getIndex method that works for strings. Tricky indeed... finally got it to work. Pattern matching ended up solving the stringindex problem.

I think I'm done for today, but I've made a lot of progress. My program only creates one graph at a time, but it lists all the variables and asks for an x-axis variable and a y-axis variable. Then it graphs those two axes against each other with the variable names as the axis labels and names the png file accordingly!

meltdata.csv · grapher-helper.pl · whitespace.pl · csv-grapher.pl · meltdata.txt · FeO-MgO.png · SiO2-K2O.png

Top


June 15th, 2011

So today I'm going to start combining everything I've been working on for the past three weeks. Specifically, I am going to write a Perl program that calls a Perl script that turns comma-separated values into whitespace-separated values, and then calls a gnuplot script that plots those whitespace-separated values. Another big step I'm taking today in writing such a program is that I'll be using the .csv file that Dr. Kratzmann supplied me with. Using that file as the source of the graph will allow me to fine-tune the gnuplot script to better suit the needs of my project.

For some reason, I'm having a lot of trouble figuring out how to call one Perl script from within another. I've been reading into the system() and exec() commands, but I can't find anything that specifies how to use those commands to do something as simple as calling a Perl script. I mean, it was almost effortless to call gnuplot using Perl, so why is there so little documentation on this? I'm also getting a Permission denied error from the terminal when I type out the entire path name of the Perl script I'm trying to call...

Success! One of the things that I love about computer science is that if you can't find the answer you're looking for in books or documentation, there's bound to be a programming forum somewhere in which someone else has discussed the same thing. Turns out, it is in fact very simple to call a Perl script from another Perl script; all I needed to add at the top of the program was require "some_program.pl"; (with the file name in quotes) and voila!
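For comparison, here is a small sketch of the two approaches mentioned in this entry: require, which loads another file into the running program, and system, which runs it as a separate process. The helper file and its shout subroutine are invented so the example can run on its own; the real project used whitespace.pl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a tiny stand-in helper script to a temporary directory
# (hypothetical file name and contents).
my $dir    = tempdir(CLEANUP => 1);
my $helper = "$dir/helper.pl";
open(my $out, ">", $helper) or die "Cannot write $helper: $!";
print $out 'sub shout { return uc $_[0] } 1;', "\n";
close($out);

# Option 1: require loads the file into this program, so its subs are callable.
# The file name must be a string, and the file must end with a true value.
require $helper;
print shout("loaded with require"), "\n";

# Option 2: system runs the script as a separate perl process.
# $^X is the path of the perl that is running this program.
my $status = system($^X, $helper);
print "system exit status: $status\n";
```

Running the helper through perl itself, as in option 2, does not need the script to be marked executable, which is one likely reason a bare path gives Permission denied.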

Well, it seems I hit a hiccup today. I haven't taken into account that gnuplot can't graph files that contain text as opposed to numbers. My goal for tomorrow will be working out how to graph the data Dr. Kratzmann has provided me.

geodata.csv · whitespace.pl · csv-grapher.pl · geodata.txt

Top


June 14th, 2011

So remember how prices.gp was a really long file that had dozens of unused options? I think today I'm going to learn about what all those options are and how I can use them to benefit my project.

After playing around with gnuplot settings and styles for a bit, I decided to refresh what I've learned from the PDF::API2 Perl package. Creating a plain PDF and putting text on it seemed easy enough, so I thought I'd try for something more difficult: putting a PNG on a page. That was fairly manageable, but I was using PNGs generated by gnuplot to place on the PDF, and their initial resolutions are very large. Naturally, I dove into the internet looking for PDF::API2 documentation and found very little of use. It's such a loaded package that the documentation becomes overwhelming very quickly. I did end up finding a $content->scale($sx, $sy) method that should have been able to resize a PNG such that it'll fit on a page, but the terminal gave me the following error: Can't locate object method "scale" via package "PDF::API2::Resource::XObject::Image::PNG" at imgembed.pl line 21. I'm a little confused as to why the method appears in the documentation, but not when I call it. Anyway, I figured I'd find an easier way to resize the images. Using gnuplot! Gnuplot has a very concise method of setting image sizes when generating PNGs from text files. Now I know images.pdf might not look impressive, but it took at least an hour and 5+ documentation websites to get two images and a line of text to fit on a page. I'm proud of it, anyway :) Another step closer to the final product!
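As a rough illustration of sizing the image on the gnuplot side, the sketch below builds a gnuplot script whose png terminal is given an explicit size in pixels. The helper name png_script, the file names, and the assumption that the data file has two numeric columns are all hypothetical; the entry does not record the exact commands that were used.

```perl
use strict;
use warnings;

# Hypothetical helper: build a gnuplot script that plots a two-column data
# file to a PNG of the given pixel size.
sub png_script {
    my ($datafile, $pngfile, $width, $height) = @_;
    return <<"EOPLOT";
set terminal png size $width,$height
set output "$pngfile"
plot "$datafile" using 1:2 with points
set output
EOPLOT
}

my $script = png_script("dots.txt", "dots.png", 400, 300);
print $script;

# To actually draw the graph, pipe the script to gnuplot (if it is installed):
# open(my $gp, "|-", "gnuplot") or die "Cannot start gnuplot: $!";
# print $gp $script;
# close($gp);
```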

Note to self: I'll eventually have to figure out how to create the end product (a PDF filled with .PNGs) without actually saving any images, or I'll have to figure out how to delete files using Perl after I've created them. Otherwise this program could clutter up a computer very quickly.
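One way to handle the clean-up is Perl's built-in unlink, which deletes files by name. The sketch below creates two empty stand-in files in a temporary directory and then removes them; the file names are made up, and the step that would build the PDF is left as a comment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-ins for the PNGs gnuplot would write (made-up file names).
my $dir  = tempdir(CLEANUP => 1);
my @pngs = map { "$dir/$_.png" } qw(FeO-MgO FeO-CaO);
for my $file (@pngs) {
    open(my $fh, ">", $file) or die "Cannot create $file: $!";
    close($fh);
}

# ... the PDF would be built from @pngs here ...

# unlink deletes the listed files and returns how many it removed.
my $removed = unlink @pngs;
print "Removed $removed of ", scalar(@pngs), " files\n";
```

File::Temp's tempdir with CLEANUP => 1 is a second safety net here: the directory and anything left in it are removed when the program exits.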

rgb (data set) · rgb.gp · rgb.png · dots (data set) · dots.gp · dots.png · imgembed.pl · images.pdf

Top


June 13th, 2011

So far it looks like I'm right on track! This week is to be spent writing programs that involve both Perl and gnuplot. So far, it seems to be quite easy, actually. To keep things simple, my first goal was to write a Perl program that does the exact same things as stockprices. From what I can tell, Perl treats communication with other programs the same way it does with files. Rather than assigning a file name/location to a file handle, you use a pipe to direct it instead. For example, here's a simple Perl program that communicates with gnuplot to plot sin(x).


    open(GNUPLOT, "|gnuplot") or die "Cannot start gnuplot: $!";
    print GNUPLOT <<EOPLOT;
    plot sin(x)
    set terminal png
    set output "sinx.png"
    replot
    set terminal x11
    set output
    EOPLOT
    close(GNUPLOT);

The block between <<EOPLOT and EOPLOT is a here-document: Perl sends everything in it to gnuplot exactly as if I had typed it at the gnuplot prompt. What I suspect is also possible to do, however, is write a gnuplot script, and then call that gnuplot script within a Perl program! It'd save a few more lines of code, anyway.


    open(GNUPLOT, "|gnuplot") or die "Cannot start gnuplot: $!";
    open(SINX, "sinx.gp") or die "Cannot open sinx.gp: $!";
    while ($_ = <SINX>){
        print GNUPLOT $_;
    }
    close(SINX);
    close(GNUPLOT);

The code above does the exact same thing with three fewer lines of code. Three lines may not seem like much, but consider this: no matter how long the gnuplot script is, I will always need only seven lines of code to send every line of an entire *.gp file to gnuplot. I'd say that's pretty efficient. Anyway, before I get too far ahead of myself, I think I should lay out my daily plan for the week. Week 3:

  • Monday: Write simple Perl programs that write to gnuplot
  • Tuesday: Write Perl programs that call gnuplot scripts efficiently
  • Wednesday: Write Perl programs that call both gnuplot scripts and other Perl programs
  • Thursday: Modify the above programs such that they take input from the user
  • Friday: Write a Perl program that creates and names a graph based on user input

If I happen to finish any day's goal early, I'll continue learning about Perl's highlighted features, specifically pattern matching. Something cool I'm learning about scalars is that their values can be ambiguous. Last week I wrote a very simple pattern-matching program that determined if a URL began with "http://www." and ended with ".com". Today, I've decided to expand on that program. The thought occurred to me of course that not all URLs are formatted that way. I can easily broaden the criteria for a valid URL by using the or symbol, | , for alternation. I was quickly able to come up with the following pattern for a website URL using that technique.


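    # Build each piece as a compiled regex (qr) so that | means alternation.
    $protocol  = qr{(?:ftp|https?)://};
    $server    = qr{www\.};
    $extension = qr{(?:org|com|net|edu|gov)};
    # Escape the literal dot and anchor both ends of the URL.
    $website   = qr/^$protocol?$server?\w+\.$extension$/;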

Granted, it's still not all-inclusive, but it's a fair bit closer. Here's an explanation for all those symbols. The ? after each of the first two scalars in the website pattern is the same as saying {0,1}, that is, that pattern can occur either 0 or 1 times. The \w+ is a cool pattern as well. The \w represents all word characters; it's a shortcut for the pattern [a-zA-Z0-9_], and adding the plus sign means that a word character must occur one or more times in order to qualify as the domain of a website. I also made a pattern matching program that takes the filename of a *.txt containing a normally formatted address, then tells you if the address is formatted correctly. As usual, the day's code is below.

stockprices.txt · gnuplot.pl · stockgraph.png · sinx.gp · sinx.pl · sinx.png · websites.pl · myaddress.txt · mailaddress.pl

Top


June 10th, 2011

Right now I'm working with subroutines that take arrays and hashes as arguments. One of the tricky things about subroutines in Perl is that the definitions of the subroutines don't have any parameters. It's assumed that when the subroutine is called, the arguments will be in either $_, @_ or %_. I'm starting out with subroutines that will find the max and min values of an array, and I'll use those from there to write a very simple sorting subroutine (probably using selection sort). I've got three subroutines done in preparation for the selection sort (max, min, and printarray), but I'm stuck with the sort. The algorithm I was intending to use is this: choose the smallest element in the unsorted array, remove it, and add it to the end of the sorted array. The trouble here is that you can only remove elements in Perl based on index, and from what I can see, Perl has no "getIndex" (oh, the conveniences of Java...). Maybe I should try writing one. Maybe I'm not thinking in Perl yet and there's an easier way to do this.

Heh, it's turned out to be a good thing I tried to write my own getindex. To get the index of an element in an array, I need to pass in two arguments: the element and the array. I've learned I was wrong when I wrote the above paragraph. All arguments are contained in the @_ array. If I pass in two arguments, an element and an array, then the element is at $_[0] and the array is at $_[1], I think. I'm getting an error though when I try to run my getindex subroutine: "Modification of a read-only value attempted at subroutines.pl in line 12." Line 12 here is $index = $i if $_[0] = $array[$i];. (Most likely the culprit is the single =, which assigns to $_[0] instead of comparing with ==.) This subroutine loops through the array, and if the element it hits equals the element I'm looking for, I set index to the loop iteration I'm currently on. Sounds like it should work, right? I would use a foreach statement, which has seemed to be giving me better luck when it comes to looping through arrays, but then I'd be looping through the elements regardless of their indices. According to Chapters 6 and 8 in Programming Perl, I need to learn about passing references a bit. Perhaps I won't get to pattern matching today. Well, I'm getting output now, but it's definitely not what I was looking for.


    @randoms = (34,16,7,2,52,99,12,6,90);

    # get index
    sub getindex{
        my $index = 0;
        ++$index until $_[1][$index] == $_[0] or $index>$_[1];
        return $index;
    }
    $want = 52;
    print "Index of $want: ",getindex($want, @randoms), "\n";
					

The output? 35. That's certainly a peculiar bug. There aren't even 35 elements in the array, nor is there a number 35 stored in it.

EDIT: So after looking at the chapter on references, I realized that all I'd done wrong was forget a backslash in front of @randoms when I called the subroutine. Prefacing an array with a backslash passes in a reference to an array. Although I can't entirely explain my code's behavior, I'm pretty sure I can say it was reading the array as a scalar, adding one to the 0th index and then returning it. Glad that's solved.
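A possible explanation for the 35, as far as I can tell: without the backslash, @randoms is flattened into @_, so $_[1] is just 34 (the first element). The lookup $_[1][$index] then never finds 52, and the loop only stops once $index is greater than 34. Below is a sketch of a getindex that takes the array by reference and stops at the end of the array; it is one way to write it, not necessarily what ended up in subroutines.pl.

```perl
use strict;
use warnings;

# Return the index of the first element equal to $want, or -1 if absent.
# The array is passed by reference so it arrives as a single argument.
sub getindex {
    my ($want, $array) = @_;
    for my $i (0 .. $#{$array}) {
        return $i if $array->[$i] == $want;
    }
    return -1;
}

my @randoms = (34, 16, 7, 2, 52, 99, 12, 6, 90);
print "Index of 52: ", getindex(52, \@randoms), "\n";   # found at index 4
print "Index of 5: ",  getindex(5,  \@randoms), "\n";   # not present, so -1
```

Returning -1 for a missing element is a convention borrowed from Java's indexOf; swapping == for eq would make the same subroutine work for strings.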

subroutines.pl

Top


June 9th, 2011

As suggested by my mentor, I'm going to be working with arrays and pattern matching in Perl today. Pattern matching and regular expressions are often considered one of the great strengths of Perl, and arrays are a very common data structure. Between today and tomorrow, I plan to work on the following:

  • Array Sorting
  • Multidimensional Arrays & Hashes
  • Using Arrays and Hashes in Subroutines
  • Pattern Matching

Here's an interesting quirk. I started with this array of random numbers: @numbers = (73, 32, 53, 17, 2, 71, 31, 86, 38, 78); and used the same method to sort it as I did an array of days of the month. Apparently, sort(@array) compares its elements as strings by default. When I sorted the array of numbers, I got: 17, 2, 31, 32, 38, 53, 71, 73, 78, 86. Understandable, I suppose; 1 does indeed come before 2... After doing a little research online, I found a segment of code that successfully sorted the array of numbers: (sort{$a <=> $b}(@numbers)). The peculiar thing here is that I never defined $a and $b. It turns out sort sets them itself to the two elements being compared, and the block uses the numerical compare operator <=> as opposed to the string compare operator, cmp.
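Here is a short sketch that puts the default string sort and the numeric sort side by side, using the same array as above. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @numbers = (73, 32, 53, 17, 2, 71, 31, 86, 38, 78);

# Default sort compares elements as strings, so "17" sorts before "2".
my @as_strings = sort @numbers;

# sort sets the special variables $a and $b to the pair being compared;
# <=> compares them as numbers.
my @ascending  = sort { $a <=> $b } @numbers;
my @descending = sort { $b <=> $a } @numbers;

print "String order:  @as_strings\n";
print "Ascending:     @ascending\n";
print "Descending:    @descending\n";
```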

It's tricky thinking of uses for multidimensional arrays and hashes to code on the fly. Just now I've made a multiplication table though, so I guess that's pretty cool. There's some very unintuitive syntax for printing an entire hash of arrays though. A good side to that, however, is I get to practice writing subroutines that make things easier. First up: a subroutine that prints every element in an array. I'll expand from there. Note to self: all of today's work comes from chapter 9.
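The syntax for walking a hash of arrays might look something like the sketch below. The data and the helper name format_hash_of_arrays are invented for illustration; multidimensional.pl may well do it differently.

```perl
use strict;
use warnings;

# A hash of arrays: each key maps to a reference to an array.
# (made-up sample data)
my %oxides = (
    FeO => [9.8, 8.7, 7.9],
    MgO => [6.1, 5.4, 4.8],
);

# Hypothetical helper: build one "key: v1 v2 ..." line per key, keys sorted.
sub format_hash_of_arrays {
    my ($hash) = @_;
    return map { "$_: @{ $hash->{$_} }" } sort keys %$hash;
}

print "$_\n" for format_hash_of_arrays(\%oxides);
```

The @{ ... } around $hash->{$_} is what turns the stored reference back into a list of values.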

array-sort.pl · multidimensional.pl

Top


June 8th, 2011

Today I'm going to make more of a reading day. It's occurred to me that part of my proposal was learning new languages. While I've been reading up to 50 pages each day in various text books and documentation, I'm mostly paying attention to what I think I'll need to know for my project. In essence, file I/O, PDF creation, text manipulation, etc.

It's fun noticing how Perl has bits and pieces of all the other languages I know. I learned just now that loops can have labels, much like in assembly language. Right now, I'm having a bit of trouble with the "redo" statement. Rather, I know how it works, but I can't think of any practical uses to make an example out of it. It's also very easy to get yourself into infinite loops using that. If you're not careful with changing the conditions, you could redo something forever!
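One practical use that comes up in Perl's own documentation is joining continued lines: when a line ends with a backslash, glue the next line onto it and redo the loop body on the combined line. The sketch below applies that idea to a small made-up list of lines, with a loop label for good measure.

```perl
use strict;
use warnings;

# Made-up input: the first line is continued onto the second.
my @lines = ("plot sin(x), \\", "x", "set output");
my @commands;
my $line;

LINE: while (defined($line = shift @lines)) {
    if ($line =~ s/\\$//) {                   # trailing backslash: line continues
        $line .= @lines ? shift @lines : "";  # glue on the next line, if any
        redo LINE;                            # rerun the body on the joined line
    }
    push @commands, $line;
}

print "$_\n" for @commands;
```

The loop cannot spin forever here because every redo first removes a backslash and consumes a line, so the condition that triggers it eventually stops being true.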

test.pl · labels.pl · blocks.pl · scopes.pl · patterns.pl

Top


June 7th, 2011

So today I'm starting out by trying to write a program that turns comma-separated values into whitespace-separated values. I was able to write a fairly decent program off the top of my head, but all it did was copy the file and give it a different name. Not quite what I want. I think I need to understand file I/O a little more. I think the while-statement I was using to loop through the file reads it one line at a time. For my purposes, it'd be pretty useful if I could loop through the file one character at a time, discarding all the commas and printing everything else. I'm a little afraid I'll have to go deeper into regular expressions, which I've been fearing. I've heard they're hard. Here's my failed snippet of code:


    while ($_ = <CSV>) {
        if ($_ eq ",") {
	    print WSV "\t";
        }
        else {
	    print WSV $_;
        }
    }

Now that I look at it, I can tell I was thinking in Java just then. Doesn't quite work in this scenario, heh. I've gotten a little bit closer! I wrote a pattern that swaps a comma for a tab, but it only works for the first comma in each line of the file. Here's the code:


    while ($_ = <CSV>) {
	$word = $_;
	$word =~ s/,/\t/;
	print WSV $word;
    }

It still seems I need to work on handling each word in a file individually though, as opposed to in groups of lines...SUCCESS! All I needed to do was change the pattern to s/,/\t/g, which replaces the pattern globally, i.e., all occurrences of commas will be replaced with tabs! And I just tested it with gnuplot too! I'm so excited! I started out with a .csv, modified it in Perl to turn it into a .txt with whitespace-separated values, then graphed it in gnuplot! I still maintain that pattern matching is tricky though... I read almost the entire chapter on pattern matching and regular expressions before I'd learned what I needed to know. When you think about it, it's actually really cool how powerful Perl is. That one line of code replaces all commas in a scalar with tabs. I can't imagine how much Java code it'd take to do that successfully, hah.
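Pulled out into a subroutine, the working substitution might look like the sketch below. The name commas_to_tabs and the sample rows are made up, and it assumes plain numeric CSV with no quoted fields that contain commas.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every comma in a line with a tab.
# Assumes simple CSV with no quoted fields containing commas.
sub commas_to_tabs {
    my ($line) = @_;
    $line =~ s/,/\t/g;    # the g flag replaces all commas, not just the first
    return $line;
}

my @csv = ("1986,21.5,14.2\n", "1987,23.0,15.9\n");   # made-up sample rows
print commas_to_tabs($_) for @csv;
```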

I just went back and looked at the week's timeline I made yesterday, and it looks like I'm a day ahead of things. I don't want to get too ahead of myself, but I feel like I might have to start thinking of directions in which my project could expand. Perhaps I could develop a GUI for the software instead of having it command-line based? Also, note to self: I'm through Chapter 5 in Perl today.

stockprices.csv · whitespace.pl · whitespacevalues.txt

Top


June 6th, 2011

Today, I start my work with gnuplot! So far, it's seemed to be both very powerful and yet easy to use at the same time. Without any files at all, I was able to create the following 3 graphs in less than a second each using the following 3 commands:

gnuplot> plot sin(x)


gnuplot> plot sin(x), x, x-(x**3)/6


gnuplot> plot [][-2:2]sin(x), x, x-(x**3)/6

Note that the second and third graphs are the same, but I was able to instantly scale it down just by inserting [][-2:2] before the functions to plot, which restricted the y-axis from -2 to 2.

In reading Gnuplot in Action though, I'm realizing there's a small additional step I'll have to make before I can start writing the software. Gnuplot takes whitespace-separated values, whereas the input files for the software are going to be comma-separated values. Shouldn't be too much of a hassle; I guess it's best I figured that out now than come time to run the program! Until then, I'll be learning/practicing gnuplot commands.

Just now I've written a script (*.gp file extension) that reads a list of stock prices over a period of 25 years, graphs them on separate lines, then saves the graph as a jpeg! What I'm thinking will be most efficient now in writing my software is exporting all the graphs as jpegs, and then laying out all the jpegs into a pdf as the final product. I was a little afraid at first that I'd have to convert all the graph output to PDFs, and then stitch those together, which I can imagine would be difficult.

Now that I'm starting to get a better idea of how the project will fall together, I think it's time I set some daily goals for this week.

  • Monday (today): Practice writing small bits of gnuplot code
  • Tuesday: Figure out what gnuplot options I may need to master to write the software
  • Wednesday: Write a Perl program that turns comma-separated values into whitespace-separated values
  • Thursday: Write a program that will graph a .csv file using Wednesday's program
  • Friday: Explore the idea of generating multiple graphs from such a program

I feel like all that may be a little too ambitious, but we'll see. I feel like I've been making good progress so far; it might be quite doable. Something I'm realizing now that I didn't include in either my timelines or my proposal is how I'm going to write an interface for the command line. I wonder if I'll need any books for that, or if I can just do it through documentation.

Before I close today, I think my practice programs/files warrant a little bit of a description. For reference, I always list each day's files at the bottom of the page in the order I wrote them: oldest first, newest last. I figure for anyone who's following along, that'll make it easier to visualize how I'm teaching myself these languages. Anyway, stockprices.txt is the file I used to represent two stock companies' values over the course of 25 years. The next file, prices.gp, may seem rather long for a "practice" program, but that's because it was generated by the gnuplot command save "prices.gp", which takes a snapshot of the current terminal settings and saves them to the specified filename in whatever directory from which the program is being run. If I were to run prices.gp using the command gnuplot> load "prices.gp", I would get the image stockprices.jpg. I can get the same image by running the much shorter script stockprices.gp. Lastly, export.gp is a simple program that takes an output name as an argument, then saves the most recently plotted graph as a PNG with that name.

stockprices.txt · prices.gp · stockgraph.jpg · stockprices.gp · export.gp

Top


June 3rd, 2011

Before reading anything, I decided to test myself by writing a program off the top of my head, without looking at my notes, programs, books, etc., to see how much is actually sticking. Simple as it is, I wrote a program that averages an array of numbers. The next step, ideally, would be to write a program that takes in numbers from the console until the user enters a blank line, and then average those numbers. But for now, I'm back to reading.

Here's a nifty example I've come across. I'm not sure exactly how it works yet, but it's used for determining if it's necessary to print the plural form of a word.


    printf "I have %d camel%s.\n",
           $n,     $n == 1 ? "" : "s";

If you have exactly one camel, it will only print camel; if you have any other number of camels, it will print camels. The book used this as an example of a conditional statement though (which I understand), so I'm not sure how the program knew to populate %d with $n and %s with whatever result of the conditional statement. I know names beginning with % are hashes, but maybe %d and %s are reserved for Perl, perhaps "d" for digit and "s" for statement?
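For what it's worth, inside a printf format string %d and %s are placeholders rather than hashes: %d stands for a decimal integer and %s for a string, and they are filled in order from the arguments that follow the format. The sketch below shows the same idea with sprintf, which returns the string instead of printing it; the subroutine name camel_report is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: build the sentence for a given number of camels.
sub camel_report {
    my ($n) = @_;
    # %d is filled by the first argument after the format string (an integer),
    # %s by the second (a string chosen by the ?: conditional operator).
    return sprintf "I have %d camel%s.", $n, $n == 1 ? "" : "s";
}

print camel_report($_), "\n" for (0, 1, 2);
```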

After having gone over my proposal again today, I realized that I'm not progressing as quickly as I'd originally planned. Rather than just read an all-encompassing book from start to finish, I'm going to take a step towards writing programs that are more specifically in the direction of my fellowship project. Just now I've finished my first program that successfully deals with file I/O. The user enters the name of a file, an error is printed to the console if the file is not found, and then the program prints each line and line number that the file contains. Not very advanced at all, but a step in the right direction, I think. Next big project ought to be converting simple text files that I write into PDFs.
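A file-reading program along those lines could be sketched as follows. To keep it runnable by itself it writes a small sample file first instead of asking the user for a name; the helper name numbered_lines is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return each line of a file prefixed with its line number.
sub numbered_lines {
    my ($name) = @_;
    open(my $fh, "<", $name) or die "Cannot open $name: $!\n";
    my @out;
    while (my $line = <$fh>) {
        # $. holds the number of the line most recently read
        push @out, sprintf("%d: %s", $., $line);
    }
    close($fh);
    return @out;
}

# Make a small sample file so the sketch runs on its own.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "first\nsecond\n";
close($tmp);

print numbered_lines($tmpname);
```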

I have succeeded in making a PDF! It was rather tedious, and the documentation took a while to read/make sense of, but I was finally able to write a program that produces a file "hello.pdf" with the words "Hello, world!" printed on it.

simple-average · conditionals · subroutines · average · files · pdf · My first PDF!

Top


June 2nd, 2011

So I'm starting with if statements today and noticed something odd but cool. When reading the text, I sort of skimmed over the list of equality operators, because they all looked familiar: ==, !=, <=, >=, etc. So then I went to type a simple program in the command line.

perl -e 'print "Thursday" == "Saturday";'

You know what it returned? 1. At first I thought that made sense because the strings "Thursday" and "Saturday" have the same number of letters, but the real reason is that == compares numbers, and both strings count as 0 in a numeric comparison. I went back to that list of equality/comparison operators then and realized they have an entirely different set of operators for strings: eq, ne, lt, gt, le, ge, cmp. The comparison operator is a new one to me. It returns 0 if both sides are equal, 1 if the lhs is greater, and -1 if the rhs is greater.
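A small sketch contrasting the numeric and string operators on the same two words. The variable names are only for illustration, and the numeric warning is switched off so the output stays clean.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" so the demo output stays clean

# == compares numbers; a string with no leading digits counts as 0.
my $numeric_equal = ("Thursday" == "Saturday") ? "true" : "false";
# eq compares the strings themselves.
my $string_equal  = ("Thursday" eq "Saturday") ? "true" : "false";
# cmp returns -1, 0, or 1 depending on string order.
my $order = "Saturday" cmp "Thursday";

print "==  says: $numeric_equal\n";
print "eq  says: $string_equal\n";
print "cmp says: $order\n";
```

With warnings left on, Perl would report that the arguments to == are not numeric, which is a handy hint that eq was probably intended.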

My next topic to tackle today was statements. It was easy to pick up on for loops (exact same format as in Java, except {} must surround the block no matter how few lines it is), if, elsif, else, while. The foreach statements were easier than those in Java, I found, but what really impressed me were the unless and until statements, which are counterparts to if and while, respectively.

I think I'm about to leave the comfort of learning a language that's a combination of two I already know really well. Ten more pages into Programming Perl and I've found myself dealing with regular expressions, pattern matching, file I/O, quantifiers, and more. Here's something cool about regular expressions though: the following two while statements do the same thing.

perl -e 'while ($line = <FILE>) {if ($line =~ /http:/) { print $line; } }'
perl -e 'while (<FILE>) {print if /http:/}'

Much more compact! Perl even includes shortcuts for common expressions such as whitespace (\s), letters, digits & underscores (\w) and digits (\d). It's looking to be a very convenient, powerful language, so far.

Otherwise, today's been more of a day for building blocks. I finished the first chapter of the book I've been using, and the next chapter is starting at the very basics of Perl and working its way up. Fortunately, I already know all about things like tokens, data types, and data structures thanks to my CS256 and CS364 classes I took last year with Dr. Richard Sharp and Dr. Lisa Torrey :) After having spent so much time with Java, it's startling how Perl only has three data types-- $scalars, @arrays, and %hashes. In that respect, Perl reminds me of Scheme, in which every data type was just a recombination of an ordered pair.

So I know it doesn't look like I wrote a lot of code today, because that's true. I did a lot of reading today, learning about the different contexts within Perl in which scalars, arrays, and lists can be used. It's hard to write practice code this early on in learning a language... I know how to do simple things, but I'm not quite yet at understanding what the language is capable of. Hopefully things will pick up soon. I'm ending today on page 76.


statements · here-documents · v-strings · context

Top


June 1st, 2011

So today marks my first day of work on the fellowship! For the first chunk of time I'll be spending teaching myself Perl, I'll be using Programming Perl, 3rd Edition by Larry Wall, Tom Christiansen, and Jon Orwant. Today I plan to work on variable assignments and manipulation, some simple data structures, and maybe even writing a program that does more than print.

The coolest thing I've seen so far is an example in the book about assigning to scalars from arrays. It's like a really quick way to give each element in an array its own variable:

@home = ("couch", "chair", "table", "stove");
($potato, $lift, $tennis, $pipe) = @home;

Pretty cool, yeah? That sets the value of the scalar $potato to "couch"; $lift to "chair", etc. One of the things that's been most mind-boggling for me so far is the use of logical operators. The difference between Perl and all the other languages I've worked with (Java, Python, Scheme) is that Perl doesn't have explicit boolean values. Instead, Perl considers 0, '0', undef, (), '', ('') as false. Everything else is true! I decided to post each day's practice programs in my blog for now until the graphing software really gets started. So if you take a look at operators.perl, you'll see how different the logical operators are.

For example, in Math, 0 and 1 is false. In Java, false && true == false. In Perl, however, 0 and 1 == 0, similar to how '' and 1 == ''. Interesting. At the same time, it's really clever. The way I learned short circuiting, if the first item was false, return false. Perl doesn't actually evaluate both sides of the 'and' to make sure they're both true; it short circuits - if the first item is false, it returns the first item; otherwise, it returns the second item. By the same logic, "dog" and "cat" should return "cat", since "dog" doesn't match any of the false values I mentioned above. Yet I entered the following into the command line:

perl -e 'print "dog" and "cat";'

and the result was dog. Well, I guess understanding Perl logic is going to be my project for tomorrow. I will point out though that I wrote a program! Listed below, temperature.perl takes a number in from the console, converts it from Fahrenheit to Celsius, and prints it back out! And just a reminder for myself, I read the first 27 pages of Programming Perl today. Lastly, my workspace:
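On the dog/cat puzzle: a likely explanation is operator precedence rather than the logic itself. The word operator and binds more loosely than print, so print takes "dog" as its argument before and is considered at all. The sketch below uses the tighter-binding && inside parentheses to show the values the short-circuit rule predicts; the variable names are made up.

```perl
use strict;
use warnings;

# "and" binds more loosely than a list operator such as print, so
#     print "dog" and "cat";
# is parsed as (print "dog") and "cat": print only ever sees "dog".

# && binds more tightly, so the whole expression becomes the value:
my $tight = ("dog" && "cat");   # first operand is true, so the result is "cat"
my $empty = (""    && "cat");   # first operand is false, so the result is ""
my $zero  = (0     && 1);       # first operand is false, so the result is 0

print "dog && cat gives: $tight\n";
print "empty string && cat gives: <$empty>\n";
print "0 && 1 gives: $zero\n";
```

Writing print("dog" and "cat"); with explicit parentheses should likewise print cat, since the parentheses force the whole expression to be evaluated before print sees it.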


hello · variables · operators · temperature

Top


May 25th, 2011

I'll be arriving at St. Lawrence University in one week (May 31st) to start my fellowship! I'm very excited to get started. Here's a very simplified outline of how I anticipate spending my time for the next 8 weeks:

  • Week 1: Familiarize myself with the language Perl
  • Week 2: Familiarize myself with the language gnuplot
  • Week 3: Start using the two languages together to write simple file I/O programs
  • Week 4: Write the program
  • Weeks 5-8: Debugging and fine-tuning

Of course, there are plenty of smaller things I plan to be doing in addition to that as well. First, I'll be doing the best I can to update this "blog" every day with my fellowship progress. At the same time, I'll slowly be trying to convert it from inline HTML (what it is now, hah, hard-coded in Notepad!) to CSS. That'll add HTML and CSS to the languages I'll be learning this summer. In my free time, I'll be working 4 hours/week at the IT HelpDesk, where I've worked for the past two years; and 4 hours/week at the Quantitative Resource Center (QRC), where I've just been hired.

Regarding what I'll write about however... well, I have a few ideas. I've seen a little bit of Perl in my CS 364 class, and it's definitely an interesting language, so I'll be sure to remark on the quirks of the language and maybe some sample code I find particularly interesting. I'll probably be switching between computers a lot since I have a laptop that runs Linux in addition to the Bewkes lab computers. Right now I'm just killing time and filling up space, really... See you in a week!

Shelley

Top


Bibliography

D. Wegscheid. (2011, Jul 18). Time-HiRes-1.9724 [Online]. Available: http://search.cpan.org/~zefram/Time-HiRes-1.9724/HiRes.pm

L. Wall, T. Christiansen and J. Orwant, Programming Perl, 3rd ed. Sebastopol, CA: O'Reilly Media, 2000.

M. Hosken. (2011, Jun 3). Package: libpdf-api2-perl (0.73-1) [Online]. Available: http://packages.ubuntu.com/lucid/perl/libpdf-api2-perl

P. Janert, Gnuplot in Action. Greenwich, CT: Manning Publications, 2010.

R. Kobes. (2011, March 10). PDF-API2 documentation [Online]. Available: http://kobesearch.cpan.org/htdocs/PDF-API2/

T. Williams et al. (2010, March 5). Gnuplot 4.4: An Interactive Plotting Program [Online]. Available: http://www.gnuplot.info/docs_4.4/gnuplot.pdf

Top