Handling Data
11: Handling Data
Gri can handle many different sorts of data file formats, including ascii files, binary files in machine format, and the very powerful and increasingly popular netCDF format. (For more information on netCDF format, see `http://www.unidata.ucar.edu/packages/netcdf/index.html'.)
This chapter concentrates on ascii format. The overall message is that
you should not have to modify your data files to work with Gri.
For example, many oceanographic data files have header lines at the
start. With other plotting systems, users find themselves stripping off
these headers as a first step in data analysis. This is done to make
the data look like a tabular list, or matrix, for reading by matlab or
various spreadsheet-like programs. (It is not necessary to do this in
matlab, by the way; you should use the matlab `fgets' command
instead, to read and skip the header lines. However, it is almost
always necessary to do this in spreadsheet-like programs, especially the
GUI-based ones, because the paradigm is often to click on columns of the
data that represent variables of interest.)
The difficulty with stripping off header lines is that you can easily
lose the header information. To avoid that, you must be careful to
put it in a separate file with an appropriate filename, and then just as
careful to archive the header along with the data, and to send both to
any colleague who requests the data. Often the header
information seems unimportant to you at the moment, but it may be
crucial to you later on, or to the next person who looks at the data!
In Gri it is very easy to handle headers within files. It's also easy
to handle data that are in somewhat odd formats, or that must be
manipulated mathematically or textually to make sense.
11.1: Handling headers
11.1.1: Case 1 -- known number of header lines
This is easy. If you know that the file has, say, 10 header lines, you can just do this:
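A minimal sketch, assuming a hypothetical file named `file.dat' whose 10 header lines are followed by two columns of data:

```gri
# Assumption: file.dat is a hypothetical file with 10 header lines,
# followed by two columns holding x and y.
open file.dat
skip 10                  # skip over the 10 header lines
read columns x y         # read the data that follow the header
draw curve
```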
11.1.2: Case 2 -- header itself indicates number of header lines
Quite often the first line of a file will indicate the number of header lines. For example, suppose the first line contains a single number, indicating the number of header lines to follow:
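A sketch of one way to handle this, again assuming a hypothetical two-column file named `file.dat'. The variable name `.n.' is chosen only for illustration:

```gri
# Assumption: the first line of file.dat holds a single number giving
# the count of header lines that follow it.
open file.dat
read .n.                 # read the header count from the first line
skip .n.                 # skip that many header lines
read columns x y
draw curve
```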
11.1.3: Case 3 -- header lines marked by a textual key
Sometimes header lines are indicated by a textual key, for example, the characters `HEADER' at the start of the line in the file. The easy
way to skip such a header is to use a system command. Depending on your
familiarity with the operating system (here presumed to be Unix), you
might choose to use Grep, Awk, or Perl. Here are examples:
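A sketch using Grep, with Awk and Perl equivalents shown as comments. The filename `file.dat' and its two-column layout are assumptions made for illustration:

```gri
# Grep: pass through only the lines that do not start with HEADER.
open "grep -v '^HEADER' file.dat |"
read columns x y
draw curve

# Equivalent filters, to be used in place of the open command above:
#   open "awk '!/^HEADER/' file.dat |"
#   open "perl -ne 'print unless /^HEADER/' file.dat |"
```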
For details of the `|' (pipe) mechanism, see Open. The Grep command
prints lines which do not match the indicated string
(because of the `-v' switch), and the `^' character stands for
the start of the line (see Grep). Thus all lines with the key word at the
start of the line are skipped.
11.1.4: Case 4 -- reading and using information in header
Consider a dataset in which the first line gives the time of observation, followed by a list of observations. This might be, for example, data taken from a weather balloon released at a particular time from a fixed location, with the main data being air temperature as a function of elevation of the balloon. The time indication might be, for instance, the hour number. One might need to know the time to print a label on the diagram. You could do that by:
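A sketch under stated assumptions: the file is a hypothetical `file.dat' whose first line holds the hour as a single number, and the names `.time.' and `\title' are chosen only for illustration:

```gri
# Assumption: line 1 of file.dat holds the decimal hour of observation;
# the remaining lines hold two columns of data.
open file.dat
read .time.                          # time from the header line
read columns x y
draw curve
sprintf \title "Time is %f" .time.   # store formatted text in a synonym
draw title "\title"                  # use the synonym in the title
```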
The `sprintf' command is used to change the numerical
time indication into a synonym that can be inserted into a quoted string
for drawing the title of the diagram (see Sprintf). The time has
been assumed to be a decimal hour. You might also have three numbers on
the line, perhaps a day, an hour and a minute. Then you could do
something like
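A sketch for the three-number case, with the hypothetical file `file.dat' and illustrative variable names `.day.', `.hour.' and `.minute.':

```gri
# Assumption: line 1 of file.dat holds three numbers: day, hour, minute.
open file.dat
read .day. .hour. .minute.
read columns x y
draw curve
# Format each value with no digits after the decimal point.
sprintf \title "Day %.0f, hour %.0f, minute %.0f" .day. .hour. .minute.
draw title "\title"
```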
The `%.0f' format code is used to ensure that no digits will be written
after the decimal point. Naturally, you could convert this to a decimal
day, by e.g.
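One possible conversion, sketched with an RPN expression. The variable `.dday.' is a name chosen for illustration, and the header layout (day, hour, minute on the first line of a hypothetical `file.dat') is an assumption:

```gri
# Assumption: line 1 of file.dat holds day, hour and minute.
open file.dat
read .day. .hour. .minute.
# decimal day = day + hour/24 + minute/1440
.dday. = {rpn .day. .hour. 24 / + .minute. 1440 / +}
sprintf \title "Decimal day is %f" .dday.
draw title "\title"
```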
11.2: Ignoring columns that are not of interest
Quite often a dataset will have many columns, of which only a couple are of interest to you. Consider for example an oceanographic data file which has columns storing, in order, these variables: (1) depth in water column, (2) "in situ" temperature, (3) "potential" temperature, (4) salinity, (5) conductivity, (6) density, (7) sigma-theta, (8) sound speed, and (9) oxygen concentration. But you might only be interested in plotting a graph of salinity on the x-axis and depth on the y-axis. Here are several ways to do this:
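A first sketch, assuming the nine-column file described above is a hypothetical `file.dat' with no header lines:

```gri
# Column 1 (depth) goes to y, columns 2 and 3 are skipped, and
# column 4 (salinity) goes to x. Columns 5 to 9 are not read.
open file.dat
read columns y * * x
draw curve
```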
In the `read columns' command, a `*' is a place-keeper indicating that a column is to be skipped.
For a large number of columns, or as an aesthetic choice, you might
prefer to write this as
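A sketch of the numbered-column form, using the same hypothetical nine-column `file.dat':

```gri
# Name the columns explicitly: depth is column 1, salinity is column 4.
open file.dat
read columns y=1 x=4
draw curve
```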
11.3: Manipulating columns
Suppose the file contains (x,y), but you wish to plot 2y against x. You could do the doubling of y within Gri, as
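A minimal sketch, assuming a hypothetical two-column file named `file.dat':

```gri
open file.dat
read columns x y
y *= 2                   # double every value in the y column
draw curve
```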
11.4: Combining columns from different files
Suppose you want to plot a column (`y', say) from one file versus a
second column (`x') from a second data file. The easy way is to
use a system command to create a new file, for example the Unix command
`paste' -- but of course you don't want to clutter your filesystem
with such files, so you should do this within Gri:
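A sketch under stated assumptions: `xfile.dat' and `yfile.dat' are hypothetical single-column files with the same number of lines, holding x and y respectively:

```gri
# paste joins the files line by line, so each line of its output
# holds an x value followed by the matching y value.
open "paste xfile.dat yfile.dat |"
read columns x y
draw curve
```

If the files hold several columns each, the column positions in the `read columns' command would need adjusting to match the pasted layout.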