THE COMPUTER TRANSITION SYSTEMS REPORT - NOVEMBER 2001
COMPUTER TRANSITION SYSTEMS, BOX 4553, MELBOURNE, VICTORIA, 3001
http://www.cts.com.au
--- phone (03) 9530 6633 --- fax (03) 9530 6644 --- email: info@cts.com.au
VISUAL FORTRAN NEWS
BEOWULF - Low Cost High Performance
Lahey/Fujitsu Fortran for .NET
SciSnet Library - fast transfer between programs
F77L & F77LEM/32 USERS
TECPLOT Version 9
LGO - Global Optimization Software
CPU Update
WINTERACTER VERSION 4.0
FORTRANPLUS - satisfactory quality, bargain price
LF95 for Linux 6.0 - improved compatibility
Sciplot 7.0
CATALOGUE SECTION
All prices are based on A$=US$0.50 and will vary with the exchange rate
VISUAL FORTRAN NEWS
In August Intel took over ownership of Visual Fortran. The latest release, version 6.6, is still badged 'Compaq'; subsequent releases will all be labelled 'Intel Visual Fortran'. Users of version 6.5 may download the version 6.6 update from www.compaq.com/fortran. Anyone who has difficulty with the download may borrow the software on CD from Computer Transition Systems. The principal improvements in version 6.6 are optimisations for P4, real*8, and logical*8.
Current prices are: Pro Edition $1661 (updates $624.50); Standard Edition $1243 (updates $415.80). Academic prices are: Pro Edition $1083.50 (updates $415.80); Standard Edition $808.50 (updates $311.30). We anticipate that all upgrade prices will increase by 50% after 31 December. A single user network licence (called concurrent 1) is now available. We can provide 30 day evaluation copies of Visual Fortran Standard Edition on request.
We strongly recommend that Visual Fortran owners register their compiler with Compaq (http://www.compaq.com/fortran/register/index.html). The informative Visual Fortran Newsletter is automatically emailed to registered users of Visual Fortran. The most recent newsletter issue is X. It appeared in September. Implementations of both the NCAR and GKS libraries for VF are available free to non-commercial users at www.fpp.uni-lj.si/~milan/ncarg and www.fpp.uni-lj.si/~milan/gks.
Beowulf - Low Cost, High Performance
A modern supercomputer doesn't really execute operations much faster than a high-end PC. It just executes more operations simultaneously. Many of the supercomputers listed at www.top500.org have clock speeds much less than 500 MHz. Much of the difference in performance (and cost) between PCs and supercomputers comes from instruction sets, the degree of pipelining, the memory bandwidth and arrangement, memory speed, construction, etc.
At the forefront of cost effective high performance computing are Beowulf clusters. Indeed it has been proposed that PCs of the future may evolve into Beowulf systems. In this vision the base computer has slots where additional computing nodes (cpu plus memory) can be attached to provide increased performance. The term Beowulf has become a common name for a cluster of relatively low-cost computers connected together to perform parallel tasks. The name Beowulf was first given to a machine built in 1994 at the US NASA Goddard Space Flight Center.
Several Australian universities and other research organizations are using Beowulf clusters to provide their people with high performance parallel computing at modest cost - presently hardware costs of less than $2000 per gigaflop are practical. Some companies with high performance requirements (eg geophysical service and production companies) are now using Beowulf clusters. The Bunyip Beowulf machine at ANU was built in April 2000 using 91 dual processor P3 based PCs and has achieved a sustained performance of more than 160 gigaflops.
A Beowulf cluster usually consists of a group of PCs running Linux linked to a server node PC (and optionally to each other) by 100 megabit (Mb) Ethernet connections. Although faster Ethernet hardware is available, 100 Mb cards are very inexpensive (approximately $50) and so are the network hardware normally used in Beowulf clusters. Gigabit Ethernet cards are now just over $100 but gigabit switches are still quite expensive. There are two types of devices for interconnecting individual nodes to make a cluster: hubs and switches. For clusters of 16 or more nodes switches are better.
One of the main differences between a Beowulf and a Cluster of Workstations (COW) is that a Beowulf behaves more like a single machine than like many workstations. In a Beowulf cluster only the central controlling PC is connected to the outside world and it is the only PC which needs to have a keyboard, mouse, display, and hard disk. All of the other computers comprising a Beowulf machine can be viewed as just a cpu plus memory and minimally consist of just a motherboard with an Ethernet card, although they may have a small hard disk and/or a floppy disk installed to make maintenance and computer start up easier. An ordinary PC case is usually the most economical (and flexible) way to mount and power the motherboard.
Beowulf machines built by NASA and others have frequently been constructed using a wide variety of obsolete PCs which the organization no longer uses.
Most Beowulf type machines use Linux. Some people refuse to call machines Beowulf unless they use Linux. However people can and do use FreeBSD, Solaris and Windows2000 (although W2000 consumes a great deal of memory). The advantages of Linux are that it has excellent networking performance and that comprehensive Beowulf software is available for it. Having source code is also a decided advantage. When a program fails to work on a Beowulf machine there are normally three possible causes: application code, the network driver, or the o/s kernel. With proprietary operating systems it can be difficult to rectify problems which are due to the driver or the o/s kernel, and the provider of the o/s will invariably blame the application code for software crashes. While this explanation is convenient for the o/s provider it is not very helpful in situations where the cause lies elsewhere, which is not infrequent since in a parallel hardware environment there can be subtle demands placed on the network driver and the o/s kernel. Ensuing conflicts can often best be resolved by fixes in the driver or o/s kernel. Archived Linux mailing lists and newsgroups have made it vastly easier to identify and fix such problems. This focused, detailed and responsive support is in general not available to individual users of proprietary operating systems.
There is also the matter of the price of proprietary operating systems. A five client W2000 licence costs about the same as the total cost - hardware and software - of an 8 node Beowulf! Many Linux distributions include extensive Beowulf software and in the future it is likely that anyone setting up a straightforward Beowulf will find all the software needed is included in the Linux distribution. Setting up and configuring a cluster requires Linux administration skills.
Basically, knowledge of networking, addressing, booting, building kernels, Ethernet drivers, NFS, package installation, and compiling will be required for successful administration. The Linux Documentation Project (LDP, http://metalab.unc.edu/LDP/) provides excellent documentation on these topics.
Can one take current programs and have them execute faster on a Beowulf? The answer is 'maybe' - if you put some work into it. As a general rule a program will not run any faster on a Beowulf machine unless it is specifically designed and written to take advantage of a parallel environment. When parallel processes need to exchange data a message passing mechanism is required - the user code running on one CPU sends data to the user code running on another CPU. Probably the most widely supported and portable message-passing method is the MPI message-passing standard. There are many implementations of the MPI standard. MPICH is freely available and is one of the most popular implementations. Some sites have installed the LAM/MPI implementation which is also freely available. Under Windows one could also use the SciSnet software described elsewhere in this newsletter. An online text, 'Designing and Building Parallel Programs' by Ian Foster, is available at http://www.qpsf.edu.au/mirrors/dbpp/text/book.html. Another is 'Parallel Programming - Basic Theory for the unwary' - http://users.actcom.co.il/~choo/lupg/tutorials/parallel-programming-theory/parallel-programming-theory.html
For historical reasons, most number crunching codes are written in FORTRAN. Consequently FORTRAN has the largest amount of support (tools, libraries, etc.) for parallel computing. Many programmers now use C or re-write existing FORTRAN applications in C with the notion that C will allow faster execution. While this can be true, since C is the closest thing to a universal machine code, it has some major drawbacks. In particular the use of pointers in C makes determining data dependencies all but impossible. If you have an existing FORTRAN program and think that you might want to parallelize it in the future - DO NOT CONVERT IT TO C. The Lahey LF95 for Linux and the Absoft Pro Fortran for Linux are excellent Fortran compilers for use on Beowulf clusters.
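The message passing pattern described above (a controlling process hands out data, the workers compute independently, and the partial results are sent back and combined) can be sketched in a few lines. The sketch below is only an illustration and makes some loud assumptions: it is written in Python rather than Fortran, it uses threads and in-memory queues as stand-ins for cluster nodes and the network, and the function names are hypothetical. A real Beowulf program would use the send and receive calls of an MPI implementation instead.

```python
import queue
import threading


def worker(inbox, outbox):
    """Stand-in for one compute node: receive a chunk, send back a partial result."""
    rank, chunk = inbox.get()                      # blocking receive from the master
    partial = sum(x * x for x in chunk)            # independent local computation
    outbox.put((rank, partial))                    # send the partial result back


def parallel_sum_of_squares(data, n_workers):
    """Scatter data to n_workers, then gather and combine their partial sums."""
    outbox = queue.Queue()                         # channel from workers to master
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for inbox in inboxes]
    for t in threads:
        t.start()

    # Scatter: worker r receives elements r, r + n_workers, r + 2*n_workers, ...
    for rank, inbox in enumerate(inboxes):
        inbox.put((rank, data[rank::n_workers]))

    # Gather and reduce: the order in which results arrive does not matter.
    total = 0
    for _ in range(n_workers):
        _rank, partial = outbox.get()
        total += partial

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), 3))
```

The workers never touch each other's data; everything they need arrives in a message. This works here only because the partial sums have no data dependencies on one another, which is exactly the property that has to be established before a serial program can be split up this way.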
It is highly nontrivial to parallelize code. This is simply because your program isn't usually aware of data dependencies and time orderings. Serial code being parallelized may need complete rearrangement and not just a plug-in routine. A tool like BERT 77 ('an automatic and efficient FORTRAN parallelizer', http://www.plogic.com/bert.html) can tell you where and how to parallelize. It can also suggest when a routine can be usefully replaced with a plug-in parallel version. Other public domain tools such as Parallel Programming Tools (http://sunmp.elis.rug.ac.be/ppt/index/research_topics/overview/index.html) can be accessed on the web. In the end those who want to effectively parallelize code must really do a detailed study of their software - prepare a block structure and then use a profiler to determine where time is being used. It then becomes apparent what code can usefully be done in parallel and what code must be left serial. Sometimes code that on the surface of things runs inefficiently can be rearranged to run efficiently. However, this rearrangement is not usually obvious or intuitive to somebody who writes serial von Neumann code and is usually nothing at all like the original serial code one wishes to parallelize.
In determining the potential gain in parallelizing a program it is important to assess potential bottlenecks. These include CPU processing, L1 cache, L2 cache, main memory, and network. For most of these, two aspects must be taken into consideration: latency (reaction time) and bandwidth (throughput). A very readable introductory discussion of bottlenecks can be found in chapter 6 of 'Engineering a Beowulf Style Computer Cluster' by Robert Brown which is available at http://www.phy.duke.edu/brahma/beowulf_online_book/. A consideration of bottlenecks is preferably done before implementation of a Beowulf since the structure and composition of nodes may have a critical effect on the execution speed of the types of programs the users want to run. Having 2 cpus (one devoted entirely to communication) and/or more than one network card per node may substantially increase the execution speed of programs limited by network speed. Programs which are memory bound should have nodes with a maximum of memory.
Here the choice of motherboard can be important. Some will take more memory than others. Memory itself is quite inexpensive. Close examination of the program code may reveal that increasing the number of nodes beyond a certain level will provide no further reduction in run time.
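Why adding nodes eventually stops helping can be seen from a rough back-of-envelope model. The sketch below is written in Python and is an assumption-laden illustration, not a measurement: it uses the standard Amdahl's law form (run time = serial part + parallel part divided by the number of nodes) plus a hypothetical communication charge that grows with every extra node, and a simple latency plus size/bandwidth estimate for the time to move one message. All function and parameter names are made up for the example.

```python
def estimated_speedup(n_nodes, parallel_fraction, comm_cost_per_node=0.0):
    """Estimated speedup over a single node, with single-node run time set to 1.0.

    parallel_fraction  : share of the run time that can be spread across nodes
    comm_cost_per_node : assumed extra time added by each additional node
                         (a hypothetical linear communication overhead)
    """
    serial_time = 1.0 - parallel_fraction
    parallel_time = parallel_fraction / n_nodes
    comm_time = comm_cost_per_node * (n_nodes - 1)
    return 1.0 / (serial_time + parallel_time + comm_time)


def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message: fixed latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # A program that is 90% parallel, with a small per-node communication charge.
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(estimated_speedup(n, 0.9, 0.01), 2))
```

With no communication cost a program that is 90% parallel can never run more than 10 times faster, however many nodes are used. Once a communication charge is included the estimated speedup peaks and then falls, so under these assumptions 16 nodes are slower than 8. The message_time estimate shows why latency dominates for many small messages and bandwidth dominates for a few large ones.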
Lahey/Fujitsu Fortran for .NET
.NET is a recent Microsoft project that 'focuses on distributed computing and software development productivity.' Many aspects have not been finalized so .NET is very much a 'work in progress'. One of the major features of .NET is complete interoperability between .NET languages. The mechanics are that all compilers generate Microsoft Intermediate Language (MSIL) rather than native machine code. A just-in-time (JIT) compiler provided by the .NET framework then compiles and links the MSIL for execution when an object is needed.
The major new initiative at Lahey is working with Fujitsu to produce a .NET Fortran compiler. Extensive information on their Fortran .NET project is available at www.lahey.com/netwtpr1.htm, and a preview version of the Lahey/Fujitsu .NET compiler can be downloaded from the same page. Current source code should be compatible with it as long as it 'does not contain Fortran 90 obsolescent statements or restricted statements for this product, such as ENTRY, EQUIVALENCE, UNION, INQUIRE, NAMELIST, TRANSFER, or service routines.'
SciSnet Library - fast transfer between computers
SciSnet™, a new library from MicroGlyph Systems, allows programs to carry out distributed real-time processing. SciSnet™ is a Fortran-based library of functions (a C version is also available) for high-speed packet communication on networked PCs running Windows operating systems. Multiple machines can communicate over a local area network (LAN) or a wide area network (WAN). The Internet can also be used, but transfer rates will be dependent on current traffic conditions. Any network media [USB, firewire, Ethernet (10/100), FDDI] that has been configured to use TCP/IP can be used.
So what is a TCP/IP socket? The TCP/IP socket is an IPC (Interprocess Communication) device that exists at the top of the network protocol stack. It provides a connection-oriented socket that is used to deal with the actual TCP/IP protocol. The Win32 socket API ensures that the designated recipient receives messages. SciSnet™ implements a socket server and client on each machine. The server handles all the inbound and outbound streams to and from other servers on the network. The local client transparently communicates through the local server with other clients on the network. Normally TCP/IP sockets communicate via streams, but SciSnet™ packages the byte streams as discrete messages with headers and acknowledgments.
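The difference between a raw byte stream and discrete messages with a header and an acknowledgment can be illustrated with ordinary sockets. The Python sketch below is not the SciSnet format or API, neither of which is described in this newsletter. The 4-byte length header, the one-byte acknowledgment and all function names are assumptions chosen for the example.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")   # assumed header: payload length, 4 bytes, big-endian
ACK = b"\x06"                  # assumed one-byte acknowledgment


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may deliver them in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send one framed message and wait for the receiver's acknowledgment."""
    sock.sendall(HEADER.pack(len(payload)) + payload)
    if recv_exact(sock, 1) != ACK:
        raise ConnectionError("missing acknowledgment")


def recv_message(sock):
    """Receive one framed message and acknowledge it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, length)
    sock.sendall(ACK)
    return payload


def demo():
    """Send two messages across a connected socket pair and return what arrived."""
    a, b = socket.socketpair()
    received = []
    receiver = threading.Thread(
        target=lambda: received.extend(recv_message(b) for _ in range(2)))
    receiver.start()
    send_message(a, b"first")
    send_message(a, b"second message")
    receiver.join()
    a.close()
    b.close()
    return received


if __name__ == "__main__":
    print(demo())
```

Without the length header the receiver would have no way to tell where one message ends and the next begins, since TCP only guarantees the order of the bytes, not how they are grouped on arrival.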
With current hardware technology, it is easy to configure a set of Intel Pentium 4 PCs for a highly parallel distributed-processing task. Certainly, there is plenty of memory bandwidth for each processor (3.2 GB/s) and even 133 MB/s available for the PCI bus. Very inexpensive interconnects can be installed using PCI Ethernet cards (100 Mb/s) with category 5 copper and Ethernet hubs. The new PCI based SCSI 160 adapters easily obtain 40 MB/s transfer rates over low-voltage differential-pair connected SCSI disks (73 GB). Using these readily available building blocks it is possible to design a distributed real-time processing system suitable for substantial data collection, processing, and recording. Applications that come to mind would include radar signal processing, image processing, and real-time event recording systems such as nuclear accelerators.
How can SciSnet help Fortran users? In such systems, the SciSnet™ library enables any pair of machines to communicate at very substantial transfer rates. On an Ethernet (10/100), sustained transfer rates of 10 MB/s (80 Mb/s) have been measured, which is up to 80% of the full 100 Mb/s available. The library allows multiple parallel packet transfers between two or among many machines. Distributor or commutator type applications can easily be designed. With such configurations, it is possible to have a data generator or gatherer machine provide data streams to multiple processor machines, taking advantage of parallel data processing. There are simple examples in the SciSnet™ manual. One example program, TNET.F90, allows a developer to bring up a socket server, open a socket to a remote machine, query status of the connection, send messages, receive messages, and flush pending messages. All SciSnet™ functions are used in TNET.F90.
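The 80% figure is just a change of units: megabytes per second measured by the program against megabits per second quoted for the network. A minimal Python sketch of the arithmetic follows. The function names and the 650 MB example file are assumptions for illustration only.

```python
def link_utilisation(measured_mbytes_per_s, link_mbits_per_s):
    """Fraction of the nominal link speed in use (1 byte = 8 bits)."""
    return measured_mbytes_per_s * 8 / link_mbits_per_s


def transfer_seconds(size_mbytes, rate_mbytes_per_s):
    """Time to move a given amount of data at a sustained rate."""
    return size_mbytes / rate_mbytes_per_s


if __name__ == "__main__":
    # 10 MB/s measured on 100 Mb/s Ethernet
    print(link_utilisation(10, 100))
    # time to move a hypothetical 650 MB data set at that rate
    print(transfer_seconds(650, 10))
```

At the measured rate a CD-sized data set of 650 MB would take a little over a minute to move between two machines.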
There are two development environments available for applications: the compiler IDE or a DOS window. In the IDE, Fortran users can create network projects with either a graphical user interface (GUI) or a Fortran console window. In a DOS window, Fortran users can use a BAT file or the command line interface to develop applications. Under either environment, SciSnet™ can be integrated with other libraries since the SciSnet™ library is multi-threaded and compatible with Win32 API development.
SciSnet is $440. Versions are available for LF95, LF90, Absoft Pro Fortran, Compaq Visual Fortran, and Microsoft PowerStation as well as most C/C++ compilers - Absoft Pro, Borland, Visual, Symantec, and Watcom.
F77L & F77LEM/32 USERS
If you have been using F77LEM/32 or F77L but now wish to have ease of use with Windows 95, 98, NT, or 2000 then LF95 Express ($561) is the logical choice. LF95 Express consists of the LF95 compiler, a linker, and a debugger. An editor is not provided. The compiler and debugger are operated from the command line. LF95 Express is also available for Linux. It is an excellent compiler for use on Beowulf clusters.
TECPLOT Version 9
Tecplot is the powerful and flexible data visualization software produced by Amtec Engineering. It should be of interest to anyone who needs to understand complex three dimensional data or needs to produce high quality technical diagrams for publication or presentation. Tecplot version 9 was released recently. Major enhancements in this release include massively faster rendering, improved slicing, iso-surface & streamtrace tools, new view controls, numerous algorithm improvements, user defined curve fits, configurable raster image formats, and vibrant true colour. Whatever your visualization needs, Tecplot will improve your plotting, analysis and presentation. It gives you the versatility and power to produce fast, accurate plots. Evaluation copies can be downloaded from our website - www.cts.com.au/tecplot.html. Considerable information on Tecplot can be found on this site. Amtec (www.amtec.com) hosts a useful discussion group for Tecplot users.
LGO Global Optimization Software
Global optimization software can be used to find the optimal solution of complex models that have a large number of minima or maxima which may be similar in magnitude (see illustration to the left). Computer Transition Systems is the Australasian representative of Pinter Consulting Services, developers of the LGO global optimisation package. LGO is based on award-winning research; it uses a suite of robust and efficient global and local scope search algorithms to find the best solution of complex nonlinear models. Model visualization facilities are also built into the MS Windows version of LGO, to assist the user in checking the model and the results obtained. LGO is available in several configuration sizes (from a maximum of 20 variables and 20 constraints to an unlimited number of variables and constraints). Educational and not-for-profit research licenses, and customized versions are available for personal computers and workstations. The price of the 20 variable/20 constraint version of LGO for Windows is $2695 (academic $1595). The price for unlimited variables/constraints is $8800 (academic $5500).
Examples of application areas for LGO include inverse model fitting (calibration), 'black box' (confidential) model tuning and optimisation, data classification, non-linear approximation including general surface fitting, solution of systems of non-linear equations, minimal energy problems in computational physics and chemistry, chemo- and radio-therapy optimisation, optimised tuning/operation of equipment and instruments, and robust product/mixture design in the chemical process (and other) industries.
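The reason a global search is needed when a model has many minima can be shown with a toy problem. The Python sketch below is not LGO's algorithm, which is not described in this newsletter. It is a deliberately simple, assumed stand-in: a crude local descent is restarted from evenly spaced points and the best result is kept. The test function and all names are hypothetical.

```python
import math


def f(x):
    """Toy model with many local minima: a shallow bowl plus a ripple."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_search(func, x, step=0.5, tol=1e-6):
    """Descend from x by trying a step either side; halve the step when stuck."""
    fx = func(x)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = func(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step *= 0.5        # no improvement either side, so refine the step
    return x, fx


def multistart(func, lo, hi, n_starts):
    """Run local_search from n_starts evenly spaced points; keep the best result."""
    width = (hi - lo) / n_starts
    best = None
    for i in range(n_starts):
        result = local_search(func, lo + (i + 0.5) * width)
        if best is None or result[1] < best[1]:
            best = result
    return best


if __name__ == "__main__":
    print("single start at x = 4:", local_search(f, 4.0))
    print("40 starts on [-10, 10]:", multistart(f, -10.0, 10.0, 40))
```

A single local search started at x = 4 settles in the nearest dip and stays there, while the restarted search finds the lowest dip near x = -0.51. Brute-force restarts like this scale badly as the number of variables grows, which is why purpose-built global optimization software exists.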
For further information, please contact Computer Transition Systems. Professional information regarding LGO and the underlying research is summarised on the Web pages of the senior developer, Dr. J.D. Pinter; see http://is.dal.ca/~jdpinter/.
CPU Update
Fewer and fewer cpus are being evaluated with the SPEC benchmarks. The most noticeable absences are the PowerPC chip and the processors used in supercomputers. In order of merit (high numbers are better) the best floating point results (www.specbench.org/osg/cpu2000/results/cfp200.html) for various processors are: 2 GHz P4 (RD RAM) 704, 800 MHz HP Itanium 655, 833 MHz Alpha 21264B 643, 1.3 GHz P4 557, 500 MHz SGI R14000 436, 900 MHz UltraSparc III 427, 1.3 GHz Athlon (DDR RAM) 414, 550 MHz HP PA8600 400, 750 MHz RS6000 64 IV 359, 1.3 GHz Athlon (normal RAM) 348. In these results, which are for computationally intensive Fortran programs, the 1.3 GHz P4 is about 1.3 times faster than a 1.3 GHz Athlon with DDR RAM (557 against 414) and 1.6 times faster than one with normal RAM (557 against 348)!
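For readers who want to compare other pairs of processors, the ratios are easy to compute from the scores quoted above. The short Python sketch below simply re-uses those figures. The dictionary labels and function names are our own shorthand, not SPEC's.

```python
# SPEC CFP2000 scores as quoted in this newsletter (higher is better)
CFP2000 = {
    "2 GHz P4 (RD RAM)": 704,
    "800 MHz HP Itanium": 655,
    "833 MHz Alpha 21264B": 643,
    "1.3 GHz P4": 557,
    "500 MHz SGI R14000": 436,
    "900 MHz UltraSparc III": 427,
    "1.3 GHz Athlon (DDR RAM)": 414,
    "550 MHz HP PA8600": 400,
    "750 MHz RS6000 64 IV": 359,
    "1.3 GHz Athlon (normal RAM)": 348,
}


def ranked(results):
    """Return (processor, score) pairs sorted from best to worst."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def ratio(results, faster, slower):
    """How many times higher the first processor scores than the second."""
    return results[faster] / results[slower]


if __name__ == "__main__":
    for name, score in ranked(CFP2000):
        print(f"{score:4d}  {name}")
    print(round(ratio(CFP2000, "1.3 GHz P4", "1.3 GHz Athlon (DDR RAM)"), 2))
    print(round(ratio(CFP2000, "1.3 GHz P4", "1.3 GHz Athlon (normal RAM)"), 2))
```

The same two calls to ratio show how much of the Athlon's deficit is down to memory type rather than the processor itself.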
WINTERACTER VERSION 4.0
Winteracter is a very flexible graphics and user interface library for Windows, Linux, and Sun. Winteracter version 4.0 will be released at the end of November. Improvements over version 3.1 are in five major areas: the resource editor, graphics text handling, import/export file handling, operating system interface, and user interface.
Resource Editor
A new resource editor lies at the core of Winteracter v4.0. The new editor combines the functionality of the four previously separate dialog, menu, icon/cursor and toolbar editors. The new editor greatly streamlines GUI design and management of the associated resource files. Navigation of dialogs, menus, etc. is now much easier, with the entire contents of the program resource accessible via a single treeview. There is no need to constantly load the resource into multiple tools. The combined editor also provides greater consistency of behaviour when editing different resource types and features numerous minor enhancements.
Graphics Text Handling
Winteracter's graphics text handling has undergone a major redesign in v4.0, providing a much more concise calling interface and additional functionality. While the old calling interfaces remain supported, most programs will benefit from the new interface. New features include access to any Windows font in GDI graphics output to screen, bitmap, printer or metafile. The need for external software font files has also been eliminated. Opaque text is now supported, as is direct output of numeric values.
Graphics Import/Export
Winteracter's graphics output and import capabilities have been further expanded at v4.0. The CGM importer has been substantially upgraded to read a much wider range of third party metafiles. BMP, PCX and PNG bit image files loaded via IGrLoadImage are now reproduced in graphics output via the Print Manager, Windows metafile, CGM and raster image/hardcopy drivers. Version 4 also generates smaller CGM and DXF files.
Operating System Interface
New operating system interface features include: generation of temporary file/directory names, routines to get/set file attributes, drive serial number information and simplified access to file size/date/time information.
User Interface
As always, version 4.0 features various general user interface enhancements, including: mouse double click recognition, child windows inside other child windows, improved wheel-mouse support and immediate access to spinner field values as they change.
Version 4.0 upgrades are $440 ($352 academic) from Winteracter 3.1, $638 ($517) from Winteracter 3.0, and $869 ($693) from earlier versions of Winteracter.
FORTRANPLUS - satisfactory quality, bargain price
The student version of the FortranPlus compiler produced by NA Software has been improved significantly in the current release. It now allows up to 2000 lines per file (as opposed to the original 200). There is no limit to the number of files which can be linked together, so the 2000 lines per file limit, although a nuisance, is not a major limitation. Prices for FortranPlus have all increased recently. The student edition for Linux or Windows is now $130 plus $13 GST. It may be purchased by anyone - not just students.
LF95 for Linux version 6.0 - improved compatibility
The most recent patch file for LF95 version 6.0 for Linux allows the compiler to work properly with recent Linux distributions. Those with earlier versions of the compiler may upgrade to 6.0 for $335.50.
Sciplot 7.0
MicroGlyph Systems has released version 7 of their Sciplot graphics library. Sciplot is a simple-to-use, Calcomp-compatible 2D graphics library available for most 32 bit Fortran and C compilers. An extensive range of export file types can be produced by Sciplot.
Updated 12 November 2001