A large portion of my professional career has been spent developing scientific software for biologists, geneticists, biochemists, and statisticians.
Formally, my training is in numerical methods as applied to computer graphics. Many of my scientific collaborations have benefited from understanding numerical techniques and being able to present data in a visual form. I highlight some of these below.
My recent collaborations have involved GIS work, browser-based applications, and cluster computing, all of which are applicable to developing the InVEST tool.
Recent projects where I have been lead developer include:
Below I highlight the following topics:
In this section I highlight an early project in which I assumed the lead role, including the development and maintenance of a computational pipeline and toolkit to analyze 3D pathological differences between wildtype and retinoblastoma-deficient mouse placenta. I selected this project because of several similarities between it and InVEST, namely:
This project helped me refine what designs work well in an academic programming environment. When starting a new project I often reflect on this project drawing lessons from what worked well and what didn't. I discuss some of this in my vision for InVEST below.
The biological focus was to study the regulatory effects of the retinoblastoma (Rb) gene in mice, specifically how the gene affects the development of the placental tissues as a complete organ. Previous studies had examined effects at the cellular and gross levels, but none had quantified physical changes at the level of the entire organ.
At the biological level, placentas were harvested, mounted in paraffin, divided into 500-1500 slides, and scanned with a high resolution microscopy imager. This step was laborious and one of the initial challenges of the project. The result was a dataset of high resolution images, similar to the one on the right, that together made up the original placenta.
Challenges:
Results:
I was introduced to this project for the purpose of developing a 3D visualization tool. Over time, I assumed the lead developer role on the software side, organized the development pipeline and acted as the primary intermediary to the geneticists.
To gain an understanding of the underlying scientific questions, I read the literature produced by the human cancer genetics group. I then met with the primary scientist and later assisted her in the lab. This experience defines much about how I develop software. Immersing myself in the problem domain not only helps me learn the domain jargon but also allows me to understand what the users need in their software. Good software cannot be developed independently of its context.
One of the challenges of developing user requirements is that what the users think is hard is often easy, and what they think is easy is hard. The challenge is to identify what the user really wants, and find possible alternatives to the hard stuff.
As the project developed we identified two end products:
User interfaces were designed primarily by asking the scientists what they needed to answer their questions. We would then build prototypes and run informal usability tests during each bi-weekly meeting. As the project progressed, the scientists became comfortable downloading updates through the repository and often gave us feedback remotely.
Before I arrived, the parts of the pipeline that had been developed were written in bash scripts and Matlab. We later added C++ for the image processing, 3D visualization, and quantification. We chose C++ as it easily integrated with several libraries we intended to use:
In terms of platforms, we developed much of the image processing pipeline on Linux since the primary infrastructure available to us was Linux machines. The geneticists were familiar with Windows, so we developed the 3D visualization in Windows.
Releases were handled through a CVS repository. Working builds were checked out by co-developers. For each working implementation, we would roll a binary for scientists to download off the project site. Over time some of the scientists also used the repository system for updating the quantitative and 3D visualization tool. Once development was in full swing, releases were generated bi-weekly to coincide with project meetings.
The figure to the right shows our final pipeline. I'll highlight some key design decisions.
Communication between developers and users changes everything. An effort on the part of the developers to understand the scientific problem at hand, rather than focus on implementation, results in a higher quality product and builds trust across the team.
Investing effort into a one step build would have been a good idea. During several time crunches we would need to make changes to the system, then go through a complicated multi-part build step.
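As a rough illustration of what a one step build could look like, here is a minimal Python sketch, not the project's actual build. It assumes the build can be expressed as an ordered list of commands; the name run_build and the three placeholder stages in BUILD_STEPS are hypothetical and only print a message where a real stage would compile or package something.

```python
import subprocess
import sys

# Hypothetical build stages; each entry is one command line to run.
# Real stages (compiling C++ code, packaging a binary) would replace these.
BUILD_STEPS = [
    [sys.executable, "-c", "print('stage 1: build image processing code')"],
    [sys.executable, "-c", "print('stage 2: build 3D visualization tool')"],
    [sys.executable, "-c", "print('stage 3: package release binary')"],
]


def run_build(steps):
    """Run each step in order.

    Returns 0 if every step succeeds, otherwise the 1-based index of the
    first step that exits with a nonzero status. Later steps are skipped.
    """
    for index, command in enumerate(steps, start=1):
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return index
    return 0
```

Calling run_build(BUILD_STEPS) from a single script gives developers one command to run during a time crunch, and the return value points at the stage that failed.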
The lack of a bug reporting system was frustrating. Several pieces of the system developed some "bug lore" that commonly caused problems. In retrospect, bugs should have been fixed before new features were developed.
One measure of success is whether the software is used. After the placenta project was complete, this tool was used to image mouse mammary ducts with the same workflow. Results of that project are shown to the right. Recently it was used to image mouse neurons.
I recommend that the project focus on:
In the short term I would spend time getting a handle on the code base, identifying known issues including defects and runtime inefficiencies.
Next I'll discuss the following topics related to a preliminary business plan:
Promoting accessibility (getting it and using it) supports the mission of the InVEST tool by expanding its user base and usefulness. Currently ArcVIEW carries a price tag of $1500-$3500 and is a general purpose GIS tool; taken together these restrictions eliminate many casual users. I suggest moving away from the ArcVIEW platform and exploring online or open source desktop alternatives. Below I list some suggestions ranked approximately by accessibility:
Large projects often use several different languages. Selection will depend on the direction the architecture takes. For example, partnering with the Google Earth Engine will likely require interfacing with Python and Java. Below I detail the likely language types that will be used in continuing the development of InVEST.
Java is a good language for large scale systems, with abstraction centered on interfaces and packages. The standard libraries have excellent data structure implementations. The language has abstraction mechanisms that support good designs. The compiler enforces design constraints and can help expose design errors early in development. If used correctly, this can yield good modular designs. It has an excellent built-in documentation framework (JavaDoc) and is amenable to distributed computation if multiple CPUs or computers are available.
Often Java code is "wordier" than its Python equivalent. However, much of the extra syntax is meant to ensure that the code conforms to a global interface, and it helps to identify design errors at an early stage. Modern Integrated Development Environments (IDEs) like Eclipse help to automate much of the extraneous syntax.
As stated on the InVEST website, the primary users for the tool are "government officials, conservation professionals, farmers, and other land owners". It's likely many of these users are not familiar with GIS tools. Thus, to promote usability, the user interface must be approachable yet allow for advanced features for the experienced user.
The user base makes an excellent justification for a browser based tool. People who use computers can use a web browser. The range of possibilities given modern JavaScript libraries is large.
Controls need to be intuitive and meet the needs of the primary users. This must be assessed and developed by working with potential users of the tool. Defining use cases also helps to identify which controls are necessary. For example, consider a farmer weighing the costs of clearing a forested region of land for cultivation. Which section to clear, and how much, should be guided by the likely gains in production weighed against the long term costs to the surrounding environment and inhabitants. Consider a potential use case:
A more advanced user may use the tool differently:
InVEST should support both casual and advanced users by making commonly used functions easily accessible, while still supporting advanced usage.
The image to the right shows an online historical map browser for the Lewis and Clark expedition. Its interface reflects this quality. Novices can experiment with the dataset while advanced functions are hidden in menus. The following references are also inspirational to user interface design:
Whether the focus is on a desktop or web based environment, the computational engine will be implemented in Python, Java, or C. All of these languages can work with existing C and Fortran libraries.
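As a small illustration of calling an existing C library from Python, here is a minimal sketch, not InVEST code, that uses the standard ctypes module to call cos from the system C math library. It assumes a POSIX-like system where that library can be located; the helper names load_c_math_library and c_cos are made up for this example.

```python
import ctypes
import ctypes.util


def load_c_math_library():
    """Load the platform's C math library (assumes a POSIX-like system)."""
    # find_library returns a library name or path, or None if nothing is found.
    name = ctypes.util.find_library("m")
    # On POSIX systems, CDLL(None) falls back to symbols already loaded
    # into the running process, which normally include the math functions.
    return ctypes.CDLL(name)


_libm = load_c_math_library()

# Declare the C signature "double cos(double)" so ctypes converts
# the argument and the return value correctly.
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Return cos(x) as computed by the C library rather than by Python."""
    return _libm.cos(x)
```

A call such as c_cos(0.5) should agree with math.cos(0.5) to floating point precision. The same pattern of declaring argument and return types applies to other C functions with simple signatures.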
F2PY, which allows developers to build an interface to low level Fortran code.
extern directive.
As the lead developer for InVEST I would first spend time with the codebase to understand the structure of the current system. Simultaneously I will develop a professional software development workflow that focuses on producing high quality software. I highlight some main points below.
Thank you for your time. I'm happy to answer any questions.