Ask.Cyberinfrastructure

What packages are most efficient for image classification?

I’m trying to run anomaly detection on certain features, based on 4,000 JPEG images of human cells. What image-processing tools will be most efficient on my large datasets?

I’m also investigating this for use in our cluster and here is what I found (keep in mind that I’m not sure how it works on XSEDE):

  • ImageMagick is a general-purpose program that handles many image manipulation tasks. I wrote an example Slurm batch script for cropping that illustrates submitting an array job: https://github.com/rutgers-oarc/training/tree/master/slurm_examples/image_cropping
  • scikit-image is a Python library for image manipulation. Again, you would write a script that takes a single image (or a small list of images) as input and then farm it out as an array job. XSEDE may throttle submissions to ensure that you don’t flood the system with jobs, so it may only let you run a limited number of tasks at once. The Slurm job array documentation (https://slurm.schedmd.com/job_array.html) lists the options. In particular, it notes that a maximum number of simultaneously running tasks from the job array may be specified using a "%" separator; for example, "--array=0-15%4" limits the number of simultaneously running tasks from this job array to 4.
  • OpenCV is another set of tools, very general and powerful, and installable as a Python package.
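
To make the "one script, farmed out as an array job" idea concrete, here is a minimal Python sketch of how each array task could pick its own share of the images. It assumes the job is submitted with task IDs starting at 0 (for example --array=0-15%4) and relies on the SLURM_ARRAY_TASK_ID and SLURM_ARRAY_TASK_COUNT environment variables that Slurm sets for array jobs. The function names, the "images" directory, and the print placeholder are hypothetical; the actual processing (scikit-image, OpenCV, etc.) would go where the print is.

```python
import os
from pathlib import Path


def select_images(image_paths, task_id, num_tasks):
    """Return the images assigned to one array task (round-robin split).

    Sorting first guarantees every task sees the same ordering, so the
    slices are disjoint and together cover every image exactly once.
    """
    ordered = sorted(image_paths)
    return ordered[task_id::num_tasks]


def task_layout(env=os.environ):
    """Read this task's index and the total task count from the environment.

    Falls back to a single task (0 of 1) when run outside a Slurm array job.
    Assumes the array was submitted with IDs 0..N-1.
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", 0))
    num_tasks = int(env.get("SLURM_ARRAY_TASK_COUNT", 1))
    return task_id, num_tasks


def main(image_dir="images"):  # "images" is a hypothetical directory name
    task_id, num_tasks = task_layout()
    directory = Path(image_dir)
    if not directory.is_dir():
        print(f"no such directory: {image_dir}")
        return
    all_images = [str(p) for p in directory.glob("*.jpg")]
    for path in select_images(all_images, task_id, num_tasks):
        # Placeholder: call your own image-processing routine here.
        print(f"task {task_id}: would process {path}")


if __name__ == "__main__":
    main()
```

With 4,000 images and --array=0-15, each task would handle 250 images. The round-robin split is just one simple choice; giving each task a contiguous block of files would work equally well.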

Note: to install a Python package for your own user only on a shared cluster, run pip install --user mypythonpackage.