
CS221 Artificial Intelligence: Principles & Techniques

Challenge Problem Object Recognition and Tracking

Overview

Challenge problem

    Problem statement
    Source code overview
    Sliding-window detectors and Haar features
    Milestone requirements

OpenCV tutorial

    Introduction and installation
    Code samples

Object recognition tips and tricks

Challenge Problem

Important Dates

team

milestone

final submission

Source Code Overview


classifier.cpp   Defines the CClassifier class. You are free to modify this file, but do not modify the interface to the loadState() and run() methods.
CXMLParser.cpp   Class for parsing XML files. Used for replaying ground truth object labels.
objects.cpp      Contains the data structures for annotated objects. Do not modify this file.
replay.cpp       Contains code for replaying object labels (such as ground truth labels). Do not modify this file.
test.cpp         Contains main() code for testing classifiers on videos. Do not modify this file.
train.cpp        Contains training starter code. You are free to modify everything in this file.
utils.cpp        Contains useful utility functions.

Command-line Options

train [<options>] <directory>


    <directory>     root directory containing (subdirectories of) all the training images
    -c <filename>   writes learned parameters to a file after training, using CClassifier::saveState()
    -h              provides help
    -v              gives verbose output

test [<options>] <AVI filename>


    <AVI filename>  name of the video you want to test on (e.g. easy.avi)
    -c <filename>   configures the classifier with parameters from a file, using CClassifier::loadState()
    -g <filename>   displays ground truth labels from an XML file
    -h              provides help
    -o <filename>   saves classifications to an XML file (same format as -g)
    -v              gives verbose output
    -x              disables display of the video (if you don't have X-windows)

Training File Lists


typedef struct _TTrainingFile {
    std::string filename;                // full path to image file
    std::string label;                   // subdirectory name
} TTrainingFile;

typedef struct _TTrainingFileList {
    std::vector<TTrainingFile> files;    // list of files
    std::vector<std::string> classes;    // list of classes (subdirectories)
} TTrainingFileList;

code/ data/ mug/ other/

CObject Class
class CObject {
public:
    CvRect rect;          // object's bounding box (x,y,width,height)
    std::string label;    // object's class

public:
    // constructors
    CObject();
    CObject(const CObject&);
    CObject(const CvRect&, const std::string&);

    // destructor
    virtual ~CObject();

    // helper functions
    void writeAsXML(std::ostream&);
    void draw(IplImage *, CvScalar, CvFont *);
    CvRect intersect(const CObject&);
    int overlap(const CObject&);

    // operators
    CObject& operator=(const CObject&);
};
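
The slides only declare intersect() and overlap(); their implementation is not shown. As a rough illustration of what such helpers could compute, here is a minimal sketch on a plain rectangle type. The names Rect, intersectRects and overlapArea are hypothetical, and treating "overlap" as the intersection area in pixels is an assumption, not necessarily what the starter code does.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for CvRect: top-left corner plus size.
struct Rect { int x, y, width, height; };

// Intersection of two rectangles; an empty rectangle if they do not overlap.
Rect intersectRects(const Rect& a, const Rect& b) {
    assert(a.width >= 0 && a.height >= 0 && b.width >= 0 && b.height >= 0);
    int left   = std::max(a.x, b.x);
    int top    = std::max(a.y, b.y);
    int right  = std::min(a.x + a.width,  b.x + b.width);
    int bottom = std::min(a.y + a.height, b.y + b.height);
    if (right <= left || bottom <= top) return Rect{0, 0, 0, 0};
    return Rect{left, top, right - left, bottom - top};
}

// Overlap measured as the area (in pixels) of the intersection (assumed definition).
int overlapArea(const Rect& a, const Rect& b) {
    Rect r = intersectRects(a, b);
    return r.width * r.height;
}
```

A measure like this can be useful for comparing a detection against a ground truth box, or for merging detections that cover the same object.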

CClassifier Class
class CClassifier {
protected:
    CvRNG rng;
    CvMat *parameters;

    // TO DO: ADD YOUR MEMBER VARIABLES HERE

public:
    // constructors and destructors
    CClassifier();
    virtual ~CClassifier();

    // load and save classifier configuration
    virtual bool loadState(const char *);
    virtual bool saveState(const char *);

    // run the classifier over a single frame
    virtual bool run(const IplImage *, CObjectList *);

    // train the classifier using given set of files
    virtual bool train(TTrainingFileList&);

protected:
    // TO DO: ADD YOUR MEMBER FUNCTIONS HERE
};

Sliding-window Object Detectors

e.g. task: find all coffee cups

Haar Features

Compute difference of intensity over image regions
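
To make the idea concrete, here is a minimal sketch of one kind of Haar feature, written in plain C++ without OpenCV. It builds an integral image so that any rectangle sum costs four lookups, then takes the difference between the left and right halves of a region. The type and function names (IntegralImage, buildIntegral, rectSum, haarTwoRectHorizontal) are illustrative assumptions, and the two-rectangle layout is just one example; the list of features supplied for the milestone may use other layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image with one extra row and column of zeros, so that at(y, x)
// is the sum of all source pixels strictly above row y and left of column x.
struct IntegralImage {
    int width;                     // width of the source image
    int height;                    // height of the source image
    std::vector<long long> sums;   // (height + 1) x (width + 1), row-major

    long long at(int y, int x) const {
        return sums[static_cast<std::size_t>(y) * (width + 1) + x];
    }
};

// Build the integral image of a row-major 8-bit grayscale image.
IntegralImage buildIntegral(const std::vector<unsigned char>& pixels,
                            int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    IntegralImage ii{width, height,
        std::vector<long long>(static_cast<std::size_t>(width + 1) * (height + 1), 0)};
    for (int y = 0; y < height; ++y) {
        long long rowSum = 0;  // running sum of the current row
        for (int x = 0; x < width; ++x) {
            rowSum += pixels[static_cast<std::size_t>(y) * width + x];
            ii.sums[static_cast<std::size_t>(y + 1) * (width + 1) + (x + 1)] =
                ii.at(y, x + 1) + rowSum;
        }
    }
    return ii;
}

// Sum of pixels in the rectangle with top-left (x, y) and size w-by-h.
long long rectSum(const IntegralImage& ii, int x, int y, int w, int h) {
    return ii.at(y, x) + ii.at(y + h, x + w) - ii.at(y, x + w) - ii.at(y + h, x);
}

// Two-rectangle Haar feature: left half minus right half (w must be even).
long long haarTwoRectHorizontal(const IntegralImage& ii, int x, int y, int w, int h) {
    assert(w % 2 == 0);
    int half = w / 2;
    return rectSum(ii, x, y, half, h) - rectSum(ii, x + half, y, half, h);
}
```

Because the integral image is computed once per patch, evaluating many features on the same patch stays cheap.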

Milestone Requirements

Build a decision tree classifier for the mug class using Haar features

Load positive and negative training images
Convert to grayscale
Resize to 64-by-64
Extract (given list of) Haar features
Train decision tree
Implement runtime code to run classifier over all scales (you can assume height = width for the milestone) and shifts (in increments of 8 pixels) within each video frame

Remember: after the milestone you are free to use whatever features and classifiers you like
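
The runtime step of the milestone (all scales, shifts of 8 pixels, square windows) can be sketched as a simple enumeration of candidate windows. This is only a sketch under assumptions: the Window type and enumerateWindows function are hypothetical names, and the smallest window size (64) and the scale step (1.25) are example values that the slides do not specify. Only the square windows and the 8-pixel shift come from the milestone description.

```cpp
#include <cassert>
#include <vector>

// A square candidate window: top-left corner and side length in pixels.
struct Window { int x, y, size; };

// Enumerate square windows over all scales and shifts within a frame.
// minSize and scaleStep are assumed values; stride = 8 follows the milestone.
std::vector<Window> enumerateWindows(int frameWidth, int frameHeight,
                                     int minSize = 64, int stride = 8,
                                     double scaleStep = 1.25) {
    assert(minSize > 0 && stride > 0 && scaleStep > 1.0);
    std::vector<Window> windows;
    // Grow the window until it no longer fits inside the frame.
    for (double s = minSize; s <= frameWidth && s <= frameHeight; s *= scaleStep) {
        int size = static_cast<int>(s);
        // Shift the window across the frame in steps of `stride` pixels.
        for (int y = 0; y + size <= frameHeight; y += stride) {
            for (int x = 0; x + size <= frameWidth; x += stride) {
                windows.push_back(Window{x, y, size});
            }
        }
    }
    return windows;
}
```

Each window would then be clipped out of the frame, resized to 64-by-64, converted to features and passed to the classifier, mirroring the preprocessing used for the training images.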

What is OpenCV?

The Open Computer Vision Library is a collection of algorithms and sample code for various computer vision problems:

libcxcore: core data structures and linear algebra library
libcv: computer vision library
libhighgui: media and graphics i/o handling
libml: machine learning library (decision trees, boosting, neural networks)

Originally developed by Intel; now supported by Willow Garage
Wiki has lots of information and API documentation

http://opencvlibrary.sourceforge.net/

Installing OpenCV (Linux)

Install ffmpeg

    svn checkout svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg
    cd ffmpeg
    ./configure --prefix=<home> --enable-shared
    make && make install

Install opencv

download (version 1.0.0) from:

http://sourceforge.net/projects/opencvlibrary/

    tar -xzvf opencv-1.0.0.tar.gz
    cd opencv-1.0.0
    ./configure --prefix=<home> \
        CPPFLAGS="-I<home>/include" \
        LDFLAGS="-L<home>/lib"
    make && make install

Example 1: Loading and Displaying Images


#include "cv.h"
#include "cxcore.h"
#include "highgui.h"

#define WINDOW_NAME "MyWindow"

int main(int argc, char *argv[])
{
    IplImage *image;

    cvNamedWindow(WINDOW_NAME, CV_WINDOW_AUTOSIZE);
    for (int i = 1; i < argc; i++) {        // argv[argc] is NULL, so stop before it
        image = cvLoadImage(argv[i], 0);    // load from file (0 = as grayscale)
        if (image == NULL) continue;        // skip files that fail to load
        cvShowImage(WINDOW_NAME, image);    // display on screen
        cvWaitKey(0);                       // wait for key press
        cvReleaseImage(&image);             // free memory
    }
    cvDestroyWindow(WINDOW_NAME);

    return 0;
}

Example 2: Converting from Color to Grayscale


IplImage *image;
// acquire color image somehow (OpenCV stores the channels in BGR order)
...

// allocate memory for grayscale image
IplImage *gray = cvCreateImage(
    cvGetSize(image),    // same size as original image
    IPL_DEPTH_8U,        // data type (8-bit unsigned)
    1);                  // grayscale has one channel

// color convert the image (source, destination)
cvCvtColor(image, gray, CV_BGR2GRAY);

// do something with grayscale image
...

// free memory used by images
cvReleaseImage(&gray);
cvReleaseImage(&image);

Example 3: Resizing an Image


IplImage *image;
// acquire image somehow
...

// allocate memory for resized image
IplImage *resizedImage = cvCreateImage(
    cvSize(64, 64),        // new size (width, height)
    image->depth,          // data type (e.g. 8-bit unsigned)
    image->nChannels);     // number of planes (e.g. RGB)

// resize the image (source, destination)
cvResize(image, resizedImage);

// do something with resized image
...

// free memory used by images
cvReleaseImage(&resizedImage);
cvReleaseImage(&image);

Example 4: Clipping a Small Region out of an Image


IplImage *image;
...                        // acquire image somehow

// clip out 64-by-64 image patch at (4,8)
CvRect region = cvRect(4, 8, 64, 64);
IplImage *clippedImage = cvCreateImage(
    cvSize(region.width, region.height),
    image->depth, image->nChannels);

cvSetImageROI(image, region);       // restrict operations to the region
cvCopyImage(image, clippedImage);   // copy only the region of interest
cvResetImageROI(image);             // restore the full image
...                        // do something with clipped region

// and always free memory
cvReleaseImage(&clippedImage);
cvReleaseImage(&image);

Example 5: Computing an Integral Image


IplImage *image;
CvRect r;
...

// compute integral image on grayscale image
IplImage *iImage = cvCreateImage(
    cvSize(image->width + 1, image->height + 1),
    IPL_DEPTH_32S, 1);
cvIntegral(image, iImage);
...

// compute sum of pixels in area r (could also use CV_IMAGE_ELEM)
double value;
value  = cvGetReal2D(iImage, r.y, r.x);
value += cvGetReal2D(iImage, r.y + r.height, r.x + r.width);
value -= cvGetReal2D(iImage, r.y, r.x + r.width);
value -= cvGetReal2D(iImage, r.y + r.height, r.x);

Example 6: Logistic Regression


y^{(i)} = \frac{1}{1 + \exp(-\theta^T x^{(i)})}

#include <cassert>
#include <cmath>

// Each row of X is one sample x^(i); theta is a column vector of weights.
CvMat *logistic(const CvMat *X, const CvMat *theta)
{
    assert(X->cols == theta->rows);

    CvMat *Y = cvCreateMat(X->rows, 1, CV_32FC1);
    for (int i = 0; i < X->rows; i++) {
        double sigma = 0.0;     // accumulates theta^T x^(i)
        for (int j = 0; j < X->cols; j++) {
            sigma += cvmGet(X, i, j) * cvmGet(theta, j, 0);
        }
        cvmSet(Y, i, 0, 1.0 / (1.0 + exp(-1.0 * sigma)));
    }

    return Y;
}
// caller must free Y with cvReleaseMat

Example 7: Boosting (Train)


// numSamples and numFeatures are placeholders for your data dimensions
// (with CV_ROW_SAMPLE, each row of data is one training sample)
CvMat *data = cvCreateMat(numSamples, numFeatures, CV_32FC1);
CvMat *labels = cvCreateMat(numSamples, 1, CV_32SC1);
...                        // acquire training data somehow

// define variable types (one per feature, plus one for the label)
CvMat *varType = cvCreateMat(data->cols + 1, 1, CV_8UC1);
for (int j = 0; j < data->cols; j++)
    CV_MAT_ELEM(*varType, unsigned char, j, 0) = CV_VAR_NUMERICAL;
CV_MAT_ELEM(*varType, unsigned char, data->cols, 0) = CV_VAR_CATEGORICAL;

// train
CvBoostParams parameters(CvBoost::GENTLE, numRounds, 0.95,
    numSplits, false, NULL);
parameters.split_criteria = CvBoost::DEFAULT;

CvBoost *model = new CvBoost();
model->train(data, CV_ROW_SAMPLE, labels,
    NULL, NULL, varType, NULL, parameters);

// free memory
...

Example 7: Boosting (Test)


CvBoost *model;
// numFeatures is a placeholder for the length of your feature vector
CvMat *x = cvCreateMat(1, numFeatures, CV_32FC1);
...                        // acquire model and test sample somehow

// allocate memory for weak learner output
int length = cvSliceLength(CV_WHOLE_SEQ, model->get_weak_predictors());
CvMat *weakResponses = cvCreateMat(length, 1, CV_32FC1);

// test (y is prediction, score is log-probability)
int y = (int)model->predict(x, NULL, weakResponses, CV_WHOLE_SEQ);
double score = cvSum(weakResponses).val[0];

// free memory
cvReleaseMat(&weakResponses);
...

Example 8: Optical Flow


IplImage *bwImg1, *bwImg2;
...                        // acquire images somehow

// allocate memory for optical flow vectors
CvMat *dx = cvCreateMat(bwImg1->height, bwImg1->width, CV_32FC1);
CvMat *dy = cvCreateMat(bwImg1->height, bwImg1->width, CV_32FC1);

// compute dense Lucas-Kanade optical flow (previous, current)
// (also see cvCalcOpticalFlowPyrLK for sparse optical flow)
cvCalcOpticalFlowLK(bwImg1, bwImg2, cvSize(5, 5), dx, dy);

double deltaX = cvAvg(dx).val[0];    // bulk x-direction motion
double deltaY = cvAvg(dy).val[0];    // bulk y-direction motion

// free memory
cvReleaseMat(&dy);
cvReleaseMat(&dx);
cvReleaseImage(&bwImg2);
cvReleaseImage(&bwImg1);

OpenCV Tips

Don't forget to free allocated memory

cvReleaseImage, cvReleaseMat

Try to allocate and free memory outside of loops if possible

Image size is width-by-height; matrix size is rows-by-columns

Never set the region of interest (ROI) outside of the image/matrix

x ≥ 0, y ≥ 0, x + width ≤ image->width, y + height ≤ image->height

Never operate on images of different types (image->depth)

Make sure you convert first (e.g. RGB to grayscale)

Check the return values (for NULL)

e.g., loading images and allocating images

Object Recognition Tips

Read some papers for ideas on better features

[Viola and Jones, 2001], [Wolf, Serre and Poggio, 2004], [Lowe, 2004], [Dalal and Triggs, 2005], [Torralba, Murphy, and Freeman, 2007].

The milestone uses a sliding-window detector, but there are other techniques that you can try

template matching, chamfer matching/shape matching, bag-of-features with locality constraints, eigenspace representation, scene context, color

Try different classifiers and training methods

logistic classifiers, support vector machines, boosted classifiers

Try to normalize your features for intensity variation

Filter the output of your classifiers and use motion estimation to predict where an object in frame n will be in frame n+1

Anything that works!
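
One common way to normalize for intensity variation is to subtract the mean and divide by the standard deviation of each patch before extracting features. The sketch below shows that idea on a raw grayscale patch; the function name normalizePatch and the small epsilon used to avoid division by zero are assumptions, and other normalization schemes may work as well or better.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize a grayscale patch to zero mean and (approximately) unit variance,
// which makes the result insensitive to overall brightness and contrast changes.
std::vector<double> normalizePatch(const std::vector<unsigned char>& pixels) {
    assert(!pixels.empty());

    double mean = 0.0;
    for (unsigned char p : pixels) mean += p;
    mean /= pixels.size();

    double variance = 0.0;
    for (unsigned char p : pixels) variance += (p - mean) * (p - mean);
    variance /= pixels.size();

    const double eps = 1e-6;  // assumed constant; guards against flat patches
    const double stddev = std::sqrt(variance);

    std::vector<double> normalized(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        normalized[i] = (pixels[i] - mean) / (stddev + eps);
    }
    return normalized;
}
```

With this normalization, a patch and a uniformly brighter copy of the same patch map to the same values, so features computed on them agree.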

Coding Tips

Modular, general design

allows you to test lots of things

Use OpenCV and other libraries, e.g., GSL, SVMLight, STAIR Vision Library (SVL)

make sure you cite external libraries

Use source control (SVN or CVS)

Your code must compile on CS machines

Design Process Tips


Automate testing early
Consider how you will avoid overfitting
Test features without training an entire classifier (e.g., entropy function in Matlab)
Visualization code usually pays off
Profile your code and optimize bottlenecks (e.g., use integral images)
Start early!
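
As an illustration of testing a feature without training an entire classifier, the sketch below scores a single feature by the information gain of a threshold split on it, using entropy of the labels. This is an assumed way of applying the entropy idea mentioned above, written in C++ rather than Matlab; the function names binaryEntropy and informationGain are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Entropy (in bits) of a binary label distribution with positive fraction p.
double binaryEntropy(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

// Information gain (in bits) from splitting samples on value < threshold.
// values[i] is the feature value of sample i; labels[i] is 1 (object) or 0 (other).
double informationGain(const std::vector<double>& values,
                       const std::vector<int>& labels, double threshold) {
    assert(values.size() == labels.size() && !values.empty());
    std::size_t nLeft = 0, posLeft = 0, nRight = 0, posRight = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < threshold) {
            ++nLeft;
            if (labels[i] == 1) ++posLeft;
        } else {
            ++nRight;
            if (labels[i] == 1) ++posRight;
        }
    }
    const double n = static_cast<double>(values.size());
    const double before = binaryEntropy((posLeft + posRight) / n);
    double after = 0.0;  // weighted entropy of the two sides of the split
    if (nLeft > 0)  after += (nLeft / n)  * binaryEntropy(posLeft / static_cast<double>(nLeft));
    if (nRight > 0) after += (nRight / n) * binaryEntropy(posRight / static_cast<double>(nRight));
    return before - after;
}
```

A feature whose best threshold yields a gain near zero carries little information about the label on its own, which is a quick signal that it may not be worth keeping.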

Example Results

A simple decision tree classifier trained using Haar features is not all that good, so don't expect brilliant results

Milestone Results

State-of-the-art Results (courtesy Ben Sapp)

Finally

Good luck! (and have fun)
