Vision and mechatronics applications at the NCEA
J Billingsley, M Schoenfisch
National Centre for Engineering in Agriculture
University of Southern Queensland
Abstract
Vision projects include the grading and processing of broccoli and the quality assessment of nut kernels. Work on an agricultural vision guidance system has had a very successful outcome: six 'Steeroid' systems have completed field trials in Australia and further prototypes are undergoing evaluation in the United States. The Steeroid steers by sight of the crop rows, relieving the driver of the stressful task of maintaining an accurate position with respect to the plants. This paper outlines some of the fundamentals underlying the research and the factors which unite the projects.
Introduction
The chronological history of the initial stages of the vision guidance project has already been reported [1, 2]. This paper seeks to expand on salient features of the present stage of development and to relate the methods to those used by previous research teams.
Many additional features of the system are not directly related to the performance and accuracy of the crop-following operation but concern the operator interface. A feature which has been given some importance is an in-cab display on which live images of the camera view are shown with superimposed graphics representing the data extracted by the computer.
It is fully acknowledged that the farmer forms part of the overall system on which performance will be judged. This involves calibration at the start of a field and the clearance to which cultivator tines can be set to match the system's precision. The operator must have general trust in its operation, but must also recognise and anticipate circumstances in which automatic operation becomes unreasonable.
Ability to discriminate plant rows
The brightness of the image is captured as a two-dimensional array of eight-bit values. It is necessary to discriminate in some way between the crop and the background field, a task made harder by light levels which can change from moment to moment. However, some earlier researchers allowed this task to get out of hand.
They argued that the pixel brightnesses would have a bimodal distribution, peaking at values of plant brightness and soil brightness respectively. A threshold level could be set midway between these to discriminate between rows and gaps. They were thus faced with a massive information processing task to perform for each frame of data before analysis could even begin.
We have found a very much simpler approach to be successful. This starts with the premise that 'viewports' will be located to straddle each of the rows under scrutiny. The approach also depends on the concept of frame-to-frame adjustment of the threshold, so that data extracted as a by-product of other computations can be used for incremental control of the next frame's threshold.
Within the viewport, a count is made of the pixels which exceed the threshold and are therefore deemed to be 'plant'. The proportion of bright to total pixels is therefore known. The 'correct' value of this proportion is a property of the plants' stage of development, varying from, say, 0.1 or less when the plants are newly emerged to 0.5 or more as the canopy closes. (Above this value the farmer would not wish to enter the crop with a vehicle unless harvesting it.)
If the proportion of bright pixels is seen to fall below the target value, the threshold is accordingly lowered by one count for the following frame, and raised if the proportion is seen to exceed it. The target value itself is held in the computer as an adjustable parameter. It is increased or decreased by the farmer during start-of-field setup in a very simple way. Alternate rows of the monitor image show the camera view and an image quantised by the threshold. Black and white cells are seen in the bars straddling the crop rows. By tapping either of two buttons, the farmer can raise or lower the target value and cause the white patches to widen or shrink until they match the apparent thickness of the crop rows.
Until the target value is changed again, the threshold will adapt to variations in picture brightness to preserve the proportion of bright cells in the window. Because of the 'fuzzy' boundaries of the crop rows, there is a great deal of latitude in this setting before performance starts to deteriorate.
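The adjustment might be pictured as follows, in a minimal Python sketch; the names and the eight-bit range are assumptions made for illustration, since the production system ran on dedicated image-capture hardware.

    # Sketch of the frame-to-frame threshold control described above.
    # Names and the eight-bit brightness range are assumptions.
    def update_threshold(viewport_pixels, threshold, target_proportion):
        """viewport_pixels: brightness values (0-255) within one viewport.
        Returns the threshold to be used for the following frame."""
        bright = sum(1 for p in viewport_pixels if p > threshold)
        proportion = bright / len(viewport_pixels)
        # Nudge by one count per frame towards the target proportion of
        # 'plant' pixels, as a by-product of the main computation.
        if proportion < target_proportion:
            threshold -= 1    # too few bright pixels: lower the threshold
        elif proportion > target_proportion:
            threshold += 1    # too many: raise it
        return max(0, min(255, threshold))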
Location of the rows
Even when the image is quantised into a binary set of 'plant' and 'background' points the data set is still very large. A number of previous researchers [4, 5] have chosen to try run-length encoding to analyse the image in terms of its horizontal structure, storing a coordinate for each change from light to dark or vice versa. These are then paired off and averages taken to obtain estimates of the row centres.
This certainly reduces the amount of data to be processed, but a new problem has to be faced. The r-th centre does not necessarily relate to row r, since a 'glitch' earlier in the data can throw in an extra pair of transitions, corrupting the count. The researchers must look for transitions 'in the locality' if they wish to put together a sequence of transitions on successive lines of the image. Having found this sequence of values of x against y, a 'line of best fit' can be matched by conventional regression.
Once again, our own approach has been much simpler. What we require is simply the lateral displacement of the image of the row within the viewport, so that for the following frame the viewport can be displaced to track the row and so that a steering signal can be derived from that displacement. Instead of extracting edge data from the image, all that is necessary is to 'weigh' it. The computed moment of the bright data points about the viewport centre enables the 'centre of gravity' to be calculated. (The computation can be performed equally well without quantisation, but the quantised version appeared to be just as effective and was faster to compute.)
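The 'weighing' amounts to a first moment, which might be sketched as follows; the coordinate layout and names are assumptions for illustration.

    # Sketch of 'weighing' the quantised image within a viewport: the first
    # moment of the bright points about the viewport centre-line gives the
    # lateral displacement of the row image.
    def row_displacement(bright_points, centre_x):
        """bright_points: (x, y) coordinates of pixels deemed 'plant'.
        Returns the centre-of-gravity offset from the viewport centre."""
        if not bright_points:
            return None    # no usable row image in this frame
        return sum(x - centre_x for x, y in bright_points) / len(bright_points)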
To give more flexibility to the system for curved rows or large deflections, the analysis also includes an estimate of the angle of tilt of the row segment. Within the window, the computation finds not merely the 'centre of gravity' of the distributed points but also their 'axis of gyration'. This enables the window to be corrected by lateral shear so that it best conforms to the rows.
As a by-product of the computation, the moment of inertia about the sheared row-estimate is also known. It can be determined whether points are tightly bunched about the line of fit or scattered widely. The result is an estimate of the 'quality' of each observation in terms of its credibility as a clearly seen row. Only if the quality exceeds a preset threshold is the correction made to the viewport or the data used for steering. A count is made of the number of unusable row images in succession; if this exceeds ten, an audible warning is given and control reverts to manual.
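In the same spirit, the tilt and quality measures might be computed from second moments, as in the sketch below; this is an illustration under assumed names, not the production computation.

    # Sketch of the tilt and 'quality' computation. The axis of gyration
    # follows from the second moments of the bright points; the scatter
    # about that axis serves as the credibility measure described above.
    import math

    def row_fit(bright_points):
        """Returns (centroid_x, centroid_y, tilt, scatter) for the bright
        points of one viewport, or None if the viewport is empty."""
        n = len(bright_points)
        if n == 0:
            return None
        cx = sum(x for x, y in bright_points) / n
        cy = sum(y for x, y in bright_points) / n
        # Second moments about the centroid.
        sxx = sum((x - cx) ** 2 for x, y in bright_points) / n
        syy = sum((y - cy) ** 2 for x, y in bright_points) / n
        sxy = sum((x - cx) * (y - cy) for x, y in bright_points) / n
        # Principal-axis angle: the 'axis of gyration' of the row segment.
        tilt = 0.5 * math.atan2(2 * sxy, sxx - syy)
        # Moment of inertia about the fitted axis: small means the points
        # are tightly bunched, i.e. a clearly seen row.
        scatter = 0.5 * (sxx + syy) - 0.5 * math.hypot(sxx - syy, 2 * sxy)
        return cx, cy, tilt, scatter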
Since three row-viewports are processed in each image and since frames are captured at a rate of at least ten per second, only a third of a second of 'eyes shut, straight ahead' travel occurs before the warning.
Steering the vehicle
Earlier researchers [4, 5] were preoccupied with the identification of coordinate transformations relating the origin of the perceived image to absolute field coordinates.
In the Steeroid system, the demanded steering angle is simply made proportional to the perceived lateral displacement of the rows. Since the viewports represent a focus some metres ahead of the vehicle, this signal will be the sum of a term representing the lateral displacement of the vehicle and a term representing its angular heading relative to the row. This mix provides substantial 'damping' of the response, which will effect an exponential decay of error with a 'distance constant' equal to the distance from the focus of view to the rear axle.
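The damping claim can be made plausible with a small kinematic sketch, an illustration under small-angle assumptions rather than an analysis drawn from the system itself. Let y be the lateral error at the rear axle, theta the heading error and d the distance from the axle to the focus of view, so that the perceived displacement is approximately

    y_f = y + d * theta

A proportional law steers so as to drive y_f towards zero, settling with theta = -y/d. Since dy/ds is approximately theta, where s is the distance travelled along the row,

    dy/ds = -y/d,    giving    y(s) = y(0) exp(-s/d)

an exponential decay of error with distance constant d, as stated.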
Calibration
All important parameters are held on computer disk within the system, loaded automatically at start up. Some of these are intrinsic within the system and some represent user settings.
The datum of the steering sensor can be fine-tuned by noting the steering feedback signal when the tractor has settled on course. The horizon, vanishing point and row separation are also important parameters of the analysis. They are set by the simple expedient of asking the operator to tap 'cursor' keys until reference lines are aligned with the image on the screen. Steering is then performed in terms of deviation of the image from this datum.
As well as setting the target density of the image, the operator can select any combination of the two chrominance signals in addition to the luminance. In particular, it is possible to select for greenness so that the analysis of the image is not corrupted by dry vegetable trash.
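The effect of selecting for greenness might be pictured as follows. This is a minimal sketch assuming digitised red, green and blue values; the actual system mixes analogue luminance and chrominance signals, and the 'excess green' weighting here is a common stand-in rather than the mix actually used.

    # Sketch of a greenness measure. Dry trash reflects red and blue about
    # as strongly as green, so it scores low; living plant material scores
    # high. The weighting is an assumption.
    def greenness(r, g, b):
        return 2 * g - r - b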
The viewports can be raised and lowered in the image, changing the distance of the focus ahead of the vehicle. There will also often be tanks attached to the front of the tractor which obstruct part of the view. A portion of the wide-angle camera-view can also be selected, giving the effect of pan and tilt controls.
Objective measurement of steering performance
The real test of the system is, of course, the accuracy with which it follows the crop-rows and the nature of its recovery from a transient disturbance. Figure 1 shows the actual performance of the entire vehicle on a test run. Conditions were far from ideal. Wind was blowing up dust clouds from the marked-out track and the powdered-lime markings were already broken and irregular. A new target was laid in the form of a length of stretched tape 15 millimetres in width. Rather than trying to persuade the tape to lie flat, it was left twisted, with the result that the image quality fluctuated with a cycle of about one metre.
Figure 1. Practical results of a 35 second run with a tractor speed of 1 metre per second. The deflections are measured in centimetres, taken from an independent video camera record.
A second camera was mounted on the front axle, looking down on the tape. The 'record' button was pressed and the tractor was driven along the tape at approximately one metre per second under automatic vision guidance. At the end of the run a ruler was briefly placed in the field of view to calibrate this second camera.
The video was played back in the laboratory through a duplicate of the guidance computer system. This now tracked the image of the target tape in the recording, logging its findings to disk. Over three hundred readings were acquired in real time during the thirty-five seconds of the run. These are presented here, untouched apart from scaling so that the y-axis represents centimetres of deflection from the line.
A few 'glitches' have been produced in the analysis process, but Figure 1 shows a response which meets the requirement of two-centimetre accuracy. There is seen to be an initial error when the system first acquires the line. This decays with a 'distance constant' seen to be in the region of ten metres, of the order of the distance of the view focus ahead of the rear axle. For the remaining twenty metres the deviation from the line does not exceed two centimetres.
In field measurements, farmers have expressed great satisfaction with the performance. Fluctuations in path are no more than the fluctuations in the row crop and the system does indeed relieve the driver of a great deal of stress.
Related vision projects
Broccoli heads must be sorted at a rate of five heads per second, so that they can be packed for the appropriate markets. There is also a need to trim the stems to a variety of lengths according to the country of destination.
Classification is made according to size, shape, colour and length of stem. For the choice markets, the top view of the head must be well rounded with no missing or deformed florets, while the side view must reveal a sufficient 'dome'. There must be no patches of yellowing representing over-ripeness, and the stem must be sufficiently long to allow trimming to the desired length.
Rather than direct massive computing power at the task, we have succeeded in using substantially the same image analysis hardware that is used in the tractor guidance project.
To gain speed, size is assessed by measuring the centre-line dimensions of the head, rather than by counting pixels over its area. If the head is badly deformed, this will of course give an erroneous measure. But if the head is deformed it is destined for a lower-value market where accurate sizing is not so important. It is therefore important that the quality of shape can be assessed.
After measuring three axes, accessing only a very small proportion of the image pixels, the centre coordinates are known together with the size. Two rings of points are now defined, one of which will lie within the image of a circular head and the other of which should lie completely outside it. Examination of the pixels at these points reveals a count of 'bumps' and 'dents' which represent irregularity of the shape.
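A sketch of the two-ring test follows; the names, the sampling density and the pixel-classification function are assumptions made for illustration.

    # Sketch of the two-ring shape test. Points are sampled on an inner
    # ring, which should lie wholly on a round head, and an outer ring,
    # which should lie wholly off it; failures are counted as dents and
    # bumps respectively.
    import math

    def ring_counts(is_head, cx, cy, r_inner, r_outer, samples=64):
        """is_head(x, y) -> True if the pixel at (x, y) is classed as head.
        Returns (dents, bumps) counted around the two rings."""
        dents = bumps = 0
        for i in range(samples):
            a = 2 * math.pi * i / samples
            dx, dy = math.cos(a), math.sin(a)
            # A gap on the inner ring is a 'dent' in the outline.
            if not is_head(round(cx + r_inner * dx), round(cy + r_inner * dy)):
                dents += 1
            # Head material on the outer ring is a 'bump' beyond it.
            if is_head(round(cx + r_outer * dx), round(cy + r_outer * dy)):
                bumps += 1
        return dents, bumps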
Similar pragmatic measurements will arrive at a data-set consisting of size, oval, lumps, dents, dome and percentage of yellow. This last is computationally costly, since a large number of pixels have to be accessed to ensure good coverage. How, then, is the farmer going to relate these to a grading operation?
A setup operation allows twenty or more rules to be established. Such a rule might be:
Grade 1 if oval <= 10% and lumps <= 2 and dents <= 2 and dome >= 20% and yellow <= 0%.
The rules are arranged in decreasing quality, so that the first rule activated sets the grade. Now the head is classified by grade, size and adequate stem. A second table sets the destination for each combination of these measurements - a variety of combinations can, if desired, be sent to the same processing destination.
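The first-match evaluation might be sketched as follows; the first rule reproduces the example above, while the second row and all names are hypothetical.

    # Sketch of the rule table. Rules are held in decreasing order of
    # quality; the first rule whose conditions are all met sets the grade.
    RULES = [
        # (grade, max oval %, max lumps, max dents, min dome %, max yellow %)
        (1, 10, 2, 2, 20, 0),
        (2, 20, 4, 4, 10, 5),   # hypothetical lower-grade rule
    ]

    def grade(oval, lumps, dents, dome, yellow):
        for g, max_oval, max_lumps, max_dents, min_dome, max_yellow in RULES:
            if (oval <= max_oval and lumps <= max_lumps
                    and dents <= max_dents and dome >= min_dome
                    and yellow <= max_yellow):
                return g
        return None    # no rule fired: lowest-value destination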
The operator's interface is seen as being most important. All functions are mouse-driven on a screen which shows a 'computer's eye view' of the head. In normal operation, lines are shown as a box around the measured head. A keypress allows the system to operate in diagnostic mode, where lines and spots show the results of pixel interrogation and where the active rule appears on the screen.
Most important is the ability to 'tweak' the thresholds to meet the market demands. If shape requirements must be relaxed to meet a delivery quota, the separation of the two rings can be increased. Size thresholds can also be adjusted, both to adapt to changing criteria and to compensate for any change in camera zoom.
A similar pragmatic approach has given promising results for the analysis of seedling shape with a view to micropropagation.
A project which calls for a different approach is the detection of blemishes on nut kernels. A stream of kernels moves past the viewing head at speeds of one or two metres per second. In this case the image is captured by a line-scan camera and analysis proceeds at millisecond intervals as the kernel is in flight.
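Only the outline of the kernel analysis is given here, but its line-by-line structure might be sketched as follows; the dark-patch notion of a blemish and both thresholds are assumptions, not details of the project.

    # Sketch of line-scan blemish detection: each scan line is examined as
    # it arrives, within the millisecond budget, and suspect pixels counted.
    def blemish_count(scan_line, background_level, healthy_level):
        """scan_line: one line of 8-bit pixels from the line-scan camera.
        Counts pixels brighter than the background but darker than healthy
        kernel surface, taken here to indicate a blemish."""
        return sum(1 for p in scan_line if background_level < p < healthy_level)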
Mechatronic supporting technology
The vision work is not performed in isolation. The aim of each project is a complete, working solution close to commercial exploitation. For the tractor guidance project it was necessary to add a sensor measuring steering angle to support the hydraulic control loop. The unsuitability of the available transducers led us to develop our own sensor, very simple and robust, immune to high dust levels and easy to interface to the microcomputer control loop. These sensors have now found a variety of new applications, ranging from tactile guidance in cotton-picking to level sensing in anhydrous ammonia.
Some other projects are based on pneumatic actuation, and we have developed the 'floating-plate' valve which allows proportional flow, resulting in excellent position control. A lightweight agile manipulator arm is being developed which will use technology compatible with in-field picking operations. It will exploit programmable compliance to switch between the modes involved in feeling for stems and plucking the produce.
Conclusions
Teaching and research in machine vision have a tendency to emphasise such aspects as image enhancement, filtering, edge-detection, object recognition and template matching. All of these are computationally intensive, yet contribute very little to the extraction of salient data in cases such as these. Indeed it could be said that their use would be an impediment.
By taking a pragmatic approach, concentrating on accessing only sparse image data, it has been possible to achieve a performance which is suitable for each task. Although the vision guidance results presented here were captured at a speed of only one metre per second, the system performs well at twenty-five kilometres per hour - faster than most agricultural operations.
References
1. Billingsley J, Schoenfisch M, A Vision-guided Agricultural Tractor, Proc Australian Robot Association Conference 'Robots for Competitive Industries', Brisbane, July 14-16 1993.
2. Billingsley J, Schoenfisch M, Vision Systems in Agriculture, Proc Mechatronics and Machine Vision in Practice, Toowoomba, Australia, September 13-15 1994, pp 16-21.
3. Reid JF, Searcy SW, An Algorithm for Separating Guidance Information from Row Crop Images, Trans ASAE, 1988, pp 1624-1632.
4. Reid JF, Searcy SW, Automatic Tractor Guidance with Computer Vision, Proc Int Off-Highway & Powerplant Congress, Milwaukee, September 14-17 1987.
5. Gerrish JB, Stockman GC, Image Processing for Path Finding in Agricultural Field Operation, ASAE Summer Meeting, June 1985, paper ASAE 85-3037.