Design and Development of Robot Arm System for Classification and Sorting Using Machine Vision


robot arms for classifying and sorting objects based on shape and size using machine vision. The system uses a low-cost, high-performance hierarchical control system consisting of one master and two slaves. Each slave is a microcontroller-based robot controller that receives commands from the master and controls its robot arm independently. The master is an embedded computer used for image processing, kinematic calculations, and communication. A simple and efficient image processing algorithm is proposed that can run in real time, helping to shorten the sorting time. The proposed method uses a series of algorithms, including contour finding, border extraction, centroid computation, and shape thresholding, to recognize objects and eliminate noise. The 3D coordinates of objects are estimated simply by solving a system of linear equations. The movements of the robot's joints are planned to follow a trapezoidal velocity profile with acceleration and deceleration phases, which helps the robots move smoothly and reduces vibration. Experimental evaluation demonstrates the effectiveness and accuracy of the robotic vision system in the sorting process. The system can be used in industrial processes to reduce the time required to complete production-line tasks, thereby improving the performance of the production line.

INTRODUCTION
Machine vision has become a critical component of many robot systems. The integration of vision technology brings a variety of benefits, such as improving accuracy, flexibility, and productivity, reducing labor costs and product damage, and expanding the application domain of robotics. A machine vision system consists of several essential components: vision sensors to interact with the environment, a processing mechanism, and communication. The sensor in a machine vision system is one or more cameras that capture images of objects and relay them to the processor for analysis. Vision processing employs algorithms to extract the required information, run the required inspection, and make a decision. Finally, the results are communicated to another device that logs or uses the information.
A vision system in a robotic system acquires and analyzes images to extract the necessary information, such as the color and geometry of objects, segmentation of objects of interest, depth information, and 3D coordinates of objects [1]. This information is used by the robot controller to control the movement of the robot and other components such as sensors and motors. Robotic vision systems are used in a variety of commercial and industrial applications, such as material inspection, object recognition, pattern recognition, assembly and disassembly, robot localization, vision-guided robots, mapping, navigation, tracking, path planning, exploration, surveillance, and search [2,26,27].
This paper develops a robotic vision system to automatically classify and sort objects based on their shape and size. The system consists of two robot arms for grasping objects, a conveyor belt for transporting objects, and a camera for capturing images of objects. Each robot is controlled by a microcontroller. Image processing, kinematics solving, and communication are implemented on an embedded computer. A series of image processing algorithms, including contour finding, border extraction, centroid computation, and shape thresholding, are developed to detect and classify objects and find their positions.

RELATED WORKS
In [3], Zhang et al. develop a machine vision system to automatically sort cherry tomatoes according to maturity. Three images from different angles are obtained for each cherry tomato, and nine features are extracted from each image. Tomatoes are classified into three categories (unripe, half-ripe, and ripe) by using principal component analysis (PCA) and linear discriminant analysis (LDA) to analyze the features. Omid et al. [4] construct an experimental sorting system equipped with machine vision to sort tomatoes according to four quality criteria: maturity (color), defects, shape (oblong and circular), and size (small and large).
The software developed in this study evaluates tomato shape, size, maturity, and defects by eccentricity, 2-D image area, mean color, and fullness parameter, respectively. An automatic apple sorting and quality inspection system is designed by Sofu et al. [5]. Their system consists of two identical industrial color cameras mounted over the roller conveyor to capture four images of each apple rolling on the conveyor. The images are analyzed to sort apples into different classes by color and size and to detect defective regions of the apples. The system also uses a load cell to measure the weight of the apples. The proposed machine can sort an average of 432.000 apples per day with a sorting accuracy score of 79. Rafael et al. [6] develop a portable device based on a computer vision system for the automatic evaluation of green table olive quality in the field. The system consists of an illuminated cube that acquires images of fruit samples and generates an instantaneous report table. The external parameters in the report table are width, height, weight, color, Maturity Index, Bruising Area, and Bruising Index.
A machine vision system can be used in an automatic counting system to estimate the number of objects in a target area. Such systems have been widely used in many fields [7][8][9]. In [10], Zhang et al. propose a fish counting method based on image density grading and local regression. In that paper, top-view fish images are divided into several connected-area sub-images. Each sub-image is graded into a density level by an area threshold, and a backpropagation neural network (BPNN)-based regression model is constructed for each density-level dataset to count the number of fish. The experimental results show that the proposed method achieves a mean absolute error of 0.2985, a root mean square error of 0.6105, and a coefficient of determination of 0.9607. Tian et al. [11] propose a modified, end-to-end version of the Counting Convolutional Neural Network as a homogeneous, multi-branch architecture for pig counting. They combine Counting CNN and ResNeXt in their deep learning architecture and tune a series of experimental parameters. The dataset used to train the CNN model is obtained from multiple websites and also captured on a real farm. After training, image patches are mapped to the corresponding density map, and the total number of pigs in the entire image is obtained by integrating the density map. Liping [12] proposes a novel end-to-end architecture based on a Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) to generate crowd density maps and estimate the number of pedestrians in crowd images. A multi-column network is used to extract high-dimensional features of the crowd image, and then a series of fractionally-strided convolutional layers is used to restore the image detail lost in the max-pooling layers, improving the quality of the density map.
A robotic system equipped with a computer vision system can operate in an unstructured environment. The vision system recognizes the objects placed in the workspace and identifies their exact positions to guide the robot system. In [13], a robotic vision system that can operate in an unstructured environment is developed to recognize objects based on a high-performance Neural MUlticlassifier System (NEMUS). Various feature extraction methods (FEM) are applied to extract the feature sets used as inputs for several classifiers. The outputs of all the classifiers are combined in a decision-making network (DM-Net) to perform the final classification task. NEMUS is applied to a shape recognition task under various levels of shape distortion and is also suitable for generic classification applications, such as shape discrimination, signal detection, and texture recognition.
A robotic vision system is presented in [14] to distinguish and sort objects according to color and shape in real time. A series of image processing techniques, such as HSV thresholding, shape properties, the centroid algorithm, and border extraction, is implemented to sort objects based on their color and shape. The positions of the objects are then found so that each object can be picked and placed on the correct branch conveyor belt. Sangeetha et al. [15] use a stereo vision system to estimate the coordinates of targets, and a three-DOF robotic arm is used to precisely reach the target.
The kinematics algorithms and image processing are implemented in MATLAB 2012b and interfaced with an NI6259 DAQ PCI card to control the movement of the robot arm. The results show that the arm reaches the target with a best-achieved accuracy of 2 cm. Also, in [16], a stereo vision system is developed to measure and predict the ball trajectory in real time for a ping-pong robot. A multi-threshold segmentation algorithm is applied to detect the ball in the image.
The 3D position of the ball in world coordinates is computed from two image coordinates by using the triangulation algorithm. Then, the flight trajectory of the ball is predicted using an aerodynamics model and a rebound model. Aneesh et al. [17] design an efficient robot system that picks up objects of the right color and shape and puts them down at the right place. The robot arm is controlled by a microcontroller. This system uses MATLAB for image processing to recognize the shape and uses a color sensor to recognize the color.
Tracking objects in real time is becoming more and more important for industrial tasks such as grasping, sorting, and assembly, especially in complex environments. Tracking is used in different forms, such as face tracking, color tracking, and shape tracking. The HSV spectrum is used to recognize objects based on shape and color in order to track a predefined object in real time [18]. A stereo vision system extracts 3-D coordinates, and a multiagent robot system is used for tracking, tooling, or handling operations [19].
An industrial tracking system is designed to track and sort products based on shape and to reject low-quality products [20]. A robot manipulator can also track a trajectory using vision feedback [21]. The desired image trajectory is defined by a series of images recorded while the engineer grasps the object, and a model-free feedback-assisted iterative learning control strategy is used for repetitive tracking [22].

Describe the operation of the system
This paper presents the design and implementation of a robotic vision system, including a conveyor, that sorts objects in real time. Images of the objects in the workspace are captured by a camera connected to a master controller. The master controller acquires and processes the images to classify the objects by size and shape. The master also determines the coordinates of the objects in the workspace and converts them into joint coordinates by solving the inverse kinematics equations. The system consists of two robot arms. The first robot receives joint angle values from the master, picks an object from the workspace, and places it on the conveyor. The second robot controls the conveyor to transport the object to the sensor position, then grasps the object and places it in the correct position. The robots move point-to-point, and the motions are planned by the robot controller. Figure 1 shows the basic components of the system and Figure 2 describes the steps of operation.

For every motion of the robotic arm, links 0, 5, and 8 remain parallel to each other, which keeps the end-effector (link 8) always parallel to the horizontal. The robot has three degrees of freedom corresponding to the rotation angles $\theta_1$, $\theta_2$, and $\theta_3$. Three stepper motors, one at each joint, control the movements of links 0, 1, and 2; these motions are planned to create the desired motion of the end-effector. Before planning the trajectory, it is necessary to determine the joint angles by solving the kinematics problem. From basic trigonometry, the position of the end-effector can be written in terms of the joint angles as follows: [equations (1)-(3) not recovered]

Equations (1), (2), and (3) are the forward kinematic equations of the robot manipulator; they describe the relationship between the end-effector coordinates and the joint angles. To find the joint angles for a given set of end-effector coordinates, we need to solve the inverse kinematic equations.
From equations (1) and (2), we easily obtain: [equation not recovered] Here we use the atan2 function to get the unique joint angle $\theta_1$. Squaring both sides of equations (1) and (2) and adding them together gives: [equation not recovered] Combining with equation (3) and grouping the unknowns on the left-hand side: [equations not recovered] Now, we can obtain the angle $\theta = \theta_3 - \theta_2$: [equation not recovered] Rearranging equation (7) in terms of the unknown angle $\theta_2$, we get: [equation not recovered] Define $r$ and $\varphi$ so that: [equation not recovered] The angle $\varphi$ can be determined by using the atan2 function: [equation not recovered] Substituting $r$ and $\varphi$ into (10), we get: [equation not recovered] Finally, the solution for the angle $\theta_2$ is: [equation not recovered] There are four solutions for a given end-effector position. That means there are four configurations from which the robot must choose to reach the desired position. The value of $\theta_1$ is unique, and the practical joint limits of joints 2 and 3 are used to select a unique configuration: [joint limits not recovered]

The control circuit of the system consists of one master circuit and two slave circuits, as shown in Figure 4. The master controller is a Raspberry Pi 4 embedded computer. This computer performs the tasks that have a high computational cost and process large amounts of data, such as image processing, 3D localization, solving nonlinear kinematic equations, and communication. The two slave circuits receive data from the master and control the movements of the robot and other components. Each slave is an Arduino board used as a simple robot controller. The Arduino board generates pulses and sends them to three A4988 drivers to drive the stepper motors. In addition, the Arduino outputs digital signals to control the solenoid valve, pump, and motor. Since the valve and pump operate at 12 V, relays whose coils are energized by the 5 V signal from the Arduino are used to switch them on and off. The pump, solenoid valve, and vacuum suction cup form a vacuum system used to grip and move objects. A proximity sensor detects the object when it arrives at the pick-up position. The Arduino reads the signal from the proximity sensor and controls the conveyor motor by outputting a signal to the L298 driver.

COMPUTER VISION SYSTEM
This section presents a method for shape and size classification and localization of objects with a simple algorithm and low computational time. The 3D coordinates of objects are estimated simply by solving linear equations.

Image processing
Figure 5 shows the block diagram of the proposed method. The RGB image of the objects taken by the camera is converted to a grayscale image and filtered with a median filter to remove noise.

Figure 5. Block diagram of the image processing
The median filter is used to reduce "salt and pepper" noise and smooth the edges. The idea of a median filter is to replace each pixel with the median value of the pixels in its M×M neighborhood. This paper uses a 9×9 neighborhood. Then, the gray image is thresholded by Otsu's binarization to extract the objects from their background. Otsu's thresholding is a global thresholding algorithm that selects a threshold automatically from the gray-level histogram. The histogram is separated into two clusters, and the optimal threshold T is selected by a discriminant criterion. There are two options for finding the threshold. The first is to minimize the within-class variance

$$\sigma_w^2(t) = w_1(t)\,\sigma_1^2(t) + w_2(t)\,\sigma_2^2(t) \tag{15}$$

and the second is to maximize the between-class variance

$$\sigma_b^2(t) = w_1(t)\,w_2(t)\,\left[\mu_1(t) - \mu_2(t)\right]^2 \tag{16}$$

where $w_1(t)$, $w_2(t)$ are the probabilities of the two classes separated by a threshold $t$, $\sigma_1^2$, $\sigma_2^2$ are the variances, and $\mu_1$, $\mu_2$ are the means of each class. In Figure 6, by using Otsu's thresholding, the binary image clearly shows the difference between the objects and the background. The background is marked with zero while the objects are marked with one. Morphological operators are then applied to fill in small holes and eliminate small objects. The two basic morphological operators, dilation and erosion, are combined into operations that do not change the object size or shape. First, a dilation followed by an erosion (closing) is performed to fill holes in the objects while keeping the object sizes. Then, an erosion followed by a dilation (opening) is applied to separate objects connected by a thin bridge of pixels and to delete small noise objects.
The small red circles in Figure 6 depict the "holes" and "thin bridge" in the input image. By applying the morphological operators, these noise regions are eliminated.
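The threshold selection described above can be sketched in a few lines. This is a minimal pure-Python illustration of Otsu's criterion (maximizing the between-class variance over a gray-level histogram), not the authors' implementation; the function names `otsu_threshold` and `binarize`, and the assumption that objects are brighter than the background, are choices made here for illustration only.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes the between-class variance.

    histogram[i] is the number of pixels with gray level i.
    Pixels with level <= t form class 1, the remaining pixels form class 2.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    count1 = 0    # number of pixels in class 1 so far
    sum1 = 0.0    # sum of gray levels in class 1 so far
    for t, count in enumerate(histogram):
        count1 += count
        sum1 += t * count
        count2 = total - count1
        if count1 == 0 or count2 == 0:
            continue  # one class is empty, so the criterion is undefined
        w1 = count1 / total
        w2 = count2 / total
        mu1 = sum1 / count1
        mu2 = (sum_all - sum1) / count2
        var_between = w1 * w2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, t):
    """Mark pixels above the threshold as 1 (object) and the rest as 0.

    Assumption: objects are brighter than the background.
    """
    return [[1 if px > t else 0 for px in row] for row in gray]
```

Scanning all gray levels once keeps the cost linear in the number of histogram bins, which is consistent with the low computational time the method aims for.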

Figure 7. Morphological operators
Lastly, we find the contours of the objects in the binary image and extract different contour features, such as area, perimeter, and centroid. These features are used to classify and localize the objects.
The shape of an object is recognized by computing the compactness: [equation (17) not recovered] where $c$ is the compactness, $p$ is the perimeter, and $A$ is the area. The perimeter is calculated by summing all pixels on the contour of the object. The area is equal to the zeroth-order image moment, defined by:

$$A = m_{00} = \sum_{u}\sum_{v} I(u,v) \tag{18}$$

where $u$ and $v$ are the row and column indices, and $I(u,v) = 1$ on the object in the case of a binary image.
The centroid of the object in the image is given by the relations:

$$u_c = \frac{m_{10}}{m_{00}}, \qquad v_c = \frac{m_{01}}{m_{00}} \tag{19}$$

where $m_{10} = \sum_{u}\sum_{v} u\,I(u,v)$ and $m_{01} = \sum_{u}\sum_{v} v\,I(u,v)$ are the first-order moments. The compactness $c$ and the area $A$ are used to classify objects. In this paper, objects are divided into four categories: small circle, large circle, small square, and large square. The thresholds of $c$ and $A$ for classification are determined by experiment.
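The feature extraction and classification steps can be illustrated with a short sketch. It assumes a binary image that contains a single object, counts contour pixels for the perimeter as described above, and uses the common compactness definition c = 4πA/p², which is an assumption because the paper's own formula is not available here. The threshold arguments `c_threshold` and `area_threshold` are hypothetical placeholders for the experimentally determined values.

```python
import math


def region_features(binary):
    """Return (area, (u_c, v_c), perimeter) of the foreground pixels.

    binary is a list of rows holding 0/1 values; one object is assumed.
    Area is the zeroth-order moment, the centroid uses first-order moments,
    and the perimeter is the number of object pixels lying on the contour.
    """
    rows, cols = len(binary), len(binary[0])
    m00 = m10 = m01 = 0
    perimeter = 0
    for u in range(rows):
        for v in range(cols):
            if binary[u][v] != 1:
                continue
            m00 += 1
            m10 += u
            m01 += v
            # A contour pixel has a 4-neighbour that is background or outside.
            for du, dv in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nu, nv = u + du, v + dv
                outside = not (0 <= nu < rows and 0 <= nv < cols)
                if outside or binary[nu][nv] == 0:
                    perimeter += 1
                    break
    if m00 == 0:
        raise ValueError("image contains no object pixels")
    return m00, (m10 / m00, m01 / m00), perimeter


def compactness(area, perimeter):
    """Assumed definition: c = 4*pi*A / p**2 (larger for rounder shapes)."""
    return 4.0 * math.pi * area / perimeter ** 2


def classify(c, area, c_threshold, area_threshold):
    """Label an object; both thresholds are hypothetical tuning values."""
    shape = "circle" if c >= c_threshold else "square"
    size = "large" if area >= area_threshold else "small"
    return f"{size} {shape}"
```

With a pixel-count perimeter the compactness of a digital circle is not exactly 1, which is one reason the decision thresholds have to be found experimentally rather than taken from the ideal geometric values.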

3D localization
After determining the centroid of the objects in the image according to equation (19), we can calculate the 3D coordinates of the object under the constraint that the height of the object is known in advance.
Define three coordinate frames $F_w$, $F_r$, and $F_c$ corresponding to the world coordinate frame, the robot coordinate frame, and the camera coordinate frame. The world coordinate frame is fixed at a known location. The robot coordinate frame is attached to the base of the robot and is also known. The camera coordinate frame is attached to the camera. The pose of the camera frame $F_c$ relative to the world coordinate frame $F_w$ can be represented as a homogeneous transformation $T = [R \;\; t]$. This transformation constitutes the extrinsic parameters, which are used to transform world points to camera coordinates. The camera coordinates are mapped into the image plane using the intrinsic parameters:

$$\alpha\, p = K\,[R \;\; t]\,P \tag{20}$$

where $P = [X \; Y \; Z \; 1]^T$ and $p = [u \; v \; 1]^T$ are the homogeneous coordinates of a point in the world frame and in the image plane, respectively, $\alpha$ is a scale factor, $K$ is the intrinsic matrix, $R$ is the rotation matrix, and $t$ is the translation vector: [equation (21) not recovered]

Camera calibration estimates the intrinsic and extrinsic parameters. Substituting (21) into (20): [equation (22) not recovered]

In this paper, the height of the object is fixed and known in advance, and the centroid of the object in the image is calculated from equation (19), so from equation (22) we can determine the coordinates X and Y of the object. Expanding equation (22): [equations (23) not recovered] Equations (23) are simplified: [equation (24) not recovered] Rewriting in terms of the unknown X and Y coordinates: [equation (25) not recovered] Finally, the X and Y coordinates can be easily obtained: [equation (26) not recovered]

The 3D coordinates of the object's centroid in the world frame are transformed to the robot frame using equation (27):

$$[x \; y \; z]^T = R\,[X \; Y \; Z]^T + t \tag{27}$$

where $[x \; y \; z]^T$ and $[X \; Y \; Z]^T$ are the coordinates of the object in the robot frame and the world frame, respectively, and $R$ and $t$ are here the rotation matrix and translation vector representing the relationship between these two frames. These coordinates are converted to joint angles using the inverse kinematics in section 3.2 as follows: solve for the angle $\theta_1$ from equation (4); calculate the coefficients a and b according to equations (6) and (7); then use these coefficients to calculate the angles $\theta_2$ and $\theta_3$ from equations (9) and (14).
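The localization step can be sketched as follows, assuming the pinhole model of equation (20) with a 3×4 projection matrix M = K[R t]. Eliminating the scale factor from the two image equations leaves a 2×2 linear system in X and Y once Z is known. This is one standard way of writing that system and may differ in form from the paper's equations (23) to (26); all function names are illustrative.

```python
def projection_matrix(K, R, t):
    """Return the 3x4 matrix M = K [R | t] as nested lists."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]


def project(M, X, Y, Z):
    """Project a world point to pixel coordinates (u, v)."""
    P = [X, Y, Z, 1.0]
    s = [sum(M[i][j] * P[j] for j in range(4)) for i in range(3)]
    return s[0] / s[2], s[1] / s[2]


def locate_xy(M, u, v, Z):
    """Solve for the world X, Y of pixel (u, v) when the height Z is known.

    Eliminating the scale factor gives two linear equations:
      (m11 - u*m31) X + (m12 - u*m32) Y = u*(m33 Z + m34) - (m13 Z + m14)
      (m21 - v*m31) X + (m22 - v*m32) Y = v*(m33 Z + m34) - (m23 Z + m24)
    which are solved here with Cramer's rule.
    """
    a11 = M[0][0] - u * M[2][0]
    a12 = M[0][1] - u * M[2][1]
    b1 = u * (M[2][2] * Z + M[2][3]) - (M[0][2] * Z + M[0][3])
    a21 = M[1][0] - v * M[2][0]
    a22 = M[1][1] - v * M[2][1]
    b2 = v * (M[2][2] * Z + M[2][3]) - (M[1][2] * Z + M[1][3])
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate camera geometry for this pixel")
    X = (b1 * a22 - a12 * b2) / det
    Y = (a11 * b2 - a21 * b1) / det
    return X, Y


def world_to_robot(R_wr, t_wr, P):
    """Apply x = R P + t to move a world point into the robot frame."""
    return [sum(R_wr[i][j] * P[j] for j in range(3)) + t_wr[i]
            for i in range(3)]
```

Because only a 2×2 system is solved per object, the cost of localization is negligible compared with the image processing, which fits the real-time goal stated in the text.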

TRAJECTORY PLANNING AND STEPPER MOTOR CONTROL
Trajectory planning creates reference signals for the robot controller so that the robot can follow the desired trajectory. Trajectory planning can be done either in the joint space or in the Cartesian space [23].
Joint space trajectories have many advantages, such as less computation, easier real-time planning, and no problems with singularities. For pick-and-place applications in industry, joint space trajectories are usually used.
The planning algorithm generates a function $q(t)$ interpolating the given vectors of joint variables at each joint. In industrial practice, a trapezoidal velocity profile is usually assigned (see Figure 8a). The velocity graph consists of three phases: constant acceleration, constant velocity, and constant deceleration. Assume that the angle $q_f$ from the initial position to the final position, the maximum speed $\omega_{max}$, and the constant acceleration/deceleration $\dot{\omega}$ are given in advance. We need to determine the acceleration/deceleration time $t_c$, the time of the constant velocity phase $t_v$, the total time $T$, and the function $q(t)$.
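The velocity at the end of the acceleration phase is equal to the constant velocity, so:

$$t_c = \frac{\omega_{max}}{\dot{\omega}} \tag{28}$$

And the angle after the acceleration is:

$$q_c = \frac{1}{2}\,\dot{\omega}\,t_c^2 \tag{29}$$

The area of the trapezoid is equal to the total angle $q_f$, so:

$$q_f = \omega_{max}\,(t_c + t_v) \tag{30}$$

Therefore, the time of the constant velocity phase is:

$$t_v = \frac{q_f}{\omega_{max}} - t_c \tag{31}$$

If $q_f > \omega_{max}^2/\dot{\omega}$, the velocity profile is a trapezoid, and the total time is:

$$T = 2t_c + t_v \tag{32}$$

The trajectory is formed by a linear segment connected by two parabolic segments:

$$q(t) = \begin{cases} \frac{1}{2}\dot{\omega}t^2, & 0 \le t \le t_c \\ q_c + \omega_{max}(t - t_c), & t_c < t \le t_c + t_v \\ q_f - \frac{1}{2}\dot{\omega}(T - t)^2, & t_c + t_v < t \le T \end{cases} \tag{33}$$

If $q_f \le \omega_{max}^2/\dot{\omega}$, the velocity profile is a triangle that only consists of acceleration and deceleration (see Figure 8b). The trajectory is formed by two parabolic segments, and we have:

$$q_f = \dot{\omega}\,t_c^2 \tag{34}$$

Therefore, the acceleration/deceleration time and the total time are:

$$t_c = \sqrt{\frac{q_f}{\dot{\omega}}}, \qquad T = 2t_c \tag{35}$$

The maximum velocity in this case is:

$$\omega = \dot{\omega}\,t_c = \sqrt{q_f\,\dot{\omega}} \tag{36}$$

The angle as a function of time $t$ is:

$$q(t) = \begin{cases} \frac{1}{2}\dot{\omega}t^2, & 0 \le t \le t_c \\ q_f - \frac{1}{2}\dot{\omega}(T - t)^2, & t_c < t \le T \end{cases} \tag{37}$$

A stepper motor is controlled by sending pulses to the motor driver. One pulse makes the motor rotate by one constant step angle $\alpha$. The speed is changed by changing the time interval between successive steps. Generating pulses is difficult when the velocity varies, because the time interval between two adjacent pulses changes. Consider the constant acceleration phase; the joint angle at the nth step pulse is:

$$q_n = n\alpha = \frac{1}{2}\,\dot{\omega}\,t_n^2 \tag{38}$$

where $n \ge 0$ is the step number and $t_n$ is the time of the nth step pulse. The time interval between two adjacent pulses is:

$$\delta t_n = t_{n+1} - t_n = \sqrt{\frac{2\alpha}{\dot{\omega}}}\left(\sqrt{n+1} - \sqrt{n}\right) \tag{39}$$

It can be seen that the time interval $\delta t_n$ between two adjacent pulses is nonlinear and expensive to calculate in real time (calculating two square roots is time-consuming) for a mid-range microcontroller. Therefore, we use an approximation with less computational complexity (implemented by D. Austin [24]):

$$\delta t_n = \delta t_{n-1} - \frac{2\,\delta t_{n-1}}{4n + 1} \tag{40}$$

Motor step signals are generated by a 16-bit timer/counter module in the Arduino running at the frequency $f$. The delay $\delta t$ programmed by the counter value $c$ is:

$$\delta t = \frac{c}{f} \tag{41}$$

Substituting (41) into this approximation gives:

$$c_n = c_{n-1} - \frac{2\,c_{n-1}}{4n + 1}, \qquad c_0 = f\sqrt{\frac{2\alpha}{\dot{\omega}}} \tag{42}$$

This approximation introduces an error of 0.44 at n = 1. There are two ways to compensate for this error: multiplying $c_0$ by 0.676 or using $c_1 = 0.4056\,c_0$ [23].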
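The timing rules for the trapezoidal and triangular profiles can be collected in a short sketch. It assumes the joint starts and ends at rest and that the test separating the two cases is q_f > ω_max²/ω̇ (equivalent to a positive constant-velocity time). The names `plan_profile` and `joint_angle` are illustrative, and `acc` stands for the constant acceleration.

```python
import math


def plan_profile(q_f, w_max, acc):
    """Return (t_c, t_v, T, w_peak) for a move of angle q_f.

    t_c: acceleration/deceleration time, t_v: constant-velocity time,
    T: total time, w_peak: velocity actually reached.
    """
    if q_f <= 0 or w_max <= 0 or acc <= 0:
        raise ValueError("q_f, w_max and acc must be positive")
    if q_f > w_max ** 2 / acc:
        # Trapezoidal profile: the maximum speed is reached and held.
        t_c = w_max / acc
        t_v = q_f / w_max - t_c
        w_peak = w_max
    else:
        # Triangular profile: the move is too short to reach w_max.
        t_c = math.sqrt(q_f / acc)
        t_v = 0.0
        w_peak = acc * t_c
    return t_c, t_v, 2.0 * t_c + t_v, w_peak


def joint_angle(t, q_f, w_max, acc):
    """Joint angle q(t) along the planned profile, clamped to [0, q_f]."""
    t_c, t_v, T, w_peak = plan_profile(q_f, w_max, acc)
    if t <= 0:
        return 0.0
    if t >= T:
        return q_f
    if t < t_c:                      # parabolic blend while accelerating
        return 0.5 * acc * t * t
    if t < t_c + t_v:                # linear segment at constant velocity
        return 0.5 * acc * t_c ** 2 + w_peak * (t - t_c)
    return q_f - 0.5 * acc * (T - t) ** 2   # parabolic blend while braking
```

When several joints share the same acceleration and maximum speed, as in the experiments reported later, only `t_v` differs between joints, since it is the only quantity that depends on the travel angle in the trapezoidal case.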
We can calculate the number of steps in the acceleration phase by dividing the angle by the step angle:

$$N_c = \frac{q_c}{\alpha} = \frac{\dot{\omega}\,t_c^2}{2\alpha} \tag{43}$$

where $t_c$ is determined by equation (28) or (35). The acceleration stops when the number of steps $n$ is equal to $N_c$. After that, the constant velocity phase starts. The timer delay in this phase is constant, so the value of the counter is:

$$c = \frac{f\,\alpha}{\omega_{max}} \tag{44}$$

The stepper motor is kept at constant speed until the number of pulses reaches the value:

$$N = \frac{q_f - q_c}{\alpha} \tag{45}$$

Finally, deceleration starts. Equation (46) can be used to ramp the speed down to zero in the final step of a move of $N_c$ steps (D. Austin [24]): [equation (46) not recovered]
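The step-timing scheme can be sketched as follows. The recurrence used here, c_n = c_{n-1} - 2c_{n-1}/(4n + 1), is the approximation commonly attributed to D. Austin's real-time stepper ramp method; treating it as the form used in this paper is an assumption, as are all function and parameter names. The exact square-root form is included only for comparison.

```python
import math


def exact_counters(step_angle, acc, timer_freq, n_steps):
    """Exact timer counts c_n = c_0 * (sqrt(n + 1) - sqrt(n))."""
    c0 = timer_freq * math.sqrt(2.0 * step_angle / acc)
    return [c0 * (math.sqrt(n + 1) - math.sqrt(n)) for n in range(n_steps)]


def ramp_counters(step_angle, acc, timer_freq, n_steps, correction=0.676):
    """Approximate timer counts for the acceleration phase.

    Starts from c_0 scaled by the correction factor and applies
    c_n = c_{n-1} - 2*c_{n-1} / (4n + 1), which needs no square roots
    after the first step.
    """
    c = correction * timer_freq * math.sqrt(2.0 * step_angle / acc)
    counters = [c]
    for n in range(1, n_steps):
        c = c - 2.0 * c / (4 * n + 1)
        counters.append(c)
    return counters


def accel_steps(acc, t_c, step_angle):
    """Number of whole steps covered while accelerating for time t_c."""
    return int(0.5 * acc * t_c ** 2 / step_angle)


def cruise_counter(step_angle, w_max, timer_freq):
    """Constant timer count during the constant-velocity phase."""
    return timer_freq * step_angle / w_max
```

Without the correction factor the recurrence gives c_1 = 0.6 c_0, while the exact ratio is √2 − 1 ≈ 0.4142, so c_1 is about 1.45 times too large, which corresponds to the 0.44 error quoted above. Scaling c_0 by 0.676 gives c_1 = 0.4056 times the uncorrected c_0 and brings the later steps close to the exact values. Running the same recurrence in reverse is the usual way to decelerate, but that part is not shown here.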

RESULTS AND DISCUSSIONS
The mechanical models of the system are designed using the Autodesk Inventor software (see Figure 9). The separate parts of the models are saved in the stereolithography (STL) file format and then fabricated by a 3D printer. Finally, the 3D-printed components are assembled with other parts (e.g., motors, bearings, shafts, and an aluminum frame) to produce the complete system shown in Figure 10. The kinematic dimensions of the robotic arm are as follows: $l_1$ = 153.3 mm, $l_2$ = 135 mm, $l_3$ = 160 mm, $l_4$ = 45 mm, d = 25 mm. The joints of the robot arm are driven by stepper motors with 5.17:1 planetary gearboxes. Each stepper motor can provide a maximum holding torque of 0.25 Nm, resulting in a maximum robot payload of 500 g. In addition, the motors use a maximum of only 10 W of power each, so the three motors combined use a maximum of only 30 W, resulting in significant energy savings. Table 1 shows the specifications of the robotic arm.
The vision system uses a Raspberry Pi Camera Module with a Sony IMX219 8-megapixel sensor. This camera has a focal length of 3.04 mm, a diagonal angle of view of 62.2 degrees, and a resolution of up to 3280 × 2464 pixels. In this project, we only use images with a resolution of 640 × 480 pixels to achieve faster processing.
Objects are classified into four types: small circle (sc), large circle (lc), small square (ss), and large square (ls). By experiment, the thresholds of c and A used to classify objects are as follows: [threshold values not recovered]

The proposed method is tested on a database consisting of 100 images. This database is divided into five groups: images that contain one object, two objects, three objects, four objects, and multiple objects with noise objects. Figure 12 illustrates the categories in the dataset and Table 2 shows the corresponding results. Figure 13 shows an example of successful detection and classification using the proposed method. In the resulting image, the objects are labeled according to their shape and size. Figure 14 demonstrates that the proposed method can remove noise objects. The noise objects have the desired shape, but their size is different from that of the objects of interest. Figure 15 shows a case of incorrect detection. The faulty recognition may be caused by lighting conditions, which prevent the image from being thresholded properly, or by objects that are so close to each other that they cannot be separated by the morphological operators.
After detecting the object, the 2D centroid of the object is calculated and the 3D coordinates are estimated using the equations in section 4.2. For the above image dataset, the 3D coordinates of the objects are calculated and compared to the coordinates measured directly with a ruler, and the errors are computed. The results show that the vision-based estimation method has an average error of 3.47 mm. Table 3 shows the first 20 results. As soon as the analysis of the image is completed, the coordinates are sent to the first robot arm, which picks the object and places it on the conveyor, and at the same time the type of the object is sent to the second robot. After the first robot arm completes the pick-and-place operation, the next image is requested to continue the process, and the robot returns to the initial position and waits for a new command from the master. The conveyor transports the object to the sensor position. Then the conveyor stops, and the second robot picks the object and places it in the correct position according to the command received from the master.
While the second robot transports and sorts the object, processing continues on the master, so the overall processing time is minimal. The system needs a total time of 2.2 to 2.4 s to sort one object. This time is equal to the time it takes the first robot to pick the object and place it on the conveyor, because transporting and sorting the object by the second robot takes only about 2.1 s.
Figure 16 shows the rotation angles of the robot's joints when applying the trapezoidal velocity profile. The three joints have the same acceleration and maximum speed, so the acceleration and deceleration times are the same, but the time of the constant velocity phase differs because the rotation angle at each joint is not the same. Using the trapezoidal profile with acceleration and deceleration phases helps the robot move smoothly and reduces vibration.

CONCLUSIONS
In this paper, a robotic vision system is designed to sort objects according to their size and shape. The proposed system can be used in industrial processes to reduce the time required to complete production-line tasks, thereby improving the performance of the production line.
The two robotic arms are driven by stepper motors and controlled by Arduino boards. An approximation with less computational complexity is used to generate the trapezoidal velocity profile so that it can be implemented on the Arduino controller. This helps the robot move smoothly and reduces vibration. The Raspberry Pi computer is used as a master controller to control and communicate with the two robots. The master is used for image processing, kinematic calculations, and communication. The combination of the two types of controller board results in high performance and a low development cost.
The vision system is a key component in the sorting process. The accuracy and performance of the vision system directly affect the performance and speed of the sorting process. A simple and efficient image processing algorithm is proposed that can run in real time, helping to shorten the sorting time. The proposed method uses a series of algorithms, including contour finding, border extraction, centroid computation, and shape thresholding, to recognize objects and eliminate noise. The proposed algorithm is tested on a database of 100 images divided into five groups. The accuracy of the algorithm can reach more than 85%. There are a few cases of failure due to lighting conditions or to objects being very close to each other. The 3D coordinates of objects are estimated simply by solving a system of linear equations. The experimental results show that the vision-based estimation method has an average error of 3.47 mm. Using a simple algorithm minimizes the cycle time. The system needs a total time of 2.2 to 2.4 s to sort one object. This is the time it takes for the robot to move.
In future work, the robot arm of the system will be upgraded. The stepper motors will be replaced by servo motors for higher movement speed. The number of degrees of freedom of the robot will also be increased so that it can perform more complex tasks.

Figure 1. Basic components of the system

Figure 3. The robot arm schematic
both sides in each equation and add them together. After rearranging the terms, we get an equation in $\theta_3 - \theta_2$:

Figure 9. Mechanical design on the Autodesk Inventor software

Figure 16. The rotation angles of the robot's joints when applying the trapezoidal velocity profile

Table 1. Specifications of the robotic arm.
employed to calibrate the parameters of the camera. The calibration uses the Camera Calibration Toolbox for Matlab® [25], based on a total of 16 images of a planar chessboard (see Figure 11). The intrinsic and extrinsic parameters after calibration are as follows: