B.OBULIRAJ
B.E CSE
Image Processing and Pattern Matching
Abstract
The growth of electronic media, process automation and, especially, the outstanding growth of attention to national and personal security in the past few years have all contributed to the growing need to automatically detect features and occurrences in pictures and video streams on a massive scale, in real time and without human intervention. To date, all technologies available for such automated processing have fallen short of supplying a solution that is both technically viable and cost-effective.
This white paper details the basic ideas behind a novel, patent-pending technology called Image Processing over IP networks (IPoIP™). As its name implies, IPoIP provides a solution for automatically extracting useful data from a large number of simultaneous image inputs (video or still) connected to an IP network but, unlike existing methods, does so at reduced cost without compromising reliability. The document also outlines the existing image-processing architectures and compares them to IPoIP, and concludes with a short chapter detailing several possible implementations of IPoIP in existing applications.
Introduction
In recent years, a tremendous amount of research effort has been put into extracting meaningful data from captured images, both video and still. As a result, a large number of proven algorithms exist for both real-time and offline applications, implemented on platforms ranging from pure software to pure hardware. These platforms, however, are generally designed to deal with a relatively small number of simultaneous image inputs (in most cases no more than one). They follow one of two main architectures: Local Processing and Server Processing.
Local Processing Architecture
This is by far the most common system architecture for image processing. The main idea behind it is that all the processing is done at the camera location by a dedicated processing unit, and the results are then transmitted over a network connection to the monitoring area. The processing unit is usually PC-based for the more complex solutions, but the recent trend is to move the processing to standalone boxes based on a DSP or even an ASIC. It performs the entire image-processing task and outputs a suitable message to the network when an event is detected. Also residing at the camera location is a video encoder used for remotely viewing the video over the IP network. It can be configured to transmit the video at varying qualities depending on the available bandwidth, using standard video compression techniques such as MJPEG, MPEG-4 and others. When cost is less of an issue, this architecture provides an adequate solution for running a single type of algorithm per camera. However, when the number of cameras increases and a more robust solution is needed (which is often the case), this architecture falls short for the following reasons:
• Each camera requires its own dedicated processing resources, so the system cost scales linearly with the number of cameras. No cost reduction is possible in a large-scale system.
• Each additional type of algorithm requires additional processing resources, and integration between various algorithms is costly.
• For cameras distributed outdoors, PC-based products are inadequate due to space limitations and their inability to withstand harsh environmental conditions.
• DSP-based solutions require a much higher development effort because of limited resources and inferior development tools.
Server Processing Architecture
The second type of system architecture, although far less common, is the “Server Processing” architecture. All of the image-processing tasks are placed on a single powerful server that serves many cameras. From a hardware point of view, this solution is more cost-effective and is suitable for large-scale deployments. It is made possible by the fact that only a small percentage of occurrences in each camera are “interesting”, so each camera requires only a small amount of actual processing power and one server can deal with many cameras. Where this architecture falls short is on the network side: it has extraordinary bandwidth requirements. Because all of the image-processing functions are performed at the server, it needs to receive very high quality images in order to provide accurate results, which demands significant network resources. When the application runs on a LAN with a relatively small number of cameras this may be possible, but for distributed applications with large numbers of cameras the solution becomes impractical because of the costly network infrastructure required. For this reason, this architecture is usually used in applications where the algorithm works on a single frame at a time rather than on a full video stream.
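Anticipating the bitrates quoted later in this paper (roughly 0.5-2 Mbps for good-quality compressed video, around 20 Kbps for a feature stream), a back-of-the-envelope comparison for a 1,000-camera deployment shows the scale of the problem. The numbers below are illustrative assumptions, not figures from any specific deployment:

    # Aggregate backbone bandwidth for 1,000 cameras under the two approaches,
    # using the per-camera bitrates quoted elsewhere in this paper.

    CAMERAS = 1000
    VIDEO_BITRATE_KBPS = 2000    # high-quality compressed video per camera
    FEATURE_BITRATE_KBPS = 20    # compact feature stream per camera

    server_arch_mbps = CAMERAS * VIDEO_BITRATE_KBPS / 1000
    feature_arch_mbps = CAMERAS * FEATURE_BITRATE_KBPS / 1000

    print(f"Constant full video to a central server: {server_arch_mbps:,.0f} Mbps")
    print(f"Feature streams only:                    {feature_arch_mbps:,.0f} Mbps")
    # 2,000 Mbps vs. 20 Mbps -- a 100x difference in required network capacity.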
Requirements for a Viable Solution (Both Technical and Cost)
Having understood the limitations of the existing
image processing architectures, let us now look at the requirements for a
cost-effective and technically viable solution. Such a system must have the
following characteristics:
• Scalability and mass-scale ability – the system must be able to handle deployments ranging from a few dozen cameras up to thousands of cameras simultaneously.
• Scalability from a cost perspective – whatever the scale of the deployment, the system has to provide a cost-effective solution.
• It must be possible to install cameras in geographically remote locations (under the assumption that an IP network connection to these locations exists).
• It must be possible to view each camera remotely from a monitoring station connected to the network.
• One or more image-processing algorithms need to be applied to each camera at any given moment. The outputs of these algorithms need to be collected in a central database and should also be viewable on the monitoring station.
• It should be possible to easily add new algorithms or customize existing ones without requiring massive upgrades to the system.
• The system needs to detect both single-camera events and multi-camera events. Multi-camera events fuse the information from several sensors to create a higher-level event.
• In rural areas (tracks, pipelines, borders) where there is no infrastructure, power requirements and bandwidth (especially over wireless links) are very important. For these types of installations, where power consumption is critical, installing PCs is not an option.
IPoIP Architecture
The IPoIP architecture was designed to answer the
needs defined above with the following key goals in mind:
• Providing a cost-effective solution for image-processing applications over a large number of cameras without sacrificing detection probability or increasing the False Alarm Rate (FAR).
• Enabling the application of any algorithm to any camera, even one in a geographically remote location with limited supporting facilities.
• Providing the ability to apply a wide range of algorithms simultaneously to any camera, without limiting the user to a single application at a time.
The uniqueness of IPoIP lies in its distributed image-processing architecture. Instead of performing the image-processing task either at the camera or in the monitoring area, as in the two aforementioned architectures, the algorithms are performed in both locations. Each algorithm is segmented into two parts, divided between the video encoder hardware and the central image-processing server. In this way IPoIP retains the strengths of both the “Local” and “Server” architectures while avoiding their limitations.
The idea behind this division is based on the fact that a processing unit already exists near each camera, inside the video encoder used to compress the video. This existing processing unit is a low-cost fixed-point processor and is highly suitable for performing several operations (described below) that allow only a small amount of information to be sent to the image-processing server for the main analysis. In this way, the system utilizes both the high resolution of the original video and the computing strength and flexibility of the central server, without the need for a costly network.
Feature Extraction Near the Camera
The initial part of the processing, done by the video encoder, is called Universal Feature Extraction (UFE). This is the part of the algorithm that works at the pixel level and extracts condensed information (or “features”) from the image pixels. It works on the incoming images while they are at their highest quality, before any data has been lost to compression. When a suitable feature is located, it is sent to the central server for further analysis over the IP network. Since the feature data is very compact, it requires a negligible amount of network bandwidth (only around 20 Kbps for each camera). There are many types of features that can be identified in this manner (a minimal sketch follows the list), including but not limited to:
• Segmentation of foreground and background
• Motion vectors – generated by tracking areas of the image between successive frames
• Histograms
• Specific color value ranges in a given color space (RGB, YUV, HSV)
• Edge information
• Problems with the input video image, such as image saturation, overall image noise and more
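As a minimal sketch of the first feature type – foreground/background segmentation – here is a frame-differencing extractor reduced to plain numpy. The thresholds, adaptation rate and message fields are illustrative assumptions, not the actual UFE implementation:

    import numpy as np

    # Sketch of one UFE-style feature: foreground/background segmentation by
    # comparing each frame against a running-average background model, then
    # condensing the result into a few numbers instead of pixels.

    ALPHA = 0.02          # background adaptation rate (assumed)
    DIFF_THRESHOLD = 25   # grey-level change that counts as "foreground" (assumed)

    def extract_foreground_feature(frame, background):
        """frame: 2-D uint8 greyscale image; background: float array, same shape.
        Returns (feature dict or None, updated background model)."""
        diff = np.abs(frame.astype(np.float32) - background)
        mask = diff > DIFF_THRESHOLD

        # Slowly adapt the background model toward the current frame.
        background = (1 - ALPHA) * background + ALPHA * frame

        if not mask.any():
            return None, background

        # Condense the mask into a compact feature: bounding box, centroid, area.
        ys, xs = np.nonzero(mask)
        feature = {
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
            "centroid": (float(xs.mean()), float(ys.mean())),
            "area_px": int(mask.sum()),
        }
        return feature, background

A feature record like this is a few dozen bytes per frame, which is how the per-camera feature stream can stay around 20 Kbps instead of the megabits the raw pixels would require.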
Additionally, upon request from the server, the video encoder can send the actual pixel data for a certain portion of the image. For example, when performing automatic license plate recognition, the video encoder can send only the pixels of the license plate to the server, avoiding the bandwidth cost of sending the whole picture. The common attribute of all these features is that, on the one hand, they can be implemented very efficiently on fixed-point DSP processors and, on the other, they provide excellent building blocks for a wide variety of algorithms (hence the name Universal Feature Extraction).
Feature Analysis at the Central Server
The main part of the processing is performed by the
IPoIP server. The server is able to dynamically request specific features from
each camera, according to the requirements of the specific algorithms that are
currently being applied.
The server analyzes the feature data that is
collected from each camera, and dynamically allocates computational resources
as needed. In this way the server is able to utilize large-scale system
statistics to perform very complex tasks when needed, without requiring a huge
and expensive network for support.
The part of each algorithm that runs on the server performs the following main tasks (a sketch of steps 2, 3 and 7 follows the list):
1. Request specific features from the remote UFE.
2. Analyze the incoming features over time and extract meaningful “objects” from the scene.
3. Track all moving objects in the scene in terms of size, position and speed, and calibrate this data into real-world coordinates. The calibration process transforms the two-dimensional data received from the sensors into three-dimensional data using various calibration techniques; many such techniques can be applied in accordance with the specific scene being analyzed.
4. Classify these objects into one of several major classes such as vehicles, people, animals and static objects. The classification can use various parameters such as size, speed and shape (pattern recognition).
5. Obtain additional information regarding objects of interest, such as color or sub-classification (type of vehicle, etc.).
6. Optionally extract unique identifying features for an object, such as license plate recognition or facial recognition.
7. Decide, based on all the gathered information and the active detection rules, whether an event needs to be generated and the system operator informed.
8. Receive and analyze information from any other algorithm running on the server at the same time. This very powerful capability enables easy implementation of tasks such as inter-camera tracking: a specific moving object (a person or vehicle) can be accurately tracked as it moves from the field of view of one camera to the next, with the system operator always viewing the correct image. It also enables sequences of rules, where a rule on one camera only becomes activated (or deactivated) when a rule on another camera detects an event.
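As an illustration of steps 2, 3 and 7, the following sketch associates incoming centroid features with tracked objects using nearest-neighbour matching and fires an event when an object first enters a protected zone. The message shape, the association gate and the rectangular zone test are all simplifying assumptions, not the actual server logic:

    import math

    MAX_MATCH_DIST = 40.0   # pixels; association gate (assumed)

    class Track:
        _next_id = 0
        def __init__(self, centroid):
            self.id = Track._next_id; Track._next_id += 1
            self.centroid = centroid
            self.in_zone = False

    def in_protected_zone(c, zone):
        x, y = c
        x0, y0, x1, y1 = zone
        return x0 <= x <= x1 and y0 <= y <= y1

    def update_tracks(tracks, centroids, zone):
        """Consume one batch of centroid features; return detected events."""
        events = []
        for c in centroids:
            # Nearest-neighbour association against existing tracks.
            best = min(tracks, key=lambda t: math.dist(t.centroid, c), default=None)
            if best is None or math.dist(best.centroid, c) > MAX_MATCH_DIST:
                best = Track(c)          # no match close enough: start a new track
                tracks.append(best)
            best.centroid = c
            # Detection rule: fire once when an object first enters the zone.
            inside = in_protected_zone(c, zone)
            if inside and not best.in_zone:
                events.append(f"object {best.id} entered protected zone")
            best.in_zone = inside
        return events

    # Demo: a new object appears inside the zone and triggers one event.
    tracks = []
    print(update_tracks(tracks, [(150.0, 150.0)], zone=(100, 100, 200, 200)))

A real tracker would add motion prediction and the real-world calibration of step 3; the point here is only the feature-in, event-out structure.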
It is important to note that the algorithms at the server are constantly gathering information about the scene, even though most of the time no events are being generated. This information can be stored as metadata along with the video recording, later enabling very fast and efficient searches over large amounts of recorded video content.
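A minimal sketch of what such metadata-driven search could look like, using SQLite as a stand-in for the central database; the schema and fields are assumptions for illustration:

    import sqlite3

    # Store per-object metadata alongside the recording timeline so recorded
    # video can be searched without re-processing it.
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE object_metadata (
        camera_id  INTEGER,
        ts         REAL,      -- seconds into the recording
        obj_class  TEXT,      -- e.g. 'vehicle', 'person'
        color      TEXT,
        speed_kmh  REAL)""")
    db.execute("INSERT INTO object_metadata VALUES (7, 3600.5, 'vehicle', 'red', 54.0)")

    # "Every red vehicle seen on camera 7 in the first two hours" becomes an
    # index lookup instead of hours of manual video review.
    rows = db.execute("""SELECT ts, speed_kmh FROM object_metadata
                         WHERE camera_id = 7 AND obj_class = 'vehicle'
                           AND color = 'red' AND ts < 7200""").fetchall()
    print(rows)   # [(3600.5, 54.0)]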
The Combined End-Product
Utilizing the methods described above, IPoIP provides a combination of algorithm complexity and low cost that is unrivaled by any other existing method today, as summarized in the following comparison:

Architecture        | Hardware cost per camera | Network bandwidth         | Algorithms per camera
Local Processing    | High (scales linearly)   | Low                       | Typically one
Server Processing   | Low                      | Very high (full video)    | Flexible
IPoIP               | Low                      | Low (~20 Kbps features)   | Multiple, simultaneous
Applications in the Physical Security Market
The IPoIP platform is ideally suited for applications needing multiple simultaneous image inputs and processing. The fastest growing market today for such large-scale image processing is the Physical Security market. Standard security measures today include the rapid deployment of hundreds of thousands of cameras in streets, airports, schools, banks, offices and residences. These cameras are currently used mainly to let a human operator watch a remote location, or to record the occurrences at a certain location for later use should the need arise. The introduction of digital video networking and other new technologies is now enabling the video surveillance industry to move in new directions that significantly enhance the functionality of such systems. As a result, video surveillance is rapidly penetrating organizations needing security monitoring on a very large scale and across widely dispersed areas – such as railway operators, electricity and energy distributors, Border and Coast Guards and many more. Such organizations face the new problems of operating and handling a huge number of cameras while having to provide for extensive bandwidth requirements.

This is where automatic video-based event detection comes into play. Solutions are currently available for automatic Video Motion Detection (VMD), License Plate Recognition (LPR), Facial Recognition (FR), Behavior Recognition (BR), traffic violation detection and other image-processing applications. The output of these detection systems may be used to trigger an alarm and/or initiate a video recording. This can reduce network bandwidth requirements (where constant viewing and recording is not required) and allow human attention to be allocated only to those cameras showing a special event.
All the current implementations of these algorithms suffer from the inherent problems of the existing system architectures described above, and are thus very costly and unable to penetrate the market on a large scale. IPoIP provides the ideal platform for a cost-effective, high-performance and constantly evolving physical security system.
Sample Application - Railway System Protection
To demonstrate the practical use and benefits of the IPoIP technology, the following describes a typical application – Railway System Protection. This example shares similar requirements with other applications such as border security, pipeline protection and more:
• Poor infrastructure
– The power and communication infrastructure along the tracks is not guaranteed. A low-power and low-bandwidth solution is mandatory (e.g. transmitting video from hundreds or thousands of cameras is not practical). A wireless, solar-cell-powered solution is desired.
• Mostly outdoor environment
– The system should be immune to typical outdoor environment phenomena such as
rain, snow, clouds, headlights, animals, insects, pole vibration etc.
• Distributed locations
– Railway facilities (tracks, stations, bridges, tunnels, service depots etc.)
are distributed over a large geographic area, which forces using an IP network
based system.
• Large-scale
– A typical railway system would use thousands of cameras to protect the tracks and all facilities. The Nuisance Alarm Rate / False Alarm Rate (NAR/FAR) figures per channel should be extremely low, so that the cumulative system can be effectively monitored by a small number of operators.
• Critical system
– The system’s availability should be close to 100%. No single point of failure should exist, and the network should handle local failures such as cable cuts.
• Variety of event types
– The video intelligence system should detect intruders, suspected objects,
safety hazards, suspected license plate numbers and other standard and
user-specific event types. This can be achieved using multiple high level
algorithms, including using several algorithms simultaneously for a single
camera.
• Low cost of ownership
– As the protected area is very large, rural and distributed, field visits are
very expensive. Therefore, a minimum amount of equipment in the field is vital
for low installation and maintenance costs.
Looking at the above list, it is clear that the classic concept of single-location processing – either field-based or center-based – fails to comply with most requirements. Field-based solutions require many computers in the field, resulting in high power requirements and cost of ownership. Server-based solutions require transmitting all the video sources at high quality, all the time, to the center, resulting in very high bandwidth requirements.
Using IPoIP technology, only low-power video encoders with embedded feature-extraction capability are required in the field. Furthermore, most of the time there is no need to transmit video, only the low-bandwidth feature stream – a dramatic saving in network bandwidth without compromising performance.
Poles are installed along the tracks, each carrying a FLIR (thermal) camera, a video encoder, an IP network node and power circuitry. The FLIR camera can reliably detect persons up to a few hundred meters away in all weather and illumination conditions, removing the need for artificial illumination and reducing the FAR/NAR. The camera consumes 2-5 W. The video encoder / feature extractor unit is a low-power module that transmits some 10-20 Kbps of feature data on average, and transmits video at higher bandwidth (0.5-2 Mbps) only when an event is detected or upon an operator’s request.
The encoder consumes 3-8 W. The IP network can be either wired (copper or fiber) or wireless. For a wired network, fiber is recommended as it is not limited by distance and is immune to EMI/RFI. If cabling is not possible or too expensive, a wireless solution may be used: a hybrid Wi-Fi and satellite network is recommended, with inter-pole communication over Wi-Fi and the access points using a satellite link. An antenna is installed on top of each pole. This solution does not require any infrastructure and consumes about 10 W per pole and 40 W per access point. Power may be supplied either by power lines or by a solar cell and battery module; if cabling is used, power lines make sense, while a wireless network should be powered by solar cells.
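The power figures above allow a rough solar-sizing sketch for one pole; the sun-hour and loss figures below are generic assumptions rather than site data:

    # Per-pole power budget from the consumption figures quoted above, with an
    # illustrative solar-panel sizing. Sun-hours and losses are assumptions.

    FLIR_W = 5.0          # thermal camera, upper end of the 2-5 W range
    ENCODER_W = 8.0       # encoder / feature extractor, upper end of 3-8 W
    WIFI_NODE_W = 10.0    # inter-pole Wi-Fi node, as quoted above

    pole_w = FLIR_W + ENCODER_W + WIFI_NODE_W   # ~23 W continuous
    daily_wh = pole_w * 24                      # ~552 Wh per day

    SUN_HOURS = 4.0       # equivalent full-sun hours per day (assumed)
    SYSTEM_LOSS = 0.7     # charging/wiring losses and weather margin (assumed)

    panel_w = daily_wh / (SUN_HOURS * SYSTEM_LOSS)
    print(f"Continuous load: {pole_w:.0f} W, {daily_wh:.0f} Wh/day")
    print(f"Suggested panel: ~{panel_w:.0f} W plus a battery for dark hours")
    # Roughly 200 W of panel per pole -- small enough to pole-mount, which is
    # why the requirements list rules out PC-based field units, whose power
    # draw is far higher.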
On top of the FLIR cameras used for intruder detection, a PTZ color camera is installed every 2-4 km for event monitoring and management. Two algorithms are used to protect the railroad. A Video Motion Detection (VMD) algorithm detects persons and vehicles approaching the protected area, while a Non-Motion Detection (NMD) algorithm detects static changes in the scene such as objects left on the tracks (a bomb, a fallen tree, a stuck car) or damaged tracks (missing parts). The two algorithms run simultaneously. The server is located at the backend and is based on a cluster of two or more computers, designed as required for critical systems; the server computers may even be geographically distributed over a few locations to increase robustness. The system may be operated from any location on the network, which enables dividing large networks among various users and departments.