The analysis of dynamic or time-varying data has emerged as an issue of great interest, taking an increasingly important place in the scientific community, especially in automation, pattern recognition and machine learning. There is a broad range of important applications, such as video analysis, motion identification, segmentation of human motion and airplane tracking, among others. Spectral matrix analysis is one of the approaches to address this issue. Spectral techniques, mainly those based on kernels, have proved to be a suitable tool for several tasks of interest in pattern recognition and machine learning, even when data are time-varying, such as estimating the number of clusters, clustering and classification. Most spectral clustering approaches have been designed for analyzing static data, discarding the temporal information, i.e. the evolutionary behavior along time. Some works have been developed to deal with the time-varying effect; nonetheless, an approach able to accurately track and cluster time-varying data in real-time applications remains an open issue. This thesis describes the design of a kernel-based dynamic spectral clustering using a primal-dual approach so as to carry out the grouping task involving the dynamic information, that is to say, the changes of data frames along time. To this end, a dynamic kernel framework aimed at extending a clustering primal formulation to dynamic data analysis is introduced. Such a framework is founded on a multiple kernel learning (MKL) approach. The proposed clustering approach, named dynamic kernel spectral clustering (DKSC), uses a linear combination of kernel matrices as its MKL model. Kernel matrices are computed from an input frame sequence represented by data matrices. Then, a cumulative kernel is obtained, where the model coefficients or weighting factors result from ranking each sample contained in the frame. Such a ranking corresponds to a novel tracking approach that takes advantage of the spectral decomposition of a generalized kernel matrix. Finally, to get the resultant cluster assignments, data are clustered using the cumulative kernel matrix. Experiments are done over real databases (human motion and a moon covered by clouds) as well as artificial data (moving Gaussian clouds). As a main result, the proposed spectral clustering method for dynamic data proved able to group underlying events and movements and to detect hidden objects as well. The proposed approach may represent a contribution to the pattern recognition field, mainly for solving problems involving dynamic information aimed at either tracking or clustering of data.
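As a rough illustration of the cumulative-kernel idea described above, the sketch below (not the thesis code) builds a weighted sum of per-frame RBF kernels; the uniform weights are placeholders for the ranking-based coefficients, and the frame data are synthetic.

```python
# Minimal sketch: cumulative kernel as a weighted sum of per-frame RBF kernels.
# The weights stand in for the ranking-based MKL coefficients described above.
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """RBF kernel matrix for one data frame X (n_samples x n_features)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def cumulative_kernel(frames, weights, sigma=1.0):
    """Linear combination sum_t w_t * K_t over a sequence of frames."""
    n = frames[0].shape[0]
    K = np.zeros((n, n))
    for X_t, w_t in zip(frames, weights):
        K += w_t * rbf_kernel(X_t, sigma)
    return K

# Toy usage: 5 frames of 30 samples in 2-D, uniform weights as a placeholder
frames = [np.random.randn(30, 2) + t for t in range(5)]
weights = np.ones(5) / 5.0          # placeholder for the ranking-based factors
K_cum = cumulative_kernel(frames, weights)
```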
This work presents a comparative study of different partitional and spectral clustering techniques to cluster heartbeat patterns of long-term ECG signals. Due to the nature of the signals and since, in many cases, labeling them is not feasible, clustering is preferred for the analysis. The use of a generic model of partitional clustering and the appropriate estimation of initialization parameters via spectral techniques represent some of the most important contributions of this research. The experiments are done with a standard arrhythmia database of MIT (Massachusetts Institute of Technology), and the feature extraction is carried out using techniques recommended by the literature. Another important contribution is the design of a sequential analysis method that reduces the computational cost and improves clustering performance compared to the traditional analysis, that is, analyzing the whole data set in one iteration. Additionally, a complete system for unsupervised analysis of ECG signals is suggested, including feature extraction, feature selection, initialization and clustering stages. Also, some appropriate performance measures based on group analysis were designed, which relate the clustering performance to the number of resultant groups and the computational cost. This study is done taking into account the AAMI standard (Association for the Advancement of Medical Instrumentation).
Stochastic neighbour embedding (SNE) and its variants are methods of nonlinear dimensionality reduction that involve soft Gaussian neighbourhoods to measure similarities for all pairs of data. In order to build a suitable embedding, these methods try to reproduce in a low-dimensional space the neighbourhoods that are observed in the high-dimensional data space. Previous works have investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions, like sums of Jensen–Shannon divergences. This paper proposes an additional refinement, namely multi-scale similarities, which are averages of soft Gaussian neighbourhoods with exponentially growing bandwidths. Such multi-scale similarities can replace the regular, single-scale neighbourhoods in SNE-like methods. Their objective is then to maximise the embedding quality on all scales, with the best preservation of both local and global neighbourhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments with several data sets show that the proposed multi-scale approach better captures the structure of the data and significantly improves the quality of dimensionality reduction.
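The following sketch illustrates, under simplifying assumptions, what multi-scale similarities look like: row-stochastic Gaussian neighbourhoods averaged over exponentially growing bandwidths. The per-scale bandwidths are set here from a crude global distance scale, whereas the paper tunes them per point via perplexity.

```python
# Sketch of multi-scale similarities: average of soft Gaussian neighbourhoods
# computed with exponentially growing bandwidths (assumed, not perplexity-tuned).
import numpy as np

def pairwise_sq_dists(X):
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def soft_gaussian_neighbourhood(D2, sigma):
    P = np.exp(-D2 / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)                 # a point is not its own neighbour
    return P / P.sum(axis=1, keepdims=True)  # row-stochastic similarities

def multiscale_similarities(X, n_scales=4):
    D2 = pairwise_sq_dists(X)
    base = np.sqrt(np.median(D2))            # crude global distance scale
    scales = [base * 2.0**h for h in range(n_scales)]  # exponentially growing
    return sum(soft_gaussian_neighbourhood(D2, s) for s in scales) / n_scales

X = np.random.randn(100, 10)
P_ms = multiscale_similarities(X)            # multi-scale similarity matrix
```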
A comprehensive methodology for classification-oriented non-supervised analysis through the use of spectral techniques is presented here. This methodology includes a non-supervised clustering stage developed under the multiway criterion of normalized partitions. This technique is preferred because it does not require an additional clustering algorithm; additionally, it results in a suitable partition, since it considers the information obtained from its own solution. Besides, adequate affinity measures are applied, and the number of clusters is automatically estimated in order to reduce processing time and improve algorithm convergence. Experimental results are obtained on an image database. The quality of the clusters is measured by means of segmentation results. Also, a non-supervised performance measure was introduced.
Sleep is a growing area of research interest in medicine and neuroscience. Currently, one major concern is to find a correlation between several physiological variables and the sleep stages. There is scientific agreement on the characteristics of the five stages of human sleep, based on EEG analysis. Nevertheless, manual stage classification is still the most widely used approach. This work proposes a new automatic sleep classification method based on recently developed unsupervised feature classification algorithms and on EEG entropy measures. This scheme extracts entropy metrics from EEG records to obtain a feature vector. Then, these features are optimized in terms of relevance using the Q-α algorithm. Finally, the resulting set of features is entered into a clustering procedure to obtain a final segmentation of the sleep stages. The proposed method reached up to an average of 80% correctly classified stages for each patient separately, while keeping the computational cost low.
This paper presents a new method to estimate neural activity from electroencephalographic signals using a weighted time series analysis. The method considers a physiologically based linear model that takes both spatial and temporal dynamics into account, and a weighting stage to modify the assumptions of the model from observations. The calculated weighting matrix is included in the cost function used to solve the dynamic inverse problem, and therefore in the Kalman filter formulation. In this way, a weighted Kalman filtering approach is proposed, including a preponderance matrix. The filter's performance (in terms of localization error) is analyzed for several SNRs. The optimal performance is achieved using the linear model with a weighting matrix computed by an inner product method.
Nowadays, a great amount of data is being created by several sources from academic, scientific, business and industrial activities. Such data intrinsically contain meaningful information, and exploring that information with techniques of scientific validity has become a need. In this connection, the aim of artificial intelligence (AI) is to obtain new knowledge so as to make decisions properly. AI has taken an important place in scientific and technological development communities, and nowadays involves the computer-based processing devices of modern machines. Under the premise that the feedback provided by human reasoning, which is holistic, flexible and parallel, may enhance data analysis, the need for the integration of natural and artificial intelligence has emerged. Such an integration makes the process of knowledge discovery more effective, providing the ability to easily find hidden trends and patterns belonging to the predictive model database, as well as allowing for new observations and considerations from beforehand known data by using both data analysis methods and the knowledge and skills of human reasoning. In this work, we review the main basics and recent works on artificial and natural intelligence integration in order to introduce users and researchers to this emergent field. As well, key aspects to conceptually compare them are provided.
A method is presented for classifying ventricular arrhythmias of the left (L) and right (R) bundle branch block types with respect to normal beats (N) from the MIT-BIH arrhythmia database, using unsupervised classification, mainly because of the morphological variability among records. A feature extraction stage based on the beat morphology is developed, along with a clustering stage that uses the heuristic-search k-means algorithm modified in its initialization by means of the max-min criterion. The system yields results comparable with those reported in the literature.
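A minimal sketch of the max-min (farthest-point) initialization mentioned above, plugged into scikit-learn's k-means; the feature matrix is a random stand-in for the beat features, not the actual MIT-BIH characterization.

```python
# Hedged sketch of max-min (farthest-point) centre initialization for k-means.
import numpy as np
from sklearn.cluster import KMeans

def max_min_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]          # first centre: random sample
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])          # farthest point from chosen set
    return np.array(centers)

X = np.random.randn(300, 12)                     # stand-in for beat features
init = max_min_init(X, k=3)
labels = KMeans(n_clusters=3, init=init, n_init=1).fit_predict(X)
```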
A method is presented for optimizing DTW in clustering applications, based on global constraints obtained by studying the nature of the statistical moments of the dissimilarities. The study is carried out by extracting representative vectors corresponding to beats of 16 arrhythmia types from the different records of the MIT-BIH database. The method yields an average difference of the dissimilarities on the main diagonal of 1.78×10^-4 and considerably reduces the processing time required to compute the dissimilarity. This method is useful for clustering processes because it reduces the computational cost without altering the dissimilarity values.
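For illustration, the sketch below computes DTW restricted to a global band of the Sakoe-Chiba type, i.e. the kind of constraint whose width the study derives from the statistics of the dissimilarities; the band width and the toy signals used here are assumptions.

```python
# Sketch of DTW with a global (Sakoe-Chiba-style) band constraint.
import numpy as np

def dtw_banded(x, y, band):
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)   # only cells inside the band
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.sin(np.linspace(0, 2 * np.pi, 120))       # stand-ins for two heartbeats
b = np.sin(np.linspace(0, 2 * np.pi, 120) + 0.2)
print(dtw_banded(a, b, band=10))
```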
In this work, a methodology is developed for the reconstruction and characterization of QRS complexes using the Hermite parametric model. The complexes are extracted from the MIT-BIH database. The reconstruction uses the optimal value of the scale parameter of the Hermite bases, obtained by minimizing the dissimilarity between the original and the reconstructed signal, with DTW as the dissimilarity measure. Additionally, a method is presented to obtain the minimum number of bases that yield a highly reliable reconstruction, based on comparing the frequency spectra in the 1-20 Hz range. The characterization is evaluated by means of the K-means Max-Min clustering algorithm.
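A hedged sketch of the reconstruction step: a least-squares fit of the first Hermite basis functions for a given scale, scanning the scale to minimize the reconstruction error (a simple squared error replaces DTW here for brevity, and the waveform is synthetic).

```python
# Sketch (not the original code): QRS reconstruction with Hermite basis functions.
import numpy as np
from math import factorial, pi, sqrt
from scipy.special import eval_hermite

def hermite_basis(t, n, sigma):
    h = eval_hermite(n, t / sigma) * np.exp(-t**2 / (2 * sigma**2))
    return h / sqrt(sigma * 2**n * factorial(n) * sqrt(pi))   # unit-norm basis

def reconstruct(x, t, n_bases, sigma):
    Phi = np.stack([hermite_basis(t, n, sigma) for n in range(n_bases)], axis=1)
    coeffs, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    return Phi @ coeffs

t = np.linspace(-0.1, 0.1, 200)                  # 200 ms window around the R peak
x = np.exp(-t**2 / (2 * 0.01**2))                # toy QRS-like waveform
sigmas = np.linspace(0.005, 0.05, 20)            # scan the scale parameter
best = min(sigmas, key=lambda s: np.sum((x - reconstruct(x, t, 6, s))**2))
x_rec = reconstruct(x, t, 6, best)
```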
The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extraction methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes.
In this work, a methodology is developed for classifying ventricular arrhythmias of the premature ventricular contraction and left and right bundle branch block types with respect to normal beats from the MIT-BIH arrhythmia database, using morphological features of the QRS complex and partitional unsupervised classification. The proposed system includes preprocessing, feature extraction and unsupervised classification stages. The preprocessing consists of centering and normalizing all signals. Feature extraction is carried out using the reconstruction of the signal through the Hermite parametric model with optimal parameters, together with morphological features. In the clustering stage, the H-means and K-harmonic means algorithms, derived from the generic centroid-based iterative clustering, are applied. In addition, the algorithms are modified in their initialization by using the max-min criterion in order to guarantee and improve convergence. The average performance of the system for records of the MIT/BIH database is Se = 97.53% and Sp = 93.7%, being comparable with results published in the literature.
In this work, an unsupervised algorithm for feature selection and a non-parametric density-based clustering algorithm are presented, in which the density estimation is performed with Parzen's window approach; this avoids the assumption that the individual components of the mixture are Gaussian. The method is applied to a set of recordings from the MIT/BIH arrhythmia database with the five groups of arrhythmias recommended by the AAMI. The heartbeats are characterized using prematurity indices, morphological and representation features, which are selected with the Q-α algorithm. The results are assessed by means of supervised (Se, Sp, Sel) and unsupervised indices for each arrhythmia. The proposed system presents results comparable to those of other unsupervised methods in the literature.
This work presents a pilot test of a new technique for characterizing ECG signals using a cardiac pulse model and non-linear system analysis techniques. The signals used in this study are extracted from the MIT arrhythmia database (MIT-BIH). The characterization is based on the sequential tuning of the model parameters for each beat. To this end, a preprocessing stage is developed to remove the P and T waves from each beat, so that the signal is morphologically comparable with the model response. The comparison is performed by estimating the degree of dissimilarity between the signals through the DTW algorithm. Additionally, a bifurcation analysis of the model is carried out to study the correlation between the model response and the presence of some pathology or a failure in the signal acquisition system.
This work describes a method to classify cardiac arrhythmias in Holter records using unsupervised classification. Unsupervised analysis is preferred in this work because detecting a specific heartbeat in Holter records requires analyzing every heartbeat taking into account several factors such as variability, signal length, EMG noise, artifacts, and different dynamic behavior and morphology. ECG signals are extracted from the MIT-BIH arrhythmia database. The proposed method includes pre-processing, feature selection and unsupervised classification. Features are obtained from heartbeat morphology, time-related measures and representation features. For feature selection, we apply the unsupervised Q-α algorithm obtained from spectral and graph-based analysis. The classification stage is based on partitional clustering using the general iterative model. In order to improve the algorithm convergence, clustering incorporates a center initialization based on the J-means criterion. The method shows good average performance, namely sensitivity 98.9%, specificity 99.8% and clustering performance 99.6%.
A methodology is proposed for the unsupervised analysis of descriptive heartbeat patterns in ECG signals from Holter records, with the aim of classifying arrhythmias and showing the applicability and versatility of unsupervised analysis techniques. The proposed system includes characterization, feature selection, initialization criteria, estimation of the number of groups and clustering stages, developed with partitional and spectral techniques. The experiments are carried out with the MIT/BIH database.
A method that improves the feature selection stage for non-supervised analysis of Holter ECG signals is presented. The method corresponds to a WPCA approach developed mainly in two stages. The first is the weighting of the feature set through a weight vector based on the M-inner product as distance measure and a quadratic optimization function. The second is the linear projection of the weighted data using principal components. In the clustering stage, several procedures are considered: estimation of the number of groups, initialization of centroids and grouping by means of a soft clustering algorithm. In order to decrease the computational cost of the procedure, segment analysis is carried out, grouping contiguous segments and establishing union and exclusion criteria for each cluster. This work is focused on classifying cardiac arrhythmias into 5 groups, according to the AAMI standard (ANSI/AAMI EC57:1998/2003). To validate the method, some recordings from the MIT/BIH arrhythmia database are used. By employing the labels of each recording, the performance is assessed with supervised measures (Se = 90.1%, Sp = 98.9% and Cp = 97.4%), outperforming other works in the literature that do not take into account all heartbeat types.
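The core WPCA idea can be sketched as follows: weight each feature by a relevance value and project the weighted data onto its principal components. The relevance vector below is a mere placeholder (normalized feature variance), not the M-inner-product optimization of the paper.

```python
# Minimal WPCA sketch: relevance-weighted features followed by a PCA projection.
import numpy as np

def wpca(X, w, n_components=2):
    Xc = X - X.mean(axis=0)
    Xw = Xc * w                                   # feature weighting
    C = np.cov(Xw, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1][:n_components]
    return Xw @ vecs[:, order]                    # weighted projection

X = np.random.randn(500, 20)                      # stand-in for beat features
w = X.var(axis=0) / X.var(axis=0).max()           # placeholder relevance weights
Z = wpca(X, w)
```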
Most applications involving the analysis of thermal images rely on detecting changes in patterns. Thermal patterns thus provide sufficient information about the structure of a particular device, giving the opportunity to raise alerts and perform preventive actions in the process.
This work presents a comparative study of different affinity measures applied in spectral clustering algorithms commonly used and recommended in the literature. Spectral clustering algorithms are known to be useful in different areas of science and engineering, such as image segmentation, load balancing for intensive computing and circuit design, among others. In particular, they have been proved effective in classification tasks where the classes are not linearly separable. Typically, spectral clustering involves calculating an affinity matrix from the data to be processed, which can usually be done by different standard approaches or adjusted to the requirements of the specific algorithm or classification task. Experiments are carried out using an image database, and clustering performance is measured by means of supervised and unsupervised indices based on cluster coherence and the objective function value. The results give a general idea of the effects of affinity matrix selection on the grouping process in terms of convergence time and accuracy.
This work presents a comparative analysis of different weighting factors for PCA, obtained from a generalized distance measure. By employing such a generalized distance, an optimization problem is established whose solution provides relevance values for each feature. The relevance values are used both for weighting data in a WPCA scheme, so as to carry out feature extraction, and for selecting relevant features. Their performance and suitability are assessed in terms of dimensionality reduction and the classification performance achieved by applying both supervised and unsupervised classification techniques over real datasets.
Clustering techniques usually require manually set parameters so that the classification task can be correctly carried out, one of the most common being the number of groups or clusters into which data should be separated; yet this relies on prior knowledge of the nature of the data. In this work, a comparison among different approaches for finding the number of groups is shown, such as singular value decomposition (SVD), analysis of the multiplicity of the greatest eigenvalues of the affinity matrix, and the percentage of the cumulative sum of the singular values of the affinity matrix. The spectral nature of the estimation process, as well as the different datasets used, implies that the results rely only on the internal information of the data matrices. Results exhibit both limitations and advantages for each method, whether directly related to the nature of the data or limited by the structure and definition of the process. Nonetheless, these guidelines will be helpful for deciding which estimation technique best applies for clustering data regardless of its origin.
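Two of the compared estimators can be sketched as follows: the eigengap of the affinity matrix and the cumulative energy of its singular values; the RBF bandwidth and the 95% energy threshold are assumptions.

```python
# Sketch: estimating the number of clusters from the spectrum of an affinity matrix.
import numpy as np

def affinity(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def k_by_eigengap(A):
    vals = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, descending
    gaps = -np.diff(vals)
    return int(np.argmax(gaps) + 1)               # position of the largest gap

def k_by_singular_energy(A, ratio=0.95):
    s = np.linalg.svd(A, compute_uv=False)
    energy = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(energy, ratio) + 1)

X = np.vstack([np.random.randn(50, 2) + 4 * i for i in range(3)])  # 3 toy groups
A = affinity(X, sigma=1.5)
print(k_by_eigengap(A), k_by_singular_energy(A))
```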
This paper presents a new method to estimate neural activity from electroencephalographic signals using a weighted time series analysis. The method considers a physiologically based linear model that takes both spatial and temporal dynamics into account and a weighting stage to modify the assumptions of the model from observations. The calculated weighting matrix is included in the cost function used to solve the dynamic inverse problem, and therefore in the Kalman filter formulation. In this way, a weighted Kalman filtering approach is proposed including a preponderance matrix. The filter's performance (in terms of localization error) is analyzed for several SNRs. The optimal performance is achieved using the linear model with a weighting matrix computed by an inner product method.
Event-related potentials (ERPs) are electrical signals from the brain generated as a response to an external sensory stimulus. These signals are widely used to diagnose neurological disorders, such as attention-deficit hyperactivity disorder (ADHD). In this paper, a novel methodology for ADHD discrimination is proposed, which consists of obtaining a new data representation by means of a re-characterization of the initial feature space. Such re-characterization is done through the distances between the data and the centroids obtained from the k-means algorithm. This methodology also includes pre-clustering and linear projection stages. In addition, this paper explores the use of morphological and spectral features as descriptive patterns of the ERP signal in order to discriminate between normal subjects and ADHD patients. Experimental results show that the morphological features, in contrast with the remaining features considered in this study, are those that contribute the most to classification performance, reaching 86% for the original feature set.
Cardiac arrhythmias are important indicators of heart disease; they refer to electrical conduction problems and therefore their diagnosis is of high clinical interest. However, timely detection is difficult due to factors such as computational cost, the large number of heartbeats per record, morphology variability, and the infrequency and irregularity of pathological heartbeats. In this work, the wavelet transform computed through wavelet packets is applied over electrocardiographic (ECG) signals as a method to characterize and identify normal ECG signals and some arrhythmias such as atrial fibrillation (AF) and life-threatening arrhythmias, drawn from MIT databases.
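As an illustration of wavelet-packet characterization, the sketch below computes the relative energies of the terminal nodes at a given decomposition level with PyWavelets; the wavelet family and level are assumptions, not necessarily the settings used in this work.

```python
# Sketch: relative sub-band energies from a wavelet packet decomposition.
import numpy as np
import pywt

def wavelet_packet_energy(x, wavelet='db4', level=4):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order='freq')                 # terminal nodes
    energies = np.array([np.sum(np.asarray(n.data)**2) for n in nodes])
    return energies / energies.sum()                          # relative energies

beat = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 360))         # toy ECG-like segment
features = wavelet_packet_energy(beat)
```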
This article presents some of the results achieved in the undergraduate thesis entitled "Analysis of the effects of linear and non-linear distortions on QPSK modulated signals for optical channels", developed by the first two authors. The relevance of this work lies in the fact that, nowadays, digital signal processing has taken an important place in communications, in particular in optical communications. Current transmission systems are supported by DSP devices for filtering and equalization tasks. In the case of filtering, the first step is to interpret the distortion and, in the best case, to obtain a model. This work presents a brief state of the art of linear and non-linear distortions in optical fiber channels developed over the last thirteen years. A concise and useful theoretical framework about distortions and digital modulation formats oriented to optical communications is also presented. Regarding distortions, a definition is given and their effect on the representation constellation is studied. For the digital modulation formats, a block diagram and generalized equations are presented. In addition, a linear distortion model for the QPSK format is introduced.
This work presents an analysis of the electric current signal in compact fluorescent lamps (CFLs). Electrical signals are measured in two circuits: one corresponding to the CFL connected in shunt with the AC source, and another one incorporating a control system into the CFL. Such a control system works as a power factor corrector (PFC) and is designed by employing a boost converter and a current controller. Signals are analyzed in terms of frequency-based representations oriented to estimating the power spectral density (PSD). In this study, three approaches are employed: the Fourier transform, the periodogram and a window-based PSD. The goal of this work is to show that more complex PSD estimation methods can provide useful information for studying power quality in electric power systems. The proposed spectral analysis represents an alternative to traditional approaches.
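For reference, the following sketch contrasts two of the PSD estimators mentioned above, a plain periodogram and a window-based (Welch) estimate, on a synthetic distorted current waveform; the sampling rate and signal are assumptions.

```python
# Sketch: periodogram versus window-based (Welch) PSD on a distorted current-like signal.
import numpy as np
from scipy.signal import periodogram, welch

fs = 10_000                                       # assumed sampling rate [Hz]
t = np.arange(0, 1, 1 / fs)
i_t = np.sign(np.sin(2 * np.pi * 60 * t))         # crude non-sinusoidal current
i_t += 0.05 * np.random.randn(len(t))             # measurement noise

f_p, P_per = periodogram(i_t, fs=fs)              # raw periodogram
f_w, P_wel = welch(i_t, fs=fs, nperseg=2048)      # windowed, averaged PSD
```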
Optical monitoring systems need to be less expensive if we are to create an optical packet switched network. The application presented in this paper uses low-cost sampling circuits and simple digital signal processing in order to acquire good approximations of the optical signal-to-noise ratio, by using the exponential relation found between the variance of one of the peaks in the AAH and the OSNR. The method works especially well for PSK and QPSK modulation schemes, and presents a good representation for DPSK and DQPSK systems. The results presented here were heuristically obtained, though some basic relations between the features used are discernible and could lead to future analytical approaches. Index Terms: Asynchronous Amplitude Histogram (AAH), Optical Signal to Noise Ratio (OSNR), Gaussian fitting, Optical Monitoring System (OMS).
We present a new method for relevance analysis based on spectral information, developed from a graph-theory point of view. The method is carried out by using Gaussian kernels instead of conventional quadratic forms, thus avoiding the need for a linear-combination-based representation. To this end, an extended approach for relevance analysis using alternative kernels, in this case exponential ones, is implemented. To assess the performance of the proposed method, a clustering algorithm commonly used and recommended in the literature is applied: normalized-cuts-based clustering. Experimental results are obtained from the processing of well-known image and toy databases. Results are comparable with those reported in the literature.
Spectral clustering has represented a good alternative in digital signal processing and pattern recognition; however, decisions concerning the affinity functions among data are still an issue. In this work, an extended version of a traditional multiclass spectral clustering method is presented, which incorporates prior information about the classified data into the affinity matrices, aiming to maintain the background relations that might be lost in the traditional approach. That is, using a scaled exponential affinity matrix constrained by weighting the data according to some prior knowledge, together with k-way normalized cuts clustering, results in a semi-supervised methodology of traditional spectral clustering. Tests were performed over toy data classification and image segmentation, and evaluated with unsupervised performance measures (group coherence, Fisher criterion and silhouette).
In this paper, an automatic image segmentation methodology based on multiple kernel learning (MKL) is proposed. In this regard, we compute some image features for each input pixel, and then combine such features by means of an MKL framework. We automatically fix the weights of the MKL approach based on a relevance analysis over the original input feature space. Moreover, an unsupervised image segmentation measure is used as a tool to establish the kernel free parameter. A kernel k-means algorithm is used as the spectral clustering method to segment a given image. Experiments are carried out aiming to test the efficiency of incorporating weighted feature information into the clustering procedure, and to compare the performance against state-of-the-art algorithms using a supervised image segmentation measure. The attained results show that our approach is able to compute meaningful segmentations, demonstrating its capability to support further computer vision applications.
In this paper, an automatic image segmentation methodology based on multiple kernel learning (MKL) is proposed. In this regard, we compute some image features for each input pixel, and then combine such features by means of an MKL framework. We automatically fix the weights of the MKL approach based on a relevance analysis over the original input feature space. Moreover, an unsupervised image segmentation measure is used as a tool to establish the kernel free parameter. A kernel k-means algorithm is used as the spectral clustering method to segment a given image. Experiments are carried out aiming to test the efficiency of incorporating weighted feature information into the clustering procedure, and to compare the performance against state-of-the-art algorithms using a supervised image segmentation measure. The attained results show that our approach is able to compute meaningful segmentations, demonstrating its capability to support further computer vision applications.
An extended digital estimation approach for the optical signal-to-noise ratio (OSNR), based on statistical analysis of asynchronous amplitude histograms (AAH) of the received optical signal, is presented and numerically investigated. Accurate OSNR estimation on highly noisy optical transmission links is achieved. Furthermore, the proposed OSNR estimation approach may be digitally adjusted to any Cartesian modulation format, such as multilevel phase-shift-keying and quadrature-amplitude-modulated optical signals, without degrading estimation accuracy. The OSNR estimation methodology is based on kernel density estimation with Gaussian kernels and non-linear least-squares regression (NLLS). Heuristic searches are no longer needed and the process becomes more reliable and robust. Reported results show accurate OSNR estimation, with less than 15% estimation error with respect to the simulated OSNR value for different signal modulation formats, exhibiting a more reliable estimation system with comparable results among formats, owing to the statistically estimated histogram used instead of the regular bin-counting histogram.
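A hedged sketch of the estimation chain: a smooth asynchronous amplitude histogram obtained by Gaussian kernel density estimation, followed by a non-linear least-squares fit of a Gaussian to one of the peaks. The amplitude samples are synthetic, and the calibration curve mapping the fitted variance to an OSNR value is omitted.

```python
# Sketch: KDE-based amplitude histogram plus NLLS Gaussian fit of one peak.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import curve_fit

def gauss(x, a, mu, sigma):
    return a * np.exp(-(x - mu)**2 / (2 * sigma**2))

samples = np.concatenate([np.random.normal(0.2, 0.03, 5000),
                          np.random.normal(0.8, 0.05, 5000)])  # toy amplitudes
kde = gaussian_kde(samples)
x = np.linspace(samples.min(), samples.max(), 512)
aah = kde(x)                                       # smooth amplitude histogram

upper = x > 0.5                                    # isolate the upper peak
p0 = [aah[upper].max(), 0.8, 0.05]                 # initial guess for the fit
(a, mu, sigma), _ = curve_fit(gauss, x[upper], aah[upper], p0=p0)
```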
This paper is focused on testing the contribution of latency to the quality of the formed groups when discriminating between healthy children and children with attention deficit hyperactivity disorder. To this end, two different cases are considered: non-aligned original recordings, and signals aligned according to the P300 position. For the latter case, a novel approach to locate the P300 component in time is introduced, which is based on the derivative of the event-related potential signals. The database used holds event-related potentials registered in an auditory and visual oddball paradigm. Several experiments are carried out testing both configurations of the considered data matrix. For grouping the input data matrices, the k-means clustering technique is employed. To assess the quality of the formed clusters and the relevance of latency-based features for clustering, relative values of the distances between centroids and data points are computed in order to appraise the separability and compactness of the estimated clusters. Experimental results show that, under certain conditions, the time localization of the P300 component is not a decisive feature in the formation of compact and well-defined groups within a discrimination framework for the two considered data classes.
In this work, we present an improved multi-class spectral clustering (MCSC) that represents an alternative to standard k-way normalized clustering, avoiding the use of an iterative algorithm for tuning the orthogonal rotation matrix. The performance of the proposed method is compared with conventional MCSC and k-means in terms of different clustering quality indicators. Results are obtained on commonly used toy data sets with hardly separable classes, as well as on an image segmentation database. In addition, as a clustering indicator, a novel unsupervised measure is introduced to quantify the performance of the proposed method. The proposed method spends less processing time than conventional spectral clustering approaches.
This work presents an approach to quantify the quality of panelists' labeling by means of a soft-margin support vector machine formulation for a bi-class classifier, which is extended to multi-labeler analysis. This approach starts with the formulation of an objective function to determine a suitable decision hyperplane for classification tasks. Then, this formulation is expressed in a soft-margin form by introducing some slack variables. Finally, we determine penalty factors for each panelist. To this end, a panelist-effect term is incorporated into the primal soft-margin problem. Such a problem is solved by deriving a dual formulation as a quadratic programming problem. For the experiments, the well-known Iris database is employed with multiple simulated artificial labels. The obtained penalty factors are compared with standard supervised measures calculated from the confusion matrix. The results show that the penalty factors are related to the nature of the data, allowing the concordance among panelists to be properly quantified.
We propose a first approach to quantify panelists' labeling by generalizing a soft-margin support vector machine classifier to multi-labeler analysis. Our approach consists of formulating a quadratic optimization problem instead of using a heuristic search algorithm. We determine penalty factors for each panelist by incorporating a linear combination into the primal formulation. The solution is obtained on a dual formulation using quadratic programming. For the experiments, the well-known Iris data set with multiple simulated artificial labels and a multi-label speech database are employed. The obtained penalty factors are compared with both standard supervised and non-supervised measures. Promising results show that the proposed method is able to assess the concordance among panelists considering the structure of the data.
Processing long-term ECG Holter recordings for accurate arrhythmia detection is a problem that has been addressed in several approaches. However, there is no outright method for heartbeat classification able to handle problems such as the large amount of data and highly unbalanced classes. This work introduces a heuristic-search-based clustering to discriminate among ventricular cardiac arrhythmias in Holter recordings. The proposed method is posed under the normalized cut criterion, which iteratively seeks the nodes to be grouped into the same cluster. The search procedure is carried out according to the introduced maximum similarity value. Since our approach is unsupervised, a procedure for setting the initial algorithm parameters is proposed, fixing the initial nodes using a kernel density estimator. Results are obtained from the MIT/BIH arrhythmia database, which provides heartbeat labels. As a result, the proposed heuristic-search-based clustering shows an adequate performance, even in the presence of strongly unbalanced classes.
This paper introduces a novel spectral clustering approach based on kernels to analyze time-varying data. Our approach is developed within a multiple kernel learning framework, which, in this case, is assumed to be a linear combination model. To perform such a linear combination, weighting factors are estimated by a ranking procedure, yielding a vector calculated from the eigenvectors provided by the clustering method; in particular, the method named kernel spectral clustering is considered. The proposed method is compared with some conventional spectral clustering techniques, namely kernel k-means and min-cuts, as well as standard k-means. The clustering performance is quantified by the normalized mutual information and adjusted Rand index measures. Experimental results show that the proposed approach is a useful tool for both tracking and clustering dynamic data, being able to manage applications for human motion analysis.
This work introduces a first approach to track moving samples or frames, matching each sample to a single meaningful value. This is done by combining kernel spectral clustering with a feature relevance procedure that is extended to rank the frames in order to track the dynamic behavior along a frame sequence. We pose an optimization problem to determine the tracking vector, which is solved using the eigenvectors given by the clustering method. Unsupervised approaches are preferred since, for motion tracking applications, labeling is unavailable in practice. For the experiments, two databases are considered: Motion Capture and an artificial three-moving-Gaussian data set in which the means change per frame. The proposed clustering is compared with kernel k-means and min-cuts by using normalized mutual information and the adjusted Rand index as metrics. Results are promising, showing clearly that there exists a direct relationship between the proposed tracking vector and the dynamic behavior.
Clustering is of interest in cases when data are not sufficiently labeled and a prior training stage is unfeasible. In particular, spectral clustering based on graph partitioning is of interest to solve problems with highly non-linearly separable classes. However, spectral methods, such as the well-known normalized cuts, involve the computation of eigenvectors, which is a highly time-consuming task in the case of large data. In this work, we propose an alternative to solve the normalized cuts problem for clustering, achieving the same results as conventional spectral methods but spending less processing time. Our method consists of a heuristic search to find the best cluster binary indicator matrix, in such a way that the pair of nodes with the greatest similarity value is grouped first and the remaining nodes are clustered following a heuristic algorithm that searches the similarity-based representation space. The proposed method is tested over a public-domain image data set. Results show that our method reaches comparable results at a lower computational cost.
This work presents an analysis of the electric current signal in LED lamps. Electrical signals are measured in two circuits: one corresponding to a commercial LED lamp connected to the AC source, and another one incorporating a control system into the LED lamp. Such a control system works as a power factor corrector (PFC) and is designed by using a boost converter and a current controller. Signals are analyzed in terms of frequency-based representations oriented to estimating the power spectral density (PSD). In this study, two approaches are used: the discrete Fourier transform and the periodogram. The goal of this work is to show that more complex PSD estimation methods can provide useful information for studying power quality in electric power systems, which is comparable with that provided by traditional approaches. In particular, the periodogram proves to be a suitable alternative, exhibiting meaningful changes in its spectral power plot when analyzing the circuit without PFC. As a result of this work, a set of LED lamp characteristics is introduced, including a novel periodicity factor.
In this paper, we propose a kernel-spectral-clustering-based technique to capture the different regimes experienced by a time-varying system. Our method is based on a multiple kernel learning approach in the form of a linear combination of kernels. The linear-combination coefficients are calculated by determining a ranking vector that quantifies the overall dynamical behavior of the data sequence analyzed over time. This vector can be calculated from the eigenvectors provided by the solution of the kernel spectral clustering problem. We apply the proposed technique to a trial from the Graphics Lab Motion Capture Database from Carnegie Mellon University, as well as to a synthetic database, namely three moving Gaussian clouds. For comparison purposes, some conventional spectral clustering techniques are also considered, namely kernel k-means and min-cuts, as well as standard k-means. The normalized mutual information and adjusted Rand index metrics are used to quantify the clustering performance. Results show the usefulness and applicability of the proposed technique to track dynamic data, even being able to detect hidden objects.
This work introduces a generalized kernel perspective for spectral dimensionality reduction approaches. Firstly, an elegant matrix view of kernel principal component analysis (PCA) is described. We show the relationship between kernel PCA and conventional PCA using a parametric distance. Secondly, we introduce a weighted kernel PCA framework that follows from least-squares support vector machines (LS-SVM). This approach starts with a latent variable that allows writing a relaxed LS-SVM problem. Such a problem is addressed by a primal-dual formulation. As a result, we provide kernel alternatives to spectral methods for dimensionality reduction such as multidimensional scaling, locally linear embedding and Laplacian eigenmaps, as well as a versatile framework to explain weighted PCA approaches. Experimentally, we prove that the incorporation of an SVM model improves the performance of kernel PCA.
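A compact sketch of the kernel PCA building block discussed above: center the kernel matrix, take its leading eigenvectors and scale them by the square roots of the eigenvalues to obtain the embedding; the LS-SVM primal-dual machinery of the paper is not reproduced.

```python
# Sketch of kernel PCA in matrix form: double-centred kernel + eigendecomposition.
import numpy as np

def kernel_pca(K, n_components=2):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))  # embedded data

X = np.random.randn(200, 5)
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)  # RBF kernel matrix
Z = kernel_pca(K)
```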
Learning Objects (LOs), as units of educational content and as reusable resources, make information available for use in virtual learning environments; accessing them requires retrieval and selection mechanisms that support the teaching-learning process of a particular student. Recommender Systems (RSs) are tools that help the user obtain information according to his or her needs and preferences. To use RSs in educational environments, students must be characterized through user profiles in order to deliver adapted LOs. However, not all characteristics are relevant to the recommendation process; therefore, this work presents a first approach to the analysis of the variables associated with the student, in order to select those that can contribute the most to the personalization process.
LED lamps are widely used in households; nonetheless, they are still non-linear loads. Therefore, LEDs need a power supply system to correct their operation, which introduces non-linearities into the electrical grid and distortions in the waveform. Thus, the analysis and quantification of electrical signals is becoming a key issue. This work presents a spectral analysis of the electric current signal in LED lamps, which yields a novel collection of characteristics and measures to quantify the waveform quality. In particular, the periodogram and the Fourier transform are considered. For the experiments, two circuits are considered: one corresponding to a commercial LED lamp connected to the AC source, and another one incorporating a power factor corrector. Experimentally, the usefulness and applicability of the proposed characteristics are proved.
Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings; however, it implicitly involves addressing other problems related to the large amount of unlabeled data, which means a high computational cost. In this work, an unsupervised methodology based on a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points, and then characterizing and clustering the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed homogeneity criterion. This framework compensates for the high computational cost of Holter analysis, making its implementation possible for further real-time applications. The performance of the method is measured over records from the MIT/BIH arrhythmia database, taking advantage of the database labels, and achieves high sensitivity and specificity values for a broad range of heartbeat types recommended by the AAMI.
Heartbeat characterization is an important issue in cardiac diagnosis assistance systems. In particular, wide sets of features are commonly used in long-term Holter ECG signals. If such a feature space does not properly represent the arrhythmias to be grouped, the classification or clustering process may fail. In this work, a suitable feature set for different heartbeat types is studied, involving morphology, representation and time-frequency features. To determine what kind of features generate better clusters, a feature selection procedure is used and assessed by means of clustering validity measures. The resulting feature subset is shown to produce a fine clustering that yields high sensitivity and specificity values for a broad range of heartbeat types.
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions. This paper proposes an additional refinement, in the form of multiscale similarities, namely weighted sums of softmax similarities with growing bandwidths. The objective is to maximize the embedding quality on all scales, with a better preservation of both local and global neighborhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments on several data sets show that this multiscale version of SNE, combined with an appropriate cost function (sum of Jensen-Shannon divergences), outperforms all previous variants of SNE.
Spectral clustering has taken an important place in the context of pattern recognition, being a good alternative for solving problems with non-linearly separable groups. Because of their unsupervised nature, clustering methods are often parametric, requiring some initial parameters; thus, clustering performance greatly depends on the selection of those initial parameters. Furthermore, tuning such parameters is not an easy task when the initial data representation is not adequate. Here, we propose a new projection of the input data to improve cluster identification within a kernel spectral clustering framework. The proposed projection is derived from a feature extraction formulation in which a generalized distance involving the kernel matrix is used. The data projection proves useful for improving the performance of kernel spectral clustering.
In recent years, there has been an increasing interest in the design of pattern recognition systems able to deal with labels coming from multiple sources. To avoid bias during the learning process, in some applications it is strongly recommended to learn from a set of panelists or experts instead of only one. In particular, two aspects are of interest, namely discriminating between confident and unconfident labelers, and determining a suitable ground truth. This work presents an extension of a previous work, consisting of a generalization of the two-class case via a modified one-against-all approach. This approach uses modified classifiers able to learn from multi-labeler settings, within a soft-margin support vector machine framework. The proposed method provides ranking values for each panelist as well as an estimate of the ground truth.
Dimensionality reduction is a key stage for both the design of a pattern recognition system and data visualization. Recently, there has been an increasing interest in methods aimed at preserving the data topology. Among them, Laplacian eigenmaps (LE) and stochastic neighbour embedding (SNE) are the most representative. In this work, we present a brief comparison among very recent methods that are alternatives to LE and SNE. Comparisons are done mainly on two aspects: algorithm implementation and complexity. Also, relations between the methods are depicted. The goal of this work is to provide researchers in this field with some discussion as well as decision criteria to choose a method according to the user's needs and/or to keep a good trade-off between performance and required processing time.
The aim of this paper is to propose a new generalized formulation for feature extraction based on distances from a feature relevance point of view. This is done within an unsupervised framework. To do so, the formal concept of feature relevance is first outlined. Then, a novel feature extraction approach is introduced, which employs the M-norm as a distance measure. It is demonstrated that, under some conditions, this method can readily explain literature methods. As another contribution of this paper, we propose an elegant feature ranking approach for feature selection that follows from the spectral analysis of the data variability. Also, we provide a weighted PCA scheme revealing the relationship between feature extraction and feature selection. To assess the behavior of the studied methods within a pattern recognition system, a clustering stage is carried out. Normalized mutual information is used to quantify the quality of the resultant clusters. The proposed methods reach results comparable with those of literature methods.
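One plausible reading of the spectral ranking step is sketched below: each feature is scored by the eigenvalue-weighted absolute loadings of the covariance matrix, and the score vector can also serve as the weights of a weighted-PCA scheme. The exact M-norm-based formulation of the paper is not reproduced.

```python
# Hedged sketch: feature ranking from the spectral analysis of data variability.
import numpy as np

def spectral_feature_ranking(X):
    C = np.cov(X - X.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    scores = np.abs(vecs) @ np.maximum(vals, 0)    # eigenvalue-weighted loadings
    return scores / scores.max()                   # normalized relevance values

X = np.random.randn(300, 8)
X[:, 0] *= 5.0                                     # make one feature dominant
rho = spectral_feature_ranking(X)
selected = np.argsort(rho)[::-1][:4]               # top-ranked features
```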
Dimensionality reduction methods aimed at preserving the data topology have shown to be suitable for reaching high-quality embedded data; this is particularly the case of those based on divergences, such as stochastic neighbour embedding (SNE). The big advantage of SNE and its variants is that the neighbor preservation is done by optimizing the similarities in both the high- and the low-dimensional space. This work presents a brief review of SNE-based methods. Also, a comparative analysis of the considered methods is provided, covering important aspects such as algorithm implementation, relationships between methods, and performance. The aim of this paper is to investigate recent alternatives to SNE as well as to provide substantial results and discussion to compare them.
This work describes a novel quadratic formulation for solving the normalized-cuts-based clustering problem as an alternative to spectral clustering approaches. Such a formulation is obtained by establishing simple and suitable constraints, which are further relaxed in order to write a quadratic functional with linear constraints. As a meaningful result of this work, we accomplish a deterministic solution instead of using a heuristic search. Our method reaches comparable performance against conventional spectral methods, but spends significantly less processing time.
This work introduces a multiple kernel learning (MKL) approach for selecting and combining different spectral methods of dimensionality reduction (DR). From a predefined set of kernels representing conventional spectral DR methods, a generalized kernel is calculated by means of a linear combination of kernel matrices. The coefficients are estimated via a variable ranking aimed at quantifying how much each variable contributes to optimizing a variance preservation criterion. All considered kernels are tested within a kernel PCA framework. The experiments are carried out over well-known real and artificial data sets. The performance of the compared DR approaches is quantified by a scaled version of the average agreement rate between K-ary neighborhoods. The proposed MKL approach exploits the representation ability of every single method to reach a better embedding, both for more intelligible visualization and for preserving the structure of the data.
This work presents a novel alternative to conventional linear homotopy with suboptimal settings for applications on object deformation. The proposed approach extends the linear mapping to exponential representations, providing smooth transitions when deforming objects while the homotopy conditions are fulfilled. As well, we introduce a quality indicator based on the ratio between the coefficient curve of the resultant homotopy and that of a non-realistic, reference homotopy. Experimental results are promising and show the applicability of exponential homotopy to interpolating images with soft changes and homotopic geometric objects.
Dimensionality reduction (DR) methods represent a suitable alternative for visualizing data. Nonetheless, most of them still lack the properties of interactivity and controllability. In this work, we propose a data visualization interface that allows for user interaction within an interactive framework. Specifically, our interface is based on a mathematical geometric model which combines DR methods through a weighted sum. Interactivity is provided in the sense that the weighting factors are given by the user via the selection of points inside a geometric surface. Then, even non-expert users can intuitively either select a concrete DR method or carry out a mixture of methods. Experimental results are obtained using artificial and real datasets, demonstrating the usability and applicability of our interface in DR-based data visualization.
Within the context of localizing epileptic sources from electroencephalographic signals, this work presents an exploratory study aimed at assessing the effect of channel weighting on the estimation of the inverse problem solution. In this study, we consider two weighting approaches that follow from a relevance feature analysis based on variance and energy criteria. Such approaches are compared by measuring the difference between the estimated source activity and the true power of the simulated sources in terms of the Earth mover's distance. Experimental results show that, by incorporating proper weighting factors into a LORETA-driven solution, localization may be improved. As well, the physiological phenomenon of brain activity is more precisely tracked.
This paper reviews some recent and classical relevant works on information visualization, with a special focus on those applied to big data. The central idea dealt with in this work is how to perform data mining tasks in a visual fashion, that is, using graphical correlation and interaction techniques. The scope of this review encompasses visualization techniques, formal visualization systems, and smart information visualization models. As well, the newest approaches, consisting of integrating visualization and the data mining process, are explained.
This work presents an approach allowing for an interactive visualization of dimensionality reduction outcomes, based on an extended view of conventional homotopy. The pairwise functional that follows from a simple homotopic function can be incorporated within a geometrical framework in order to yield a bi-parametric approach able to combine several kernel matrices. Therefore, users can establish the mixture of kernels in an intuitive fashion by varying only two parameters. Our approach is tested by using kernel alternatives to conventional methods of spectral dimensionality reduction such as multidimensional scaling, locally linear embedding and Laplacian eigenmaps. The proposed mixture represents every single dimensionality reduction approach and helps users to find a suitable representation of embedded data.
Spectral clustering has shown to be a powerful technique for grouping and/or ranking data, as well as a proper alternative for unlabeled problems. In particular, it is a suitable alternative when dealing with pattern recognition problems involving hardly separable classes. Due to its versatility, applicability and feasibility, this clustering technique is appealing for many applications. Nevertheless, conventional spectral clustering approaches lack the ability to process dynamic or time-varying data. Within a spectral framework, this work presents an overview of clustering techniques as well as their extensions to dynamic data analysis.
The large amount of data generated by different activities (academic, scientific, business and industrial, among others) contains meaningful information whose optimal exploration requires scientifically valid processes and techniques. By doing so, we obtain new knowledge to make proper decisions. Nowadays, a new and innovative field, Artificial Intelligence, is rapidly growing in importance; it involves the computer processing capabilities of modern machines and human reasoning. By synergistically combining them, in other words, by integrating natural and artificial intelligence, it is possible to discover knowledge more effectively in order to find hidden trends and patterns underlying predictive models, as well as to allow new observations and considerations from previously known data, using data analysis methods together with the (holistic, flexible and parallel) knowledge and skills of human reasoning. This work briefly reviews the main basics and recent works on the integration of artificial and natural intelligence in order to introduce users and researchers to the integration approaches in this field. Key aspects for conceptually comparing them are also provided.
This work presents an approach for modelling the cardiac pulse from electrocardiographic signals (ECG). We explore the use of the Bonhoeffer-van der Pol (BVP) model, a generalized version of the van der Pol oscillator which, under proper parameters, is able to describe action potentials and can thus be adapted to model the normal cardiac pulse. Using basics of non-linear dynamics and some algebra, the BVP system response is estimated. To account for an adaptive response for every single heartbeat, we propose a parameter tuning method based on a heuristic search so as to yield responses that morphologically resemble real ECG. This aspect is important since heartbeats have intrinsically strong variability in terms of both shape and length. Experiments are carried out over real ECG from the MIT-BIH arrhythmia database. We perform a bifurcation and phase portrait analysis to explore the relationship between non-linear dynamics features and pathology. The preliminary results provided here are promising, giving some hints about the ability of non-linear ECG models to characterize heartbeats and facilitate their classification, the latter being very important for diagnostic purposes.
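For concreteness, a minimal sketch of integrating a Bonhoeffer-van der Pol (FitzHugh-Nagumo-type) system; the parameter values are generic textbook choices, not the ones tuned heuristically in the paper.

```python
# Sketch: numerical integration of a Bonhoeffer-van der Pol-type oscillator.
import numpy as np
from scipy.integrate import solve_ivp

a, b, c, I = 0.7, 0.8, 0.08, 0.5   # illustrative parameters, not the tuned ones

def bvp(t, state):
    x, y = state
    dx = x - x**3 / 3.0 - y + I     # fast (action-potential-like) variable
    dy = c * (x + a - b * y)        # slow recovery variable
    return [dx, dy]

sol = solve_ivp(bvp, (0, 200), [0.0, 0.0], max_step=0.1)
x = sol.y[0]                        # oscillatory response resembling repeated pulses
print(x.min(), x.max())
```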
This work presents the design of a low-cost biofeedback prototype based on the heart rate estimated from photoplethysmography. The proposed prototype involves mainly two building blocks: the first one is concerned with signal acquisition and analog filtering, while the second one performs the heart rate estimation and visual feedback, implemented on an Arduino platform. Experimental results and tests show its usability while keeping the design low-cost and portable.
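A small sketch of the heart-rate estimation step only, on a synthetic PPG-like signal; the acquisition and analog filtering happen in hardware, and this Python code (using peak detection) is an assumption for illustration rather than the Arduino implementation.

```python
# Sketch: heart rate from inter-beat intervals of a synthetic PPG-like signal.
import numpy as np
from scipy.signal import find_peaks

fs = 100.0                                  # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(t.size)  # ~72 bpm

peaks, _ = find_peaks(ppg, distance=fs * 0.4)   # at most one peak every 0.4 s
rr = np.diff(peaks) / fs                        # inter-beat intervals in seconds
bpm = 60.0 / rr.mean()
print(round(bpm, 1))
```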
This paper presents the development of a unified view of spectral clustering and unsupervised dimensionality reduction approaches within a generalized kernel framework. To do so, the authors propose a multipurpose latent variable model in terms of a high-dimensional representation of the input data matrix, which is incorporated into a least-squares support vector machine to yield a generalized optimization problem. After solving it via a primal-dual procedure, the final model results in a versatile projected matrix able to represent data in a low-dimensional space, as well as to provide information about clusters. Specifically, our formulation yields solutions for kernel spectral clustering and weighted-kernel principal component analysis.
Nowadays, a consequence of data overload is that the world's technological capacity to collect, communicate, and store large volumes of data is increasing faster than human analysis skills. Such an issue has motivated the development of graphical ways to visually represent and analyze high-dimensional data. In particular, in this work we propose a graphical interface that allows the combination of dimensionality reduction (DR) methods using a chromatic model to make data visualization more intelligible for humans. This interface is designed for easy and interactive use, so that input parameters are given by the user via the selection of RGB values inside a given surface. The proposed interface enables (even non-expert) users to intuitively either select a single DR method or carry out a mixture of methods. Experimental results prove the usability of our interface, making the selection or configuration of a DR-based visualization an intuitive and interactive task for the user.
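A sketch of the chromatic weighting idea under the assumption that each color channel is associated with one DR method: the RGB triplet picked by the user is simply normalized into mixture coefficients. The mapping is illustrative, not necessarily the exact model of the interface.

```python
# Sketch: mapping a user-selected RGB value to three DR mixture weights.
import numpy as np

def rgb_to_weights(r, g, b):
    """Normalize an 8-bit RGB selection into weights that sum to one."""
    rgb = np.array([r, g, b], dtype=float)
    if rgb.sum() == 0:
        return np.ones(3) / 3.0          # neutral mixture for black
    return rgb / rgb.sum()

w_method1, w_method2, w_method3 = rgb_to_weights(200, 40, 15)
print(w_method1, w_method2, w_method3)   # pure red would select the first method only
```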
Spectral clustering is a suitable technique to deal with problems involving unlabeled clusters with a complex structure, kernel-based approaches being the most recommended ones. This work aims at demonstrating the relationship between a widely recommended method, so-named kernel spectral clustering (KSC), and other well-known approaches, namely normalized cut clustering and kernel k-means. Such demonstrations are carried out by following a primal-dual scheme. Also, we mathematically and experimentally show the usefulness of LS-SVM-based formulations within a clustering model. Experiments are conducted to assess the clustering performance of KSC and the other considered methods on image segmentation tasks.
The research area of sitting-pose analysis allows for preventing a range of health problems, mainly physical ones. Although different systems have been proposed for sitting-pose detection, some open issues are still to be dealt with, such as cost, computational load, accuracy and portability, among others. In this work, we present an alternative approach based on a sensor network to acquire the position-related variables, combined with machine learning techniques, namely dimensionality reduction (DR) and classification. Since the information acquired by the sensors is high-dimensional and therefore might not fit into the embedded system memory, a DR stage based on principal component analysis (PCA) is performed. Subsequently, the automatic pose detection is carried out by the k-nearest neighbors (KNN) classifier. As a result, compared with using the whole data set, the computational cost is decreased by 33% and the data reading is reduced by 10 ms. The sitting-pose detection task then takes 26 ms and reaches 75% accuracy in a 4-trial experiment.
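A minimal sketch of the PCA-plus-KNN pipeline mentioned above, using synthetic stand-in data (the number of sensor readings, poses and components is assumed for illustration and does not reflect the actual sensor network).

```python
# Sketch: PCA for dimensionality reduction followed by a KNN classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))          # 16 hypothetical pressure/distance readings
y = rng.integers(0, 4, size=400)        # 4 hypothetical sitting poses

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=3))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))          # accuracy on the held-out split
```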
Business Intelligence (BI) is the set of strategies and tools for analyzing large volumes of data in order to find patterns or consumption trends of people and to establish business strategies. Achieving this goal requires services and applications such as Real-Time BI, Social BI, Cloud BI, BI 3.0, Business Analytics and Mobile BI. The whole BI process is supported by different analyses that implement machine learning algorithms on large volumes and varied sources of data, which is regarded as Big Data. This work addresses the functionalities and requirements of BI, from the initial concept up to specific concepts and tools for its implementation.
The implementation of a wireless sensor network (WSN) over Internet Protocol version 6 (IPv6) enables remote, real-time monitoring of environmental factors of short-cycle crops at the La Pradera farm of Universidad Técnica del Norte. Wireless sensor networks are advantageous owing to their affordable cost, inexpensive maintenance and low power consumption, the latter being the feature that makes the network scalable, since sensor nodes can be powered by green energy sources such as solar panels and thus monitor larger areas and distances in real time. Real-time monitoring of environmental factors provides the crop manager with reliable information as a basis for decision making, for instance to schedule controlled irrigation; the benefit grows even further when cloud platform-as-a-service (PaaS) offerings are exploited, which allow these data to be visualized from any smart device with Internet access through a web browser.
In the field of information visualization (IV) for Big Data (also referred to as DataVis, InfoVis or Visual Analytics, VA), countless efforts have been made at the business, educational and research levels, among others, resulting in several proposals of software tools that use IV interfaces and techniques. Nowadays, there are dozens of tools that enhance and specialize in particular visualization techniques; therefore, choosing a particular tool is not a trivial task for a user. This work presents a descriptive study of information visualization techniques covering different groups or types of techniques, such as Geometric Projection, Interactive, Icon-based and Hierarchical, among others. To this end, the information is tabulated, presenting the software tools and visualization techniques considered in this study, so that the most commonly used and recommended techniques can be identified for use in Open Source environments and Enterprise Solutions. The starting point is a review of the IV and Visual Analytics literature and of scientific articles on Big Data analysis tools, focused on identifying software tools and visualization techniques. These analyses and reviews cover a total of 58 visualization techniques and 31 software tools. As a result, an assessment of visualization techniques is obtained, and key aspects and recommendations are established for selecting visualization techniques according to the user's requirements.
Case-based reasoning (CBR) is a problem solving approach that uses past experience to tackle current problems. CBR has been demonstrated to be appropriate for working with unstructured-domain data or difficult knowledge acquisition situations, as is the case in the diagnosis of many diseases. Some of the trends and opportunities that may be developed for CBR in the health sciences are oriented to reducing the number of features in highly dimensional data; another important focus is how CBR can associate probabilities and statistics with its results by taking into account the concurrence of several ailments. In this paper, in order to adequately represent the database and avoid the inconveniences caused by high dimensionality, a number of algorithms are used in the preprocessing stage to perform both variable selection and dimension reduction. Subsequently, we carry out a comparative study of multi-class classifiers; in particular, four classification techniques and two reduction techniques are employed to compare multi-class classifiers within the CBR context.
The Internet of Things (also known as IoT) is one of the most frequently mentioned technologies today because of its capability of connecting all kinds of devices to the Internet. If we add to the potential of IoT another high-impact technology such as Artificial Vision, we obtain a wide field of innovative applications, where real-time image and video processing enables the visualization of large amounts of data on the Internet. The main applications developed with IoT and Artificial Vision can be implemented in education, medicine, smart buildings, and surveillance systems for people and vehicles, among others, improving the quality of life of users. The development of these applications requires an infrastructure that allows the convergence of different protocols and devices and, in particular, that can handle the different phases of image acquisition. This work reviews the origins, concepts, technologies and applications linked to Artificial Vision and the Internet of Things in order to understand precisely the impact of their application in everyday life.
Unsupervised pattern recognition analysis is the most used approach for grouping the heartbeats of electrocardiographic recordings or electrocardiograms (ECGs). This is due to the fact that beat labels are very often not available. Given that the detection of some transient and infrequent arrhythmias is unfeasible in a short-time ECG test, ambulatory electrocardiography is required. In this paper, the design of a complete system for the identification of arrhythmias using unsupervised pattern recognition techniques is proposed. In particular, our system involves stages for signal preprocessing, heartbeat segmentation and characterization, feature selection, and clustering. All these stages are developed within a segment clustering framework, which is a suitable alternative for detecting minority classes. As average performance, including five types of arrhythmia, our system reaches 91.31% and 99.16% for sensitivity and specificity, respectively.
In this study, two representative algorithms from evolutionary computation are applied to a simple multiobjective test problem with a convex optimal Pareto set, in order to identify which one is more efficient at finding a set of optimal solutions in the search space. We describe how these nature-inspired methods work and how they try to find the Pareto set. The results are evaluated with the generational distance and error rate metrics to determine the efficiency.
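For reference, a short sketch of the generational distance metric mentioned above, in one common variant (the mean distance from each obtained solution to the closest point of the true Pareto front); the toy fronts are synthetic and only illustrate the computation.

```python
# Sketch: generational distance between an obtained front and the true front.
import numpy as np

def generational_distance(front, true_front):
    """Mean Euclidean distance from each obtained point to its nearest true point."""
    front, true_front = np.asarray(front), np.asarray(true_front)
    dists = [np.min(np.linalg.norm(true_front - p, axis=1)) for p in front]
    return float(np.mean(dists))

# Toy bi-objective example with a sampled convex "true" Pareto front.
f1 = np.linspace(0, 1, 50)
true_front = np.c_[f1, 1 - f1**2]                       # hypothetical true front
obtained = true_front + np.random.normal(0, 0.02, true_front.shape)
print(generational_distance(obtained, true_front))
```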
This work presents a new method for multi-labeler scenarios, that is, situations in which several labelers assign labels to the individuals of a data set based on certain characteristics. Our approach consists in training a Support Vector Machine classifier for each labeler based on his or her answers. We formulate a genetic algorithm optimization to obtain a set of weights according to each panelist's opinion, in order to penalize each of them. Finally, the resulting mappings are mixed and a final classifier is generated, which proves to be better than majority voting. For the experiments, the well-known Iris database is used with multiple simulated artificial labels. The proposed method reaches very good results compared to conventional multi-labeler methods and is able to assess the concordance among panelists while considering the structure of the data.
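A hedged sketch of the final combination step only: one SVM per simulated labeler, with decision functions mixed by a set of weights. In the paper the weights come from a genetic algorithm; here they are fixed values, and the noisy annotators are simulated, so this is an illustration of the mixing idea rather than the full method.

```python
# Sketch: weighted mixture of per-labeler SVM decision functions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y_true = load_iris(return_X_y=True)
X, y_true = X[y_true < 2], y_true[y_true < 2]           # binary subset for clarity

rng = np.random.default_rng(0)
labelers = [np.where(rng.random(y_true.size) < 0.1, 1 - y_true, y_true)
            for _ in range(3)]                           # three noisy annotators

clfs = [SVC(kernel="linear").fit(X, y_k) for y_k in labelers]
w = np.array([0.5, 0.3, 0.2])                            # hypothetical GA output
mixed = sum(wk * clf.decision_function(X) for wk, clf in zip(w, clfs))
y_pred = (mixed > 0).astype(int)
print((y_pred == y_true).mean())                         # agreement with true labels
```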
Pre-term birth occurs when labor starts before 37 weeks of gestation and is currently a major cause of mortality and morbidity in children. Although several factors indicate the risk of a pre-term delivery, it can occur without any symptom or indicating factor. It is estimated that around 15 million premature babies are born in the world each year; this number is growing and has a greater impact in developing countries. For this reason, several investigations have aimed at addressing this problem through the study of records of uterine electrical activity, known as electrohysterography, which represents a great hope for detecting a pre-term delivery. By using computerized systems based on machine learning techniques, it is possible to estimate the probability of pre-term birth from electrohysterography records; however, there are still no definitive methods to characterize and classify these records. This article presents a comparison methodology for the diagnosis of pre-term birth using different supervised pattern recognition techniques, namely feature selection, dimensionality reduction and classification. The considered techniques reach an average error of 18.75%.
This work presents a comparative study of linear dimensionality reduction methods, namely Principal Component Analysis and Linear Discriminant Analysis. The study aims to determine, under objective criteria, which of these techniques yields the best class separability. For the experimental validation, two databases from the UC Irvine Machine Learning Repository are used, processing the attributes of each data set so that the quality of the obtained results can be visually confirmed. The obtained embeddings are analyzed and compared by means of the RNX(K) curve, whose area under the curve indicates whether the representation better preserves the global or the local topology; the resulting visualizations in a lower-dimensional space are then used to observe class separability while preserving the global structure of the data.
Atrial fibrillation (AF) is the most common arrhythmia and generates the highest costs on clinical systems. The rotor theory is one of the most recent approaches to explain the mechanisms that maintain AF. The most promising treatment is ablation, whose success depends on locating the rotor tip. In a previous study, the approximate entropy (ApEn) calculated on electrograms simulated from atrial models showed a high capability for detecting the rotor tip; however, it needed a final manual adjustment. In addition, this technique involves a high computational cost, which is a problem for its effective application. In this study, multiple feature maps were generated and different combinations of them were obtained using wavelet image fusion. The rotor tip location obtained with image fusion was similar to the results achieved with the ApEn-based methodology; however, our methodology did not require any manual adjustment, and the computational cost was reduced to 85%. This study includes a comparative analysis between unipolar and bipolar electrograms obtained from a simulated 2D model of human atrial tissue under chronic AF.
This work presents a dimensionality reduction (DR) framework that enables users to perform either the selection or the mixture of DR methods by means of an interactive model, here named the Geo-Desic approach. Such a model consists of a linear combination of kernel-based representations of DR methods, wherein the corresponding coefficients are related to latitude and longitude coordinates inside a world map. By incorporating the Geo-Desic approach within an interface, the combination may be made easily and intuitively by users (even non-expert ones), fulfilling their criteria and needs by just picking points on the map. Experimental results demonstrate the usability of the proposed approach and its ability to represent DR methods.
This work presents a new interactive data visualization approach based on the mixture of the outcomes of dimensionality reduction (DR) methods. Such a mixture is a weighted sum, whose weighting factors are defined by the user through a visual and intuitive interface. Additionally, the low-dimensional representation spaces produced by the DR methods are graphically depicted using scatter plots powered by an interactive data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to be drawn on the scatter plot. Our visualization approach enables the user to interactively combine DR methods while providing information about the structure of the original data, thus making the selection of a DR scheme more intuitive.
This work introduces a novel multi-labeler kernel approach for learning a data classifier from multiple labelers. The learning process is done by training support vector machine classifiers using the set of labelers (one labeler per classifier). The objective functions representing the decision boundary of each classifier are mixed by means of a linear combination. Following a variable relevance analysis, the weighting factors are calculated from kernel matrices representing each labeler. To do so, a so-called supervised kernel function is also introduced, which is used to construct the kernel matrices. Our multi-labeler method reaches very good results, being a suitable alternative to conventional approaches.
Nowadays, an important decrease in air quality has been observed due to contamination levels that can change the natural composition of the air. This represents a problem not only for the environment but also for public health. Consequently, this paper presents a comparison among approaches based on the Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Regression (SVR) for estimating the level of PM2.5 (Particulate Matter 2.5) at specific geographic locations based on nearby stations. The systems were validated using an environmental database that belongs to the air quality network of Valle de Aburrá (AMVA) of Medellin, Colombia, which records 5 meteorological variables and 2 pollutants from 3 nearby measurement stations. This project therefore analyzes the relevance of the features obtained at every single station to estimate the level of PM2.5 at the target station, using four different selectors based on Rough Set Feature Selection (RSFS) algorithms. Additionally, five systems to estimate PM2.5 were compared, three based on ANFIS and two based on SVR, to obtain an objective and efficient mechanism to estimate the levels of PM2.5 at specific geographic locations by fusing data obtained from the nearby monitoring stations.
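A minimal sketch of the SVR side of the comparison, assuming synthetic stand-in data instead of the AMVA records: features from nearby stations are regressed against a surrogate PM2.5 target. The feature layout (3 stations times 7 variables) is an assumption for illustration.

```python
# Sketch: SVR regression of a target-station pollutant on nearby-station features.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 21))          # 3 stations x 7 variables (5 meteo + 2 pollutants)
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=500)   # surrogate PM2.5 target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))          # R^2 on the held-out split
```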
This article presents the development of a Business Continuity Plan to be applied to the data center of the supervision office (Fiscalización) of the Coca Codo Sinclair Hydroelectric Project, in order to guarantee the continuity of technological services and processes and to ensure the protection of the resources and essential operations of the project. The proposed business continuity plan is based on the ISO 22301:2012 standard, Societal Security, Business Continuity Management Systems, and on the BCLS 2000 standard of the DRI International organization, which provide principles for making appropriate decisions in the event of a natural disaster or other unforeseen situation. Methodologically, the work begins with a situational diagnosis of the organization and an analysis of the information security of the data center. Subsequently, leadership criteria such as planning, support, operation and strategic planning are evaluated. Next, the existing natural risks and threats are identified. Based on the results of these analyses and evaluations, the business continuity model is proposed as stated in the ISO 22301 standard. Likewise, starting from an understanding of the Association's context, continuity planning, leadership, support, policies, strategies and procedures are established. Since the designed continuity plan covers all the aspects considered by ISO 22301 and incorporates the results of the Association's analyses, it can be regarded as a versatile and feasibly implementable model.
Universidad Técnica del Norte, through its Electronics and Communication Networks Engineering program and as part of strengthening its community outreach process, created a project that benefits the population of rural areas of the province of Imbabura and that, in particular, aims to reduce the digital divide in schools by providing access to information and communication technologies. To put this project for improving technological capacities into action, the population of interest was defined as individuals over 12 years old, men and women, from different social, cultural and ethnic backgrounds. Methodologically, training programs were planned on the basis of a needs assessment and were later evaluated with the purpose of reducing people's technological deficiencies, improving their quality of life and thereby reducing digital illiteracy. During the execution of the project, community Infocentros and the computer laboratories of the educational institutions themselves were used as the strategy and point of reference for reaching the communities, providing training in basic office software, communication network assistance, Internet, computer maintenance, and web page administration. After the training, it was clearly observed that the targeted communities had their expectations met and acquired skills and abilities in technological subjects. This article presents the most important methodological aspects and results of the project.
This article describes a project focused on strengthening the intellectual and technical capacities of the EPS (Popular and Solidarity Economy) and other actors, through assistance in the use of computer tools and technical processes, as well as training in electronics and communication networks at educational institutions in Zone 1.
This paper reviews the security risks that have increased in recent years, specifically attacks on password-based authentication controls. It covers methods and techniques for violating such controls, with emphasis on brute-force attacks using data dictionaries, plus a proof of concept outlined using a free Linux distribution, "Kali Linux", and the penetration tool "Metasploit Framework". Finally, it presents the most commonly used controls to mitigate such attacks, with a more detailed approach to the use and management of secure passwords as one of the main controls used in Ecuador in times of crisis.
This article presents a carbon monoxide monitoring system intended to provide a safe environment in a home, based on a wireless sensor network and a platform-as-a-service in the cloud, with the aim of protecting human lives and preventing gas poisoning.
This study addresses a new telecommunications solution for the IP telephony service through Cloud Computing, which provides a communications option for Universidad Técnica del Norte. The analysis was carried out on the basis of the IEEE 29148 standard for selecting the virtualization platform and the IP telephony software. In the design of the IP telephony service, the cloud-based dimensioning was developed, considering the capacity of the instance, the bandwidth, the traffic flow and the number of trunks. For the analysis of these parameters, updated information on IP telephony provided by the University was used, and the configuration was then deployed on the cloud platform. Functional tests of the service were performed to observe its behavior over this infrastructure.
This work outlines a unified formulation to represent spectral approaches for both dimensionality reduction and clustering. The proposed formulation starts with a generic latent variable model in terms of the projected input data matrix. In particular, such a projection maps data onto an unknown high-dimensional space. Regarding this model, a generalized optimization problem is stated using quadratic formulations and a least-squares support vector machine. The solution of the optimization problem is addressed through a primal-dual scheme. Once the latent variables and parameters are determined, the resultant model outputs a versatile projected matrix able to represent data in a low-dimensional space, as well as to provide information about clusters. In particular, the proposed formulation yields solutions for kernel spectral clustering and weighted-kernel principal component analysis.
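As a rough companion to the abstract above, the sketch below shows the kind of dual step such primal-dual formulations typically boil down to: an eigen-decomposition of a degree-normalized kernel matrix whose leading non-trivial eigenvectors provide both a low-dimensional embedding and cluster information. This follows the usual kernel spectral clustering recipe and is not the exact derivation of the paper.

```python
# Sketch: kernel-based spectral embedding plus clustering on its eigenvectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
K = rbf_kernel(X, gamma=0.5)            # kernel matrix acting as an affinity

d = K.sum(axis=1)                       # degrees
P = np.diag(1.0 / d) @ K                # random-walk normalized kernel

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
alpha = vecs[:, order[1:3]].real        # skip the trivial constant eigenvector

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(alpha)
print(np.bincount(labels))              # cluster sizes recovered from the embedding
```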
Nowadays, an exponential growth in the volume of data can be observed, giving rise to the emerging area known as Big Data. In parallel with this growth, the demand for tools, techniques and devices to store, transmit and process high-dimensional data has increased. Most existing methodologies for processing high-dimensional data produce abstract results and do not involve the user in the choice or tuning of the analysis techniques. In this work, we propose a visual analysis methodology for Big Data based on principles of interactivity and controllability, so that users (even non-expert ones) can intuitively select a dimensionality reduction method to generate representations that are intelligible to humans.
The uncontrolled disposal of oil by households and restaurants, as well as the waste of used cooking oil, has become a major cause of pollution. Moreover, if used oil is not properly recycled, the demand for fossil diesel will continue to prevail. This is due, among other reasons, to the lack of awareness of the possibilities and benefits of the industrial transformation of oil for biodiesel production. A decisive stage of biodiesel production is transesterification, a process traditionally carried out manually and heuristically and therefore prone to human error. This research presents the design of a prototype for the automation of the transesterification process through automatic pH and temperature control.
This work presents the design and implementation process of a low-cost, portable electronic device based on the estimation of the heart's mechanical activity by means of plethysmographic techniques. The proposed device comprises four stages: signal acquisition through photoplethysmography, signal conditioning with analog filters, signal processing and heart rate estimation with an algorithm implemented on an Arduino board, and a human-device interface providing visual and auditory feedback. Tests and experimental results show the applicability of the device while keeping it low-cost and portable.
Nowadays, the development of eye-tracking devices has been widely embraced in several fields, both in research and in application, as evidenced by the growth of research studies in areas such as medicine, education and marketing, among others. This document provides a descriptive review of the recent development and application of eye-tracking devices, based on a review of research articles and specialized websites on the subject.
The application of technology in all fields of society has increased since the 1990s, and in the agricultural sector it has brought enormous advantages, increasing the quality and productivity of crops. The inefficiency and ineffectiveness of the traditional control and monitoring methods used in agriculture cause losses of time and money for farmers. This article describes the design, development and implementation of an irrigation system using open hardware and software, wireless sensor networks (WSN), actuators, wireless communication devices and ICT tools, in order to create an environment where the Internet of Things (IoT) and Precision Agriculture offer the user better control of crop irrigation taking evapotranspiration into account.
This positioning system, conceived as an aid in the fight against crime, is based on easy-to-use, low-cost technology and on a user-configurable, cycle-based encryption procedure, making use of techniques such as machine learning and data mining. The device makes it possible to know which police officers are assigned to each sector and to fully supervise the personnel of an institution devoted to citizen security. The plan establishes the creation of two operations centers, one in each quadrant and another at the monitoring facilities, supported by information processing techniques to oversee the assigned duties. A strict process of control of human and material resources, aided by new technologies such as the satellite global positioning system (GPS), data analysis and biometric systems, is expected to provide security and real-time control, 24 hours a day, of the resources and of the personnel who use them. These experimental models are expected to yield new knowledge aimed at improving the processes.
This project addresses the solution of the Multi-Depot Vehicle Routing Problem (MDVRP) involving fuel consumption. This problem is considered NP-hard, so it is solved by a hybrid algorithm that minimizes the costs of distance and fuel consumption within a reasonable computational time. Client grouping and allocation to depots is made by applying two procedures in order to form the initial population; the first procedure assigns first and routes afterwards using ellipses, and the routes are then programmed and optimized using genetic algorithms. The performance of the algorithm is evaluated by conducting different runs and comparing the results obtained with the benchmark instances designed by Cordeau, in order to apply the solution methodology to a test case of dairy product distribution for a company in San Juan.
The physical limitations that many people have as a result of degenerative neuronal diseases and limb amputations have motivated extensive research trying to understand the intentions of the brain through the study of EEG signal characteristics, with the purpose of using them as output commands to perform tasks that improve people's quality of life. Although BCI systems that somehow allow users to interact with certain devices currently exist, they still present significant limitations because there is no definitive methodology that allows the correct characterization and classification of EEG signals, this being a still open research topic. The main idea of this work is to present an alternative methodology for feature extraction of EEG signals with a view to improving current BCI systems and achieving a friendly user interface.
This work presents a statistical study comparing the grades obtained over three consecutive terms by three groups of seventh-grade students. A single-factor design of experiments is carried out to determine whether having mathematics classes in the last hours of the school day influences performance, as measured by the grades obtained by the three groups of students with the same teacher.
In the operation of logistics and distribution service companies, traffic congestion becomes a problem to be considered because of its effect on the arrival times for serving end users, in addition to routing costs. These costs may be decreased by making travel plans in which traffic congestion in urban centers is avoided. This paper proposes an experimental design to estimate congestion costs as input to solve the time-dependent vehicle routing problem (TDVRP). As a solution strategy, a hybrid genetic algorithm with population management is proposed. The data validation is performed considering a dairy company distributing its products through the store channel in Pasto, obtaining a 35% reduction in the logistics costs of distributing the final product.
The huge volumes of data generated by academic, scientific, business and industrial activity, among many others, contain very valuable information, which makes it necessary to develop robust, scientifically valid processes and techniques to explore these large amounts of data optimally, with the purpose of obtaining relevant information for generating new knowledge and making sound decisions. The robustness and high computational processing capacity of modern machines are exploited by areas such as artificial intelligence; when it is holistically integrated with natural intelligence, that is, when sophisticated data analysis methods are synergistically combined with the knowledge, skills and flexibility of human reasoning, knowledge can be generated more effectively. Information visualization proposes efficient ways of bringing the results generated by algorithms to human understanding, making it possible to find hidden trends and patterns visually; these can form the basis of predictive models that allow analysts to produce new observations and considerations from existing data, improving the performance of machine learning systems, making the results more intelligible, and improving interactivity and controllability for the user. However, presenting and/or representing data in an understandable, intuitive and dynamic way is not a trivial task; one of the greatest problems faced by visualization is the high dimensionality of the data, understanding dimensionality as the number of variables or attributes that characterize an object. An effective solution is provided by dimensionality reduction (DR) methods, which allow the original high-dimensional data to be represented in dimensions intelligible to humans (2D or 3D). Nowadays, kernel methods are a good DR alternative owing to their versatility and easy implementation in programming environments. This work presents a brief description and usage guide of a generalized method known as kernel principal component analysis (KPCA).
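A minimal KPCA example in the spirit of the method described above: an RBF kernel, centering in feature space, and projection onto the leading components to obtain a 2-D representation. The dataset and kernel bandwidth are illustrative choices.

```python
# Sketch: kernel PCA with an RBF kernel and explicit feature-space centering.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

K = rbf_kernel(X, gamma=10.0)
n = K.shape[0]
one_n = np.ones((n, n)) / n
Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n      # center in feature space

vals, vecs = np.linalg.eigh(Kc)
idx = np.argsort(vals)[::-1][:2]
Y = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))    # 2-D embedding
print(Y.shape)
```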
An arrhythmia is a cardiac pathology consisting of an alteration of the heartbeat produced by a change in heart rate. Some arrhythmias, because of their infrequent and transient nature, are difficult to detect with short-duration electrocardiography; therefore, ambulatory electrocardiography is used, which allows the patient to be evaluated over long periods of time without interfering with daily activity by means of a device for acquiring and storing ECG records known as a Holter monitor. To group the representative heartbeats of a Holter record, supervised or unsupervised analysis techniques can be used, the latter being the most recommended because it reduces the number of beats the specialist must review without requiring a set of known beats to classify new ones. Despite the existence of techniques that have proved very useful, designing a system that is more robust against factors such as the different types of noise, the large number of beats, minority classes and the morphological variability among patients is still an open problem. This work proposes the design of a complete system for the identification of arrhythmias using unsupervised representation and clustering techniques; the system comprises stages of heartbeat segmentation, characterization, representation, evaluation of the sensitivity to the number of groups, clustering and performance assessment. Morphological and spectral features that provide separability among cardiac arrhythmias are used. Tests are carried out on the MIT/BIH database, which includes 48 records with the arrhythmias recommended by the AAMI (Association for the Advancement of Medical Instrumentation), namely: normal beats (N), premature atrial beats (A), premature ventricular contractions (V), right bundle branch block (R) and left bundle branch block (L). Average sensitivity and specificity results are obtained: beats are correctly classified with 85.88% sensitivity, 98.69% specificity and an overall efficiency of 95.04%.
Case-based reasoning (CBR) is a computational process that tries to mimic the behavior of a human expert in making decisions on a subject and to learn from the experience of past cases. CBR has been demonstrated to be appropriate for working with unstructured-domain data or difficult knowledge acquisition situations, such as medical diagnosis, where it is possible to identify diseases in tasks such as cancer diagnosis, epilepsy prediction and appendicitis diagnosis. Some of the trends that may be developed for CBR in the health sciences are oriented to reducing the number of features in highly dimensional data; an important contribution may be the estimation of the probabilities of belonging to each class for new cases. In this paper, in order to adequately represent the database and to avoid the inconveniences caused by high dimensionality, noise and redundancy, a number of algorithms are used in the preprocessing stage to perform both variable selection and dimension reduction. Also, a comparison of the performance of some representative multi-class classifiers is carried out to identify the most effective one to include within a CBR scheme. In particular, four classification techniques and two reduction techniques are employed to make a comparative study of multi-class classifiers for CBR.
An arrhythmia is a pathology that consists in an alteration of the heartbeat. Although the 12-lead electrocardiogram allows evaluating the electrical behavior of the heart to determine certain pathologies, some arrhythmias are difficult to detect with this type of electrocardiography. In this sense, the use of the Holter monitor is necessary because it records the heart's electrical activity for long periods of time, usually from 24 up to 48 hours. Due to the length of the records provided by the monitor, it is common to use computational systems to evaluate diagnostic and morphological features of the beats in order to determine whether there is any type of abnormality. These computational systems can be based on supervised or unsupervised pattern recognition techniques; however, since the first option requires a visual inspection of the large number of beats present in a Holter record, it is an arduous task that also involves monetary costs. Consequently, throughout this paper we present the design of a complete system for the identification of arrhythmias in Holter records using unsupervised pattern recognition techniques. The proposed system involves stages of signal preprocessing, beat segmentation and characterization, as well as feature selection and clustering; in this case, the k-means technique is used. These steps are applied within the framework of a segment-based methodology that improves the detection of minority classes. Additionally, initialization criteria are considered, which allow enhancing quality measures, especially sensitivity. As a result, it is determined that using k-means with the max-min initialization and a number of groups equal to 12, it is possible to obtain the best results, with values of 99.36%, 91.31% and 99.16% for accuracy, sensitivity and specificity, respectively.
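A sketch of a max-min (farthest-point) initialization for k-means, the kind of criterion reported above to improve sensitivity; the feature vectors are synthetic stand-ins for heartbeat features, and the exact initialization rule of the paper may differ.

```python
# Sketch: max-min (farthest-point) seeding followed by k-means with k = 12.
import numpy as np
from sklearn.cluster import KMeans

def max_min_init(X, k, seed=0):
    """Pick the first center at random, then repeatedly add the farthest point."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[np.argmax(d)])        # farthest point from current centers
    return np.array(centers)

X = np.random.default_rng(0).normal(size=(1000, 8))   # stand-in heartbeat features
init = max_min_init(X, k=12)
labels = KMeans(n_clusters=12, init=init, n_init=1).fit_predict(X)
print(np.bincount(labels))                             # resulting group sizes
```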
The Internet of Things (also known as IoT) is one of the most frequently mentioned technologies today because of its capability of connecting all kinds of devices to the Internet. If we add to the potential of IoT another high-impact technology such as Artificial Vision, we obtain a wide field of innovative applications, where real-time image and video processing enables the visualization of large amounts of data on the Internet. The main applications developed with IoT and Artificial Vision can be implemented in education, medicine, intelligent buildings, and surveillance systems for people and vehicles, among others. These applications improve the quality of life of users; however, their development requires an infrastructure that allows the convergence of different protocols and devices and, in particular, that can handle the different phases of image acquisition. In this work, a review of the beginnings, concepts, technologies and applications related to Artificial Vision and the Internet of Things has been carried out in order to understand precisely the impact of their application in daily life.
This is a guide textbook for understanding the use of diodes and transistors, applying them to the rectification of alternating into direct voltage and to the ways of amplifying current according to the different transistor biasing configurations and their switching methods. It covers the use of silicon and germanium diodes, as well as BJT, MOSFET and FET transistors, with examples and exercises to understand their operation, this being the basis of modern electronics.
This book has been written with the aim of providing basic knowledge of the programming and operation of an embedded system on open platforms. Arduino has become a very important tool for prototyping technological innovations thanks to its low cost and the wide range of references available on the Internet, which shortens the learning curve of engineering students. The book is structured so that each chapter lays the groundwork for the following content, and each one includes programming structures, examples and proposed exercises to reinforce the learning process.
This work presents the design and implementation of a low-cost, portable electronic device to support psychological therapies for stress or anxiety. The device is based on the estimation of the heart's mechanical activity by means of plethysmographic techniques, since many human physiological responses to psychological states such as stress and anxiety result in variations of the heart rate. The proposed device comprises stages ranging from signal acquisition through photoplethysmography and signal conditioning with analog filters, to digital signal processing and heart rate estimation with an algorithm implemented on an Arduino board, and finally a human-device interface providing visual and auditory feedback. Tests and experimental results show the applicability of the device while keeping it low-cost and portable.
Attention deficit/hyperactivity disorder (ADHD) is mainly characterized by distraction, motor restlessness and impulsive behavior. It is typically diagnosed in children; however, in most cases the disorder may persist into adulthood and may have several negative impacts on the social and emotional dynamics of individuals. Given its transversal nature, ADHD in children has been approached from different fields of neuroscience, psychology and pedagogy. Some specialists state that there is a relationship between ADHD symptoms and a dysfunction of certain brain areas, while others define ADHD as a behavioral syndrome. This conceptual dichotomy has been the subject of scientific controversy over the last decade. Another equally controversial aspect of ADHD is its diagnosis, since it involves an exhaustive analysis of the child and therefore requires training parents and teachers to identify symptoms, in addition to specialists devoting many hours to each case; in practice, reaching a harmonized criterion among the three parties is not entirely feasible. Within this multidisciplinary and controversial scenario, this article presents a reflective and critical essay on the definition, diagnosis and treatment of ADHD in children from a psychopedagogical, neuropsychological and neurophysiological perspective.
This work presents an improved interactive data visualization interface based on a mixture of the outcomes of dimensionality reduction (DR) methods. Broadly, it works as follows: the user inputs the mixture weighting factors through a visual and intuitive interface based on a primary-light-colors model (red, green, and blue). By design, such a mixture is a weighted sum driven by the color tone. Additionally, the low-dimensional representation spaces produced by the DR methods are graphically depicted using scatter plots powered by an interactive data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to be simultaneously drawn over the scatter plot. Our interface enables the user to interactively combine DR methods through the human perception of color, while providing information about the structure of the original data. It thus makes the selection of a DR scheme more intuitive, even for non-expert users.
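A small sketch of the data-driven scatter plot idea: pairwise similarities over the (mixed) embedding define a sparse neighbor graph whose edges are drawn between nearby points. A plain PCA embedding stands in for the color-mixed one, and the choice of 5 neighbors is an assumption.

```python
# Sketch: drawing a similarity (k-nearest-neighbor) graph over an embedding.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph
import matplotlib.pyplot as plt

X, _ = load_iris(return_X_y=True)
Y = PCA(n_components=2).fit_transform(X)        # stands in for the mixed embedding

G = kneighbors_graph(Y, n_neighbors=5, mode="connectivity")
rows, cols = G.nonzero()

plt.scatter(Y[:, 0], Y[:, 1], s=10)
for i, j in zip(rows, cols):                    # draw one edge per neighbor pair
    plt.plot(Y[[i, j], 0], Y[[i, j], 1], linewidth=0.3, alpha=0.4)
plt.show()
```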
Sleep stage classification is a widely addressed issue in polysomnography; it is considered a tedious and time-consuming task when done manually by the specialist. Therefore, from the engineering point of view, several methods have been proposed to perform automatic sleep stage classification. In this paper, an unsupervised approach to automatic sleep stage clustering of EEG signals is proposed, which uses spectral features related to signal power, coherences, asymmetries, and wavelet coefficients; the set of features is grouped using a clustering algorithm that optimizes a minimum-sum-of-squares cost function. Accuracy and kappa coefficients are comparable to those of the current literature, as are the individual stage classification results. Methods and results are discussed in the light of the current literature, as is the utility of the groups of features for differentiating the sleep stages. Finally, clustering techniques are recommended for implementation in support systems for sleep stage scoring.
In this work, an efficient unsupervised algorithm for clustering ECG signals is presented. The method is assessed over a set of records from the MIT/BIH arrhythmia database with different types of heartbeats, including normal (N) heartbeats as well as the arrhythmia heartbeats recommended by the AAMI and usually found in Holter recordings: ventricular extrasystoles (VE), left and right bundle branch blocks (LBBB and RBBB) and atrial premature beats (APB). The results are assessed by means of sensitivity and specificity measures, taking advantage of the database labels. Unsupervised performance measures are also used. Finally, the performance of the algorithm is on average 95%, improving results reported in previous works in the literature.
The analysis of human sitting position is a research area that allows for preventing physical health problems in the back. Although many works have proposed systems that detect the sitting position, some open issues are still to be dealt with, such as cost, computational load, accuracy and portability, among others. In this work, we present an alternative approach based on an embedded system to acquire the position-related variables, combined with machine learning techniques, namely dimensionality reduction (DR) and classification. Since the information acquired by the sensors is high-dimensional and therefore might not fit into the embedded system memory, the system includes a DR stage based on principal component analysis (PCA). Subsequently, pose detection is carried out by the k-nearest neighbors (KNN) classifier between the matrix stored in the system and new data acquired by pressure and distance sensors. Thus, compared with using the whole data set, the computational cost is decreased by 33% and the data reading is reduced by 10 ms. The sitting-pose detection task then takes 26 ms and reaches 75% accuracy in a 4-trial experiment.
The biotechnological development of water treatment techniques for the removal of pollutants, taking advantage of the properties of Luffa cylindrica fibers (FLc), is of interest to engineering. This is supported by the arguments given in several investigations carried out around the world, which describe a suggestive set of valid reasons to consider loofah fiber an industrially promising and sustainable material, suitable for pollutant removal treatments and for separating substances immersed in fluid matrices. These studies also describe the use of the fibers as an immobilizing matrix to support active microbial communities implanted for specific purposes; furthermore, by understanding the architecture and mechanical properties of FLc, their use is explored as an aggregate for obtaining composite materials, in the production of new substances, and for their capacity to retain moisture. This article describes the adsorption and immobilization processes in which FLc have been involved, reviewing research experiences and examining the reasoning that has made it possible to describe these techniques and to contribute solutions to the problem of pollutant removal and water treatment.
Currently, information overload has caused human analysis capabilities to become insufficient compared with the imminent growth of technological capabilities to collect, communicate and store large volumes of information. In hospitals and health entities, millions of records, diagnostic tests and other pieces of information are collected every day. Medicine has used the technological revolution in many ways to facilitate the handling, processing, visualization and analysis of data. A specific task associated with the diagnosis of diseases is pathology classification, wherein a machine trained in conjunction with an expert can improve and facilitate the diagnosis. Currently, there are many disciplines whose aim is to develop techniques that allow computers to learn; one of the most explored is machine learning, in which classifiers have been developed that allow the assignment of new objects, represented by features, into beforehand-known categories. As part of artificial intelligence, the classification of pathologies is not a trivial task, and there is a great variety of classification techniques. In this connection, this research work performs a comparison of multi-class classification techniques recommended by the scientific literature, such as SVM (Support Vector Machine), KNN (K-Nearest Neighbors), ANN (Artificial Neural Network), PC (Parzen's Classifier), Random Forest and Adaboost (Adaptive Boosting). The experiments are performed on the heart disease (Cleveland), cardiotocography and hypothyroidism databases. As a result, it is shown that the Cleveland and hypothyroid databases are challenging for the ANN, PC and Adaboost classifiers because of their scarcity of data, which causes the classification procedures to reach greater error values. On the contrary, when dealing with databases having a proper balance in the number of samples per class (for instance, cardiotocography), better results are obtained when classification training is done with Adaboost.
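A comparison of this kind can be sketched as follows with scikit-learn; the iris data set stands in for the medical databases and Parzen's classifier is omitted because it has no standard scikit-learn implementation, so the snippet is only a generic illustration of the experimental setup.

# Illustrative cross-validated comparison of several multi-class classifiers.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X, y = load_iris(return_X_y=True)          # stand-in for the medical databases
models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "Adaboost": AdaBoostClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation accuracy
    print(f"{name}: {scores.mean():.3f}")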
The volume of data of different types (structured, semi-structured and unstructured) has grown exponentially and vertiginously from a variety of sources, including the web, social networks, databases, audio/video files, transactional data, sensors and machine-to-machine (M2M) communication. The Big Data area is intended to address the resulting challenges of information processing. Accordingly, Big Data Analytics (BDA), the processing of large volumes of data, facilitates the discovery of patterns, predictions, fraud, market trends, customer behaviors and preferences, and useful information that would not be obtainable with conventional tools. BDA thus becomes one of the tools to support business decision-making and competitive advantage in real time, or in the shortest possible time relative to competitors, offering new levels of competitiveness, processes, data-driven business models and risk reduction, helping to conserve, retain and attract a greater number of customers and thereby increasing the sources of income of companies. This article is exploratory, descriptive and documentary: it presents a descriptive study of the impact of Big Data Analytics (BDA) in the business field, as well as a brief tour of its trends, opportunities, difficulties and challenges. This study aims to contribute to the research community, as well as to company staff and those being introduced to Big Data Analytics, for a better understanding of this field.
The Ecuadorian economy has been based mainly on the exploitation of raw materials and the import of goods and services, an ecosystem that has not encouraged development; in certain periods, and owing to the variability of the international market, it has caused swings that persist to this day as a result of changes in the prices of these resources compared to the prices of products with higher added value and high technology. One of the main objectives of the Change of the Productive Matrix is to evolve this pattern of primary-export specialization into a pattern of diversified production, highlighting especially the capacities and knowledge of the human talent involved in these processes. In order to frame the economic and social development of the population around this change, the incorporation of technology has become one of the main axes to improve and achieve the proposed objectives, being in this way a vital tool to improve production. Of the many technological possibilities that exist, this research highlights, analyzes and evaluates the incorporation of unmanned aerial systems or vehicles, commonly known as drones, into the agricultural sector, with special emphasis on the advantages, disadvantages, risks, acceptance levels, hardware and software components, statistics related to production cost savings, and how the current regulation by the Civil Aviation Directorate in Ecuador contributes to the development and usability of these systems. Finally, the conclusions point to the great possibilities of growth and transformation of agriculture toward effective precision methods in which the aid of drones and automation can provide a large amount of information, such as reports, images, videos and maps, with minimal human intervention.
Mechatronics engineering is a relatively new specialty, shaped by the synergy and utilization of the characteristics and strengths of other specialties that already have a long track record at the professional level. In particular, the training of a mechatronics student involves programming techniques, which constitute the brain, or intelligent part, of any automatic system. Therefore, the programming area is transversal and must be adequately taught to mechatronics students. However, there is evidence that some mechatronics educators have taken the position that, since future professionals are not software developers, programming does not need to be treated in depth but, on the contrary, only as training in basic concepts. The method applied in this article is experimental, drawing on experience from professional practice and as teachers of the basic programming, advanced programming and control systems courses in the Mechatronics Engineering program at the Universidad Técnica del Norte, which has allowed us to present a reflection on the importance and relevance of the computer programming area in the teaching of mechatronics. As a conclusion of this work, elements of discussion and technical arguments are presented to take a position on the necessity and pertinence of teaching computer programming for mechatronics in an appropriate way, that is to say, in the context and at the level required by an engineering professional, covering in equal parts the synergy of the bodies of knowledge that make up the Mechatronics Engineering program.
This work proposes two novel alternatives for dealing with the highly important issue of clustering performance estimation. One of the measures is the cluster coherence, aimed at quantifying the normalized ratio of cuts within a graph-partitioning framework; it therefore uses a graph-driven approach to explore the nature of the data with respect to the cluster assignment. The other one is the probability-based performance quantifier, which calculates a probability value for each cluster through relative frequencies. The proposed measures are tested on some representative clustering techniques applied to real and artificial data sets. Experimental results prove the reliability of our measures and their robustness to noisy labels.
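The exact definition of the probability-based quantifier is not reproduced in this abstract; as a rough, hedged illustration of the relative-frequency idea, one could score each cluster by the relative frequency of its dominant reference label, as in the toy sketch below (function name and interpretation are assumptions, not the paper's formulation).

# Hedged sketch of a relative-frequency-based per-cluster score (illustrative only).
import numpy as np

def cluster_probability(labels_true, labels_pred):
    scores = {}
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        scores[c] = counts.max() / counts.sum()   # relative frequency of the majority label
    return scores

labels_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])   # toy reference labels
labels_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2])   # toy cluster assignments
print(cluster_probability(labels_true, labels_pred))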
A brain-computer interface (BCI) is a system that provides communication between human beings and machines through the analysis of human brain neural activity. Several studies on BCI systems have been carried out in controlled environments; however, a functional BCI should be able to achieve an adequate performance in real environments. This paper presents a comparative study of alternative classification options, based on mixtures of classifiers, to analyze motor imagery BCI within multi-environment real scenarios. The proposed methodology is as follows: the imagined movement detection is carried out by means of feature extraction and classification. In the first stage, the feature set is obtained from the wavelet transform, empirical mode decomposition, entropy, variance and ratios between minima and maxima; in the second stage, several classifier combinations are applied. The system is validated using a database constructed with the Emotiv Epoc+ with 14 channels of electroencephalography (EEG) signals, acquired from three subjects in 3 different environments with the presence and absence of disturbances. According to the different effects of the disturbances analyzed in the three environments, the mixture of classifiers presented better results when compared to the individual classifiers, making it possible to provide guidelines for choosing the appropriate classification algorithm to incorporate into a BCI system.
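One generic way to combine classifiers (not necessarily the exact combinations studied in the paper) is soft voting over their predicted probabilities; a minimal sketch with synthetic stand-in features follows.

# Illustrative mixture of classifiers through soft voting (toy data, assumed setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)  # stand-in for EEG features
mixture = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("svm", SVC(probability=True)),
                ("tree", DecisionTreeClassifier())],
    voting="soft")                                  # average predicted probabilities
print(cross_val_score(mixture, X, y, cv=5).mean())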
This research presents a proposal to open a specialty coffee roasting plant in the department of Nariño, Colombia, taking into account all the actors that are part of the supply chain. To this end, a distribution network model is formulated by means of mathematical programming, respecting a series of constraints such as capacities, physical resources, material flow and company policies.
The proposed model will make it possible to determine whether the coffee roasting plant is able to satisfy the demand of its customers in 4 regions (Pasto, Cali, the USA and Europe). To address this, the company must decide whether or not to open and build plants over a set of possible locations (La Unión, Buesaco, Albán, Samaniego and Sandoná) and warehouses (Pasto, Cali, the USA and Europe); it must also guarantee that the raw material suppliers with which it has signed contracts deliver the raw material at the right time and with the highest quality standards, according to the company's policies.
Once the mathematical programming model was defined, it was validated with the help of a solver in order to determine the best optimization options for the problem with respect to the objective function and constraints. The results obtained made it possible to conclude that the best option for opening/building the plant is the municipality of Buesaco, according to the requirements of the proposed supply and distribution network problem.
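A plant-location model of this general kind (binary opening decisions plus continuous flows) can be sketched with PuLP as below; all costs, capacities and demands are invented placeholders, not the study's data, and the formulation is only a simplified stand-in for the actual model.

# Hedged sketch of a facility-location model in the spirit described above (PuLP).
import pulp

plants = ["La Union", "Buesaco", "Alban", "Samaniego", "Sandona"]
markets = ["Pasto", "Cali", "USA", "Europa"]
open_cost = {p: 100 for p in plants}                      # placeholder fixed costs
capacity = {p: 500 for p in plants}                       # placeholder capacities
demand = {m: 300 for m in markets}                        # placeholder demands
ship_cost = {(p, m): 1 for p in plants for m in markets}  # placeholder unit costs

model = pulp.LpProblem("coffee_plant_location", pulp.LpMinimize)
y = pulp.LpVariable.dicts("open", plants, cat="Binary")
x = pulp.LpVariable.dicts("flow", (plants, markets), lowBound=0)

# Objective: fixed opening costs plus shipping costs
model += (pulp.lpSum(open_cost[p] * y[p] for p in plants)
          + pulp.lpSum(ship_cost[p, m] * x[p][m] for p in plants for m in markets))
for m in markets:                                         # satisfy every market's demand
    model += pulp.lpSum(x[p][m] for p in plants) >= demand[m]
for p in plants:                                          # ship only from opened plants
    model += pulp.lpSum(x[p][m] for m in markets) <= capacity[p] * y[p]

model.solve()
print([p for p in plants if y[p].value() == 1])           # selected locations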
This research proposes a system dynamics simulation model to address the problem of the influence of inventories on the fluctuations of production systems, for the setup of a specialty coffee roasting plant located in Nariño, Colombia, covering the design and management of the production process, scheduling and raw material procurement, and the control of material flow and stock levels. The objectives are therefore price/profit based: to minimize inventory costs, the investment made in inventory, overtime and production time, while maximizing sales, whether in monetary terms or in units sold.
To develop the simulation model, the following steps were followed: 1. CONCEPTUALIZATION: definition of the purpose, identification of variables (level, flow, auxiliary), time horizon, and development of the causal diagram. 2. FORMULATION: development of the flow diagram (Forrester diagram) and determination of the mathematical equations of the model. 3. ANALYSIS AND EVALUATION: sensitivity analysis and testing of the model under assumptions.
The results obtained with the model made it possible to evaluate the behavior of the level, flow and auxiliary variables through sensitivity analysis in order to meet the proposed simulation objectives; in the end, the best model is obtained according to data consistent with reality.
To perform an exploration process over complex structured data within unsupervised settings, the so-called kernel spectral clustering (KSC) is one of the most recommended and appealing approaches, given its versatility and elegant formulation. In this work, we explore the relationship between KSC and other well-known approaches, namely normalized cut clustering and kernel k-means. To do so, we first deduce a generic KSC model from a primal-dual formulation based on least-squares support vector machines (LS-SVM). For experiments, KSC as well as the other considered methods are assessed on image segmentation tasks to prove their usability.
The Universidad Técnica del Norte, through its Electronics and Communication Networks Engineering program, and as part of strengthening its community outreach process, created a project that benefits the population of the rural areas of the province of Imbabura and, in particular, aims to reduce the digital divide in schools by providing access to information and communication technologies. To put the project for improving technological capacities into action, the population of interest was defined as individuals over 12 years of age, men and women, from different social, cultural and ethnic backgrounds. Methodologically, training programs were planned on the basis of a needs assessment and subsequently evaluated, with the purpose of reducing people's technological deficiencies and thus improving their quality of life with a view to reducing digital illiteracy. Within the execution of the project, the community Infocentros and the computer laboratories of the educational institutions themselves were used as a strategy, serving as reference points to reach the communities; training was provided in basic office tools, communication network assistance, the Internet, computer maintenance and web page administration. After the training, it could be clearly observed that, in the areas of action, the participants' expectations were met and they acquired skills and abilities in technological topics. This article presents the most important methodological aspects and results of the project.
In this work, a comparative study of feature selection methods for supervised and unsupervised inference derived from classical PCA is presented. We deduce an expression for the cost function of PCA based on the mean square error between the data and their orthonormal projection, and then this concept is extended to obtain an expression for general WPCA. Additionally, we study the supervised and unsupervised Q-α algorithm and its relation with PCA. At the end, we present results employing two data sets: a low-dimensional data set to analyze the effects of orthonormal rotation, and a high-dimensional data set to assess the classification performance. The feature selection methods were assessed taking into account the number of relevant features, the computational cost and the classification performance. The classification was carried out using a partitional clustering algorithm.
The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extraction methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes.
This work describes a new model for interactive data visualization that follows a dimensionality reduction (DR)-based approach. In particular, the resulting spaces of several DR methods are mixed through a weighted sum. For the sake of user interaction, the corresponding weighting factors are given via an intuitive color-based interface. Also, to depict the DR outcomes while showing information about the input high-dimensional data space, the low-dimensional representation reached by the mixture is conveyed using scatter plots enhanced with an interactive data-driven visualization. In this connection, a constrained dissimilarity approach defines the graph to be drawn on the scatter plot.
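The weighted-sum mixture of embeddings can be sketched as follows; here the interactive, color-based weighting interface is replaced by fixed weights, and the chosen DR methods and normalization are assumptions for illustration only.

# Illustrative weighted mixture of low-dimensional representations from several DR methods.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, SpectralEmbedding

X, _ = load_digits(return_X_y=True)
X = X[:300]                                    # subsample to keep the example light
embeddings = [PCA(n_components=2).fit_transform(X),
              MDS(n_components=2, random_state=0).fit_transform(X),
              SpectralEmbedding(n_components=2).fit_transform(X)]
weights = np.array([0.5, 0.3, 0.2])            # would come from the user interface

# Standardize each embedding before mixing so the scales are comparable
embeddings = [(e - e.mean(0)) / e.std(0) for e in embeddings]
mixture = sum(w * e for w, e in zip(weights, embeddings))
print(mixture.shape)                           # (300, 2) combined representation to plot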
Dynamic or time-varying data analysis is of great interest in emerging and challenging research on automation and machine learning topics. In particular, motion segmentation is a key stage in the design of dynamic data analysis systems. Although several studies have addressed this issue, there still does not exist a final solution that is highly compatible with subsequent clustering/classification tasks. In this work, we propose a motion segmentation approach compatible with kernel spectral clustering (KSC), here termed KSC-MS, which is based on multiple kernel learning and variable ranking approaches. The proposed KSC-MS is able to automatically segment movements within a dynamic framework while providing robustness to noisy environments.
Biometrics is an open research field that requires the analysis of new techniques to increase its accuracy. Although there are active biometric systems for subject identification, some of them are considered vulnerable to being faked, such as those based on fingerprints, faces or palm prints. Different biometric studies based on physiological signals have been carried out; however, these can be regarded as limited. It is therefore important to analyze these signals, determine the effectiveness of each one, and propose new multimodal biometric systems. This work presents a comparative study of 40 physiological signals within a multimodal analysis. First, preprocessing and feature extraction were carried out using Hermite coefficients, the discrete wavelet transform, and statistical measures of them. Then, feature selection was applied using two selectors based on Rough Set algorithms and, finally, individual classifiers and a mixture of five classifiers were used for classification. The most relevant results show an accuracy of 97.7% from 3 distinct EEG signals, and an accuracy of 100% using 40 different physiological signals (32 EEG and eight peripheral signals).
The estimation of energy demand is not always straightforward or reliable, as one or several classes may fail in the prediction. In this study, a novel load forecasting methodology is proposed. Three different configurations of artificial neural networks perform a supervised classification of energy consumption data, each one providing an output vector of unreliable predicted data. Using the k-means clustering method, multiple patterns are identified and then processed by a Gaussian mixture model in order to give higher relevance to the more accurately predicted data samples. The accuracy of the prediction is evaluated with several error measures. Finally, a mixture of the forecasts generated by the methods is performed, showing a lower error rate compared to the input predictions and, therefore, a more reliable forecast.
This work presents an adaptation and validation of a method for automatic crop row detection from images captured in potato fields (Solanum tuberosum) at initial growth stages, based on the micro-ROI concept. Crop row detection is a crucial aspect for the autonomous guidance of agricultural vehicles and the application of site-specific treatments. The images were obtained using a color camera installed on the front of a tractor under perspective projection. Several issues can affect the quality of the images and the detection procedure, among them: uncontrolled illumination in outdoor agricultural environments, different plant densities, presence of weeds and gaps in the crop rows. The adapted approach was designed to address these adverse situations and consists of three linked phases. Its main contribution is the ability to detect straight and curved crop rows in potato crops. The performance was quantitatively compared against two existing methods, achieving acceptable results in terms of accuracy and processing time.
The accelerated growth of competition among companies globally is undeniable. The new challenges of generating value and discovering customer trends and preferences demand that companies have technological tools of greater reach and efficiency, which can support decision-making in real time, or in the shortest possible time, in order to create competitive advantage and stay in the market. Technological tools such as data mining and big data, among others, aim to obtain new knowledge, patterns and predictions for a better and more precise understanding of customer needs, which would not be possible with conventional tools. Hence, teamwork between the technology and business areas requires this valuable knowledge for the definition of strategies aimed at capitalizing on it, whether it originates internally or externally, especially from social networks. The present article is descriptive, exploratory and documentary; it analyzes the technological tools used at the business level and their use in decision-making and in seeking customer loyalty, as well as their value aspects and challenges. The results show a favorable and efficient relationship between the application of these technologies and business decision-making; therefore, the need arises for a greater number of future studies of this type.
This document refers to a research project in the medical and physiotherapeutic area, framed within treatments for recovering the functional mobility of the fingers of the hand lost due to trauma caused by traffic accidents, work accidents, or neurological and congenital problems. The design of this passive rehabilitator involves using suitable software both for the simulation of the prototype and for the control system, which is derived from already established physiotherapeutic procedures; these procedures enable the rehabilitation of the fingers of the hand. New technologies are used for the construction and control of the device, taking into account environmental and hospital conditions, so that the affected person can return to performing everyday tasks and improve their quality of life.
This work presents a comparative study of prototype selection (PS) algorithms. The study is carried out on data acquired from sensors by an embedded system. In particular, five flexometers are used as sensors, located inside a glove intended to read sign language. A measure was defined to quantify the balance between classification performance and training set reduction (QCR), using k equal to 3 and 1 neighbors to push the kNN classifier to its limit. Two tests were used: (a) the QCR performance and (b) the embedded system decisions in real tests. As a result, the Random Mutation Hill Climbing (RMHC) algorithm is considered the best option for this type of data, removing 87% of the instances while keeping a classification performance of 82% in the software tests; moreover, the kNN classifier must use k=3 to improve the classification performance. In a real situation, with the algorithm implemented, the system makes correct decisions 81% of the time with 5 persons performing sign language in real time.
Research on fall detection in the elderly makes it possible to prevent the major ailments a person may suffer when not receiving timely medical attention. Although different systems have been proposed for fall detection, there are some open problems such as cost, computational load, precision and portability, among others. This paper presents an alternative approach based on the acquisition of the person's speed variation along the X, Y and Z axes using an accelerometer, together with machine learning techniques. Since the information acquired by the sensor is highly variable, noisy and voluminous, a prototype selection stage is carried out using confidence intervals and Leave-One-Out techniques. Subsequently, automatic detection is performed using the k-nearest neighbors (K-NN) classifier. As a result, 95% accuracy in fall detection is achieved in experiments with 5 trials; when used in practice by an older adult, the system takes 30 ms for position selection and the fall detection accuracy remains at 92%.
This work shows the use of Big Data and Data Mining techniques on vegetable crop data from a greenhouse by implementing the first version of a software tool called GreenFarm-DM. The tool is aimed at analyzing the factors that influence crop growth and determining a predictive model of soil moisture. Within a greenhouse, the variables that affect crop growth are relative humidity, soil moisture, ambient temperature, and illumination and CO2 levels. These parameters are essential for photosynthesis, i.e., the process through which plants acquire most of their nutrients; therefore, by properly controlling these parameters, plants may grow healthier and produce better fruits. Analyzing such factors in a data mining context requires designing an analysis system and establishing a target variable to be predicted by the system. In this case, in order to optimize water resource expenditure, soil moisture was chosen as the target variable. The proposed analysis system is developed as a user interface implemented in Java with NetBeans IDE 8.2, and consists mainly of two stages. The first is classification through the C4.5 algorithm (chosen for the first trial), which uses a decision tree based on data entropy and allows the results to be visualized graphically. The second main stage is prediction, in which, from the classification results obtained in the previous stage, the target variable is predicted from the information of a new data set. In other words, the interface builds a predictive model to determine the behavior of soil moisture.
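Although the tool itself is implemented in Java, the entropy-based decision tree stage can be illustrated with a short sketch; scikit-learn's CART with the entropy criterion is used here only as a C4.5-like proxy, and the greenhouse records and target classes are synthetic placeholders.

# Minimal sketch of an entropy-based decision tree for a soil-moisture class (illustrative).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical greenhouse records: [relative humidity, ambient temperature, light, CO2]
X = np.random.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)        # toy target: low/high soil-moisture class

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)
print(export_text(tree, feature_names=["humidity", "temperature", "light", "co2"]))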
The use of electronic systems and devices has become widespread, reaching several fields and becoming indispensable for many daily activities. Such systems and devices (here termed embedded systems) aim at improving human beings' quality of life. To do so, they typically acquire users' data to adjust themselves to different needs and environments in an adequate fashion. Consequently, they are connected to data networks to share this information and find elements that allow them to make appropriate decisions. For practical usage, their computational capabilities should then be optimized to avoid issues such as resource saturation (mainly memory and battery). In this line, machine learning offers a wide range of techniques and tools to incorporate "intelligence" into embedded systems, enabling them to make decisions by themselves. This paper reviews different data storage techniques along with machine learning algorithms for embedded systems. Its main focus is on techniques and applications (with special interest in the Internet of Things) reported in the literature regarding data analysis criteria for decision-making.
Nasa Yuwe is the language of the Nasa indigenous community in Colombia and is currently threatened with extinction. In this regard, a range of computer science solutions have been developed for the teaching and revitalization of the language. One of the most suitable approaches is the construction of a Part-Of-Speech Tagger (POST), which enables the analysis and advanced processing of the language. Nevertheless, for Nasa Yuwe no tagged corpus exists, there is no POS tagger, and no related works have been reported. This paper therefore concentrates on building a tagged linguistic corpus for the Nasa Yuwe language and generating the first tagging application for Nasa Yuwe. The main results and findings are: 1) the process of building the Nasa Yuwe corpus, 2) the tagsets and tagged sentences, as well as the statistics associated with the corpus, and 3) the results of two experiments to evaluate several POS taggers (a random tagger, three versions of HSTAGger, a tagger based on the harmony search metaheuristic, and three versions of a memetic algorithm, GBHS Tagger, based on Global-Best Harmony Search (GBHS), Hill Climbing and an explicit Tabu memory), the last of which obtained the best results in contrast with the other methods considered over the Nasa Yuwe language corpus.
The analysis of physiological signals is widely used for the development of diagnosis support tools in medicine and is currently an open research field. The use of multiple signals or physiological measures as a whole has been carried out using data fusion techniques, commonly known as multimodal fusion, which has demonstrated its ability to improve the accuracy of diagnostic support systems. This paper presents a review of the state of the art, highlighting the main techniques, challenges, gaps, advantages, disadvantages and practical considerations of data fusion applied to the analysis of physiological signals oriented to diagnosis decision support. In addition, a physiological signal data fusion architecture oriented to diagnosis is proposed.
Haptic textures are alterations of any surface that are perceived and identified using the sense of touch, and such perception affects individuals. They are therefore of great interest in different applications such as multimedia, medicine, marketing and systems based on human-computer interfaces, among others. Some studies have been carried out using electroencephalographic signals; nevertheless, these can be considered scarce, so this remains an open research field. In this study, an analysis of tactile stimuli and emotion effects was performed on EEG signals to identify pleasantness and unpleasantness sensations using classifier systems. The EEG signals were acquired using the 14-channel Emotiv Epoc+, following a protocol in which ten different tactile stimuli were presented two times. In addition, three surveys (Beck's depression inventory, an emotion test, and a tactile stimulus pleasantness questionnaire) were applied to three volunteers to establish their emotional state, depression, anxiety and pleasantness levels, so as to characterize each subject. The survey results were then computed, the signals were preprocessed, and the records were labeled as pleasant or unpleasant. Feature extraction was applied using the Short-Time Fourier Transform and the discrete wavelet transform computed for each sub-band (δ, θ, α, β and γ) of the EEG signals. Then, the Rough Set algorithm was applied to identify the most relevant features; this technique was also employed to establish relations among stimuli and emotional states. Finally, five classifiers based on the support vector machine were tested using 10-fold cross-validation, achieving results above 99% accuracy. Dependences among emotions and pleasant and unpleasant tactile stimuli were also identified.
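The wavelet-based feature extraction followed by an SVM can be sketched as below; the wavelet family, decomposition level, epoch length and labels are assumptions chosen only to make the example self-contained, not the study's actual settings.

# Hedged sketch: wavelet sub-band descriptors per epoch, then an SVM classifier.
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wavelet_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:                               # one set of descriptors per sub-band
        feats += [c.mean(), c.std(), np.sum(c ** 2)]
    return np.array(feats)

rng = np.random.default_rng(0)
signals = rng.standard_normal((60, 1024))          # toy single-channel EEG epochs
labels = rng.integers(0, 2, 60)                    # pleasant / unpleasant (toy labels)
X = np.vstack([wavelet_features(s) for s in signals])
print(cross_val_score(SVC(), X, labels, cv=5).mean())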
The analysis of electromyographic (EMG) signals enables the development of important technologies for industrial and medical environments, mainly due to the design of EMG-based human-computer interfaces. There exists a wide range of applications encompassing wireless computer control, rehabilitation and wheelchair guidance, among others. The semantic interpretation of EMG analysis is typically conducted by machine learning algorithms, and mainly involves stages for signal characterization and classification. This work presents a methodology for comparing a set of state-of-the-art approaches to EMG signal characterization and classification within a movement identification framework. We compare the performance of three classifiers (KNN, Parzen-density-based classifier and ANN) using spectral (wavelet) and time-domain (statistical and morphological descriptors) features. Also, a methodology for movement selection is proposed. Results are comparable with those reported in the literature, reaching classification performances of (90.89 ± 1.12)% (KNN), (93.92 ± 0.34)% (ANN) and (91.09 ± 0.93)% (Parzen-density-based classifier) with 12 movements.
In recent times, an undeniable fact is that the amount of available data has increased dramatically, due mainly to the advance of new technologies allowing for the storage and communication of enormous volumes of information. In consequence, there is an important need to find the relevant information within the raw data through the application of novel data visualization techniques that permit the correct manipulation of data. This issue has motivated the development of graphic forms for visually representing and analyzing high-dimensional data. In particular, in this work we propose a graphical approach that allows the combination of dimensionality reduction (DR) methods using an angle-based model, making the data visualization more intelligible. The approach is designed for ready use, so that the input parameters are interactively given by the user within a user-friendly environment. The proposed approach enables users (even non-expert ones) to intuitively select a particular DR method or perform a mixture of methods. The experimental results prove that the interactive manipulation enabled by the proposed model, owing to its ability to display a variety of embedded spaces, makes the task of selecting an embedded space simpler and better fitted to a specific need.
Today, human-computer interfaces are used increasingly often and have become necessary for many human daily activities. Among some remarkable applications we find wireless computer control through hand movement, wheelchair directing/guiding with finger motions, and rehabilitation. Such applications are made possible by the analysis of electromyographic (EMG) signals. Although some research works have addressed this issue, movement classification through EMG signals is still an open and challenging issue for the scientific community, especially because the controller performance depends not only on the classifier but also on other aspects, namely the features used, the movements to be classified, the considered feature-selection methods, and the collected data. In this work, we propose an exploratory study of characterization and classification techniques to identify movements through EMG signals. We compare the performance of three classifiers (KNN, Parzen-density-based classifier and ANN) using spectral (wavelet) and time-domain (statistical and morphological descriptors) features. Also, a methodology for movement selection is proposed. Results are comparable with those reported in the literature, reaching classification errors of 5.18% (KNN), 14.7407% (ANN) and 5.17% (Parzen-density-based classifier).
This work presents a comparative analysis between the linear combination of embedded spaces resulting from two approaches: (1) the application of dimensionality reduction (DR) methods in their standard implementations, and (2) their corresponding kernel-based approximations. Namely, the considered DR methods are CMDS (Classical Multi-Dimensional Scaling), LE (Laplacian Eigenmaps) and LLE (Locally Linear Embedding). This study aims at determining, through objective criteria, which approach obtains the best DR performance for data visualization. The experimental validation was performed using four databases from the UC Irvine Machine Learning Repository. The quality of the obtained embedded spaces is evaluated using the R_NX(K) criterion, whose area under the curve indicates the performance of the technique at local or global topology scales. Additionally, we measure the computational cost of every comparative experiment. A main contribution of this work is the provided discussion on the selection of an interactivity model when mixing DR methods, which is a crucial aspect for information visualization purposes.
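For reference, the R_NX(K) criterion as commonly defined in the DR quality-assessment literature (reproduced here from that literature, not from the paper itself) is

Q_{NX}(K) = \frac{1}{KN} \sum_{i=1}^{N} \left| \nu_i^K \cap n_i^K \right|, \qquad R_{NX}(K) = \frac{(N-1)\, Q_{NX}(K) - K}{N-1-K},

where \nu_i^K and n_i^K denote the K nearest neighbors of sample i in the high- and low-dimensional spaces, respectively, so that R_NX(K) measures the neighborhood preservation beyond what a random embedding would achieve.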
Odor identification refers to the capability of the olfactory sense to discern odors. Interest in this sense has grown across multiple fields and applications such as multimedia, virtual reality and marketing, among others. Therefore, the objective identification of pleasant and unpleasant odors is an open research field. Some studies have been carried out based on electroencephalographic (EEG) signals; nevertheless, these can be considered insufficient due to the accuracy levels achieved so far. The main objective of this study was to investigate the capability of classifier systems to identify pleasant and unpleasant odors from EEG signals. The methodology was carried out in three stages. First, an odor database was collected using signals recorded with a 14-channel Emotiv Epoc+ electroencephalograph, together with a survey for establishing emotion levels based on valence and arousal, considering that odors induce emotions. The records were acquired from three subjects, each of whom was exposed to 10 different odor stimuli two times. The second stage was feature extraction, carried out on the 5 sub-bands (δ, θ, α, β, γ) of the EEG signals using the discrete wavelet transform, statistical measures, and other measures such as area, energy and entropy; feature selection based on Rough Set algorithms was then applied. In the third stage, a support vector machine (SVM) classifier was applied and tested with five different kernels. The performance of the classifiers was compared using k-fold cross-validation. The best result, 99.9%, was achieved using the linear kernel. The most relevant features were obtained from the β and α sub-bands. Finally, relations among emotions, EEG and odors were demonstrated.
Computer-aided diagnosis (CAD) systems have made it possible to enhance the performance of conventional medical diagnosis procedures in different scenarios. In particular, in the context of voice pathology detection, the use of machine learning algorithms has proved to be a promising and suitable alternative. This work proposes the implementation of two well-known classification algorithms, namely artificial neural networks (ANN) and support vector machines (SVM), optimized by the particle swarm optimization (PSO) algorithm and aimed at classifying voice signals as either healthy or pathological. Three different configurations of the Saarbrucken Voice Database (SVD) are used. The effect of using balanced and unbalanced versions of this dataset is assessed, as well as the usefulness of the considered optimization algorithm to improve the final performance outcomes. Also, the proposed approach is comparable with state-of-the-art methods.
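A basic particle swarm search over SVM hyper-parameters can be sketched as follows; the PSO variant, inertia/acceleration constants, search bounds and data are illustrative assumptions, not the configuration used in the paper.

# Hedged sketch: plain PSO over (log10 C, log10 gamma) for an SVM (toy data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)  # stand-in for voice features

def fitness(p):                                   # p = [log10(C), log10(gamma)]
    clf = SVC(C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
n_particles, n_iters = 10, 15
pos = rng.uniform([-2, -4], [3, 1], size=(n_particles, 2))   # bounds in log space (assumed)
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()]

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, [-2, -4], [3, 1])
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()]             # best hyper-parameters found so far

print("best C=%.3g, gamma=%.3g" % (10 ** gbest[0], 10 ** gbest[1]))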
This research work focuses on the study of different solution models reported in the literature that address the optimization of node-based vehicle routing and the optimal route for a university transport service. With the recent expansion of the facilities of a university institution, the allocation of routes for the transport of its students became more complex. As a result, geographic information system (GIS) tools and operations research methodologies, such as graph theory and vehicle routing problems, are applied to facilitate mobilization and improve the student transport service, as well as to optimize transfer times and the utilization of the available transport units. An optimal route management procedure has been implemented to maximize the level of service of student transport, using the K-means clustering algorithm and the contraction hierarchies node method, at low cost due to the use of free software.
Thermoregulation refers to the physiological processes that keep body temperatures stable. Infrared thermography is a non-invasive technique useful for visualizing these temperatures. Previous works suggest it is important to analyze thermoregulation in peripheral regions, such as the fingertips, because some disabling pathologies particularly affect the thermoregulation of these regions. This work proposes an algorithm for fingertip segmentation in thermal images of the hand. Using a supervised index, the results are compared against segmentations provided by humans. The results are outstanding even when the analyzed images are highly resized.
This paper proposes an approach for modeling cardiac pulses from electrocardiographic (ECG) signals. A modified van der Pol oscillator model (mvP) is analyzed which, under a proper configuration, is capable of describing action potentials and can therefore be adapted to model a normal cardiac pulse. Adequate parameters of the mvP system response are estimated using non-linear dynamics methods, such as dynamic time warping (DTW). In order to represent an adaptive response for each individual heartbeat, a parameter tuning optimization method based on a genetic algorithm is applied, generating responses that morphologically resemble real ECG. This feature is particularly relevant since heartbeats have intrinsically strong variability in terms of both shape and length. Experiments are performed over real ECG from the MIT-BIH arrhythmia database. The application of the optimization process shows that the mvP oscillator can properly be used to model the ideal cardiac pulse.
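For orientation, the classical (unmodified) van der Pol oscillator can be integrated as below; the modified version (mvP) used in the paper adds further terms that are not reproduced here, and the damping parameter value is an arbitrary choice.

# Illustrative integration of the classical van der Pol oscillator (not the paper's mvP).
import numpy as np
from scipy.integrate import solve_ivp

mu = 1.5                                        # damping parameter (assumed value)

def van_der_pol(t, z):
    x, y = z
    return [y, mu * (1 - x ** 2) * y - x]       # x'' - mu (1 - x^2) x' + x = 0

sol = solve_ivp(van_der_pol, (0, 30), [0.5, 0.0], max_step=0.01)
print(sol.y.shape)                              # trajectory that could be fitted (e.g. via DTW) to a heartbeat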
This work explores novel alternatives to conventional linear homotopy in order to enhance the quality of the resulting transitions in object deformation applications. The studied and introduced approaches extend the linear mapping to other representations that provide smooth transitions when deforming objects while the homotopy conditions are fulfilled. Such homotopy approaches are based on transcendental functions (TFH), in both simple and parametric versions. We also propose a variant of an existing quality indicator based on the ratio between the coefficient curve of the resulting homotopy and that of a less realistic reference homotopy. Experimental results show the effect of the proposed TFH approaches regarding their usability and benefit for interpolating images formed by homotopic objects with smooth changes.
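For context, the conventional linear homotopy between two maps f and g is

H(x,t) = (1-t)\, f(x) + t\, g(x), \qquad t \in [0,1],

which satisfies H(x,0) = f(x) and H(x,1) = g(x). As a hedged illustration of the transcendental-function idea only (the paper's exact TFH formulation is not reproduced here), the linear coefficient can be replaced by a smooth transcendental profile such as

H_{\sin}(x,t) = \left(1 - \sin\tfrac{\pi t}{2}\right) f(x) + \sin\!\left(\tfrac{\pi t}{2}\right) g(x),

which preserves the same boundary conditions while changing the speed profile of the transition.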
Case-Based Reasoning (CBR) systems are in constant evolution; accordingly, this article proposes improving the retrieval and adaptation stages through a different approach. A series of experiments were conducted, divided into three parts: a proper preprocessing technique, a cascade classification, and a probability estimation procedure. Every stage offers an improvement: a better data representation, a more efficient classification, and a more precise probability estimation provided by a Support Vector Machine (SVM) estimator with respect to more common approaches. In conclusion, more complex techniques for classification and probability estimation are feasible and improve the performance of CBR systems owing to a lower classification error in general cases.
The evaluation of data/information fusion systems lacks standard quality criteria, making the reuse and optimization of these systems a complex task. In this work, we propose a complete low-level data fusion (DF) framework based on the Joint Directors of Laboratories (JDL) model, which considers contextual information alongside information quality (IQ) and a performance evaluation system to optimize the DF process according to user requirements. A set of IQ criteria was proposed per level. The model was tested with a multi-environment brain-computer interface (BCI) system to prove its functionality. The first level performs the selection and preprocessing of electroencephalographic signals. In level one, feature extraction is carried out using the discrete wavelet transform (DWT), nonlinear and linear statistical measures, and the Fuzzy Rough Set (FRS) algorithm for selecting the relevant features; finally, at the same level, classification is conducted using a support vector machine (SVM). A fuzzy inference system is used to control the different processes based on the results given by an IQ evaluation system, which applies quality measures that can be weighted by the users of the system according to their requirements. Besides, the system is optimized based on the results given by the cuckoo search algorithm, which uses IQ traceability to maximize the IQ criteria according to user requirements. The tests were carried out with different types and levels of noise applied to the signals. The results showed the capability and functionality of the model.
Normalized-cut clustering (NCC) is a benchmark graph-based approach for unsupervised data analysis. Since its traditional formulation is a quadratic form subject to orthogonality conditions, it is often solved within an eigenvector-based framework. Nonetheless, in some cases the calculation of eigenvectors is prohibitive or unfeasible due to the computational cost involved, for instance, when dealing with high-dimensional data. In this work, we present an overview of recent developments on approaches to solve the NCC problem without requiring the calculation of eigenvectors. In particular, heuristic-search and quadratic-formulation-based approaches are studied. Such approaches are elegantly deduced and explained, and simple ways to implement them are provided.
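For reference, the standard relaxed form of the normalized-cut problem (as given in the spectral clustering literature, not quoted from the paper) is, for a similarity matrix W with degree matrix D and Laplacian L = D - W,

\min_{F \in \mathbb{R}^{N \times k}} \ \operatorname{tr}\!\left(F^{\top} L F\right) \quad \text{s.t.} \quad F^{\top} D F = I_k,

whose solution is spanned by the k smallest generalized eigenvectors of L f = \lambda D f; the eigenvector-free approaches surveyed above avoid solving this generalized eigenproblem directly.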
Dimensionality reduction (DR) methods are able to produce low-dimensional representations of an input data set that may become intelligible for human perception. Nonetheless, most existing DR approaches lack the ability to naturally provide users with controllability and interactivity. In this connection, data visualization (DataVis) is an ideal complement. This work presents an integration of DR and DataVis through a new approach for data visualization based on a mixture of the representations resulting from DR, while using visualization principles. In particular, the mixture is done through a weighted sum whose weighting factors are defined by the user through a novel interface. The interface concept relies on the combination of color-based and geometrical perception within a circular framework, so that users have at hand several indicators (shape, color, surface size) to decide on a specific data representation. Besides, pairwise similarities are plotted as an unweighted graph to include a graphical notion of the structure of the input data. Therefore, the proposed visualization approach enables the user to interactively combine DR methods while providing information about the structure of the original data, thus making the selection of a DR scheme more intuitive.
This work is a Scientific Track paper in the area of Intelligent Systems. It presents a facial recognition approach based on the Eigenfaces method and Principal Component Analysis (PCA), used for processing and cleaning the images, respectively. The classification was performed using the Euclidean distance between the facial features stored in a database and new images captured in an interface developed in MatLab. As main results, we obtained: (i) 68.9% classification accuracy when using different numbers of components for the stored faces, and (ii) 91.43% classification performance when storing 3 components per face and evaluating more users for model training in seven controlled experiments.
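An Eigenfaces-style pipeline of this kind can be sketched as follows in Python (rather than MatLab); the gallery images and identities are synthetic placeholders and only the general PCA-plus-Euclidean-distance idea is illustrated.

# Minimal Eigenfaces-style sketch: PCA projection plus Euclidean nearest-gallery matching.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.random((30, 64 * 64))                # 30 flattened gallery images (toy)
ids = np.repeat(np.arange(10), 3)                # 10 subjects, 3 images each

pca = PCA(n_components=3)                        # 3 components per face, as in experiment (ii)
gallery = pca.fit_transform(faces)

probe = pca.transform(rng.random((1, 64 * 64)))  # newly captured image
dists = np.linalg.norm(gallery - probe, axis=1)  # Euclidean distance to stored faces
print("recognized subject:", ids[dists.argmin()])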
Dimensionality reduction (DR) is a methodology used in many fields linked to data processing, and may represent a preprocessing stage or be an essential element for the representation and classification of data. The main objective of DR is to obtain a new representation of the original data in a space of smaller dimension, such that more refined information is produced, the time of subsequent processing is decreased, and/or visual representations more intelligible for human beings are generated. Spectral DR methods involve the calculation of an eigenvalue and eigenvector decomposition, which usually demands a high computational cost, making it difficult to obtain a more dynamic and interactive user-machine integration. Therefore, for the design of an interactive information visualization (IV) system based on spectral DR methods, it is necessary to propose a strategy that reduces the computational cost required to calculate eigenvectors and eigenvalues. For this purpose, the use of locally linear submatrices and spectral embedding is proposed. This allows natural intelligence to be integrated with computational intelligence for representing data interactively, dynamically and at low computational cost. Additionally, an interactive model is proposed that allows the user to dynamically visualize the data through a weighted mixture.
The process of distinguishing among human beings through the inspection of data acquired from physical or behavioral traits is known as biometric identification. Mostly, fingerprint- and iris-based biometric techniques are used. Nowadays, since such techniques are highly susceptible to being counterfeited, new biometric alternatives are explored, mainly based on physiological signals and behavioral traits, which are useful not only for biometric identification purposes but may also play a role as vital signal indicators. In this connection, electrocardiographic (ECG) signals have shown to be a suitable approach. Nonetheless, their informative components (morphology, rhythm, polarization, among others) can be affected by the presence of a cardiac pathology. Moreover, some other cardiac diseases cannot be detected directly by ECG signal inspection but still have an effect on the waveform, as is the case of cardiac murmurs. Therefore, for biometric purposes, such signals should be analyzed under the effects of pathologies. This paper presents an exploratory study aimed at assessing the influence of the presence of a pathology when analyzing ECG signals to implement a biometric system. For the experiments, a database holding 20 healthy subjects and 20 pathological subjects (diagnosed with different types of cardiac murmurs) is considered. The proposed signal analysis consists of preprocessing, characterization (using wavelet features), feature selection and classification (five classifiers as well as a mixture of them are tested). As a result, the comparison of the classification rates when testing pathological and normal ECG signals clearly unveils the undesired effect of cardiac murmurs on the performance of the identification mechanism.
The design of the distribution of industrial facilities is one of the most important decisions to be made, as it conditions their operation. The concept of an industrial installation as it is known today has evolved to the point that it integrates automation and information systems; indeed, such evolution has given rise to the so-called intelligent factory. At present, in order to produce customized mass products according to customers' requirements, the distribution of facilities and the generation of successful layout designs, based on the flexibility, modularity and easy configuration of production systems, have become an important issue. This paper proposes a methodology to solve the problem of plant distribution design and redesign based on a novel modular approach within an Industry 4.0 context. The proposed methodology is an adaptation of the SLP methodology (Systematic Layout Planning-Simulation), called SLP Modulary 4.0 (systematic layout planning based on a modular vision within an Industry 4.0 context); it incorporates an integrated design system (IDS) into its structure, which allows collaborative work with different CAD design and simulation tools. For the validation of the proposed methodology, a case study of a coffee processing plant is considered. The distribution design results obtained from the case study prove the benefit and usefulness of the proposed methodology.
This work presents an intelligent system aimed at detecting a person's posture when sitting in a wheelchair. The main use of the proposed system is to warn of an improper posture in order to prevent major health issues. A network of sensors is used to collect data, which are analyzed through a scheme involving the following stages: selection of prototypes using the Condensed Nearest Neighbor (CNN) rule, data balancing with the Kennard-Stone (KS) algorithm, and dimensionality reduction through Principal Component Analysis (PCA). In doing so, the acquired data can be both stored and processed in a microcontroller. Finally, to carry out the posture classification over the balanced, preprocessed data, the K-Nearest Neighbors (k-NN) algorithm is used. The result is an intelligent system reaching a good trade-off between the necessary amount of data and performance: the amount of data required for training is significantly reduced while an admissible classification performance is achieved, which is suitable given the device conditions.
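A rough sketch of this data-reduction chain follows, using imbalanced-learn's CondensedNearestNeighbour for the prototype selection step; the Kennard-Stone balancing stage is omitted for brevity, and the sensor data, labels and dimensions are synthetic placeholders.

# Hedged sketch: CNN prototype selection, PCA compression, then k-NN classification.
import numpy as np
from imblearn.under_sampling import CondensedNearestNeighbour
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((400, 12))                        # toy pressure-sensor readings
y = rng.integers(0, 3, 400)                      # toy posture labels

X_sel, y_sel = CondensedNearestNeighbour(random_state=0).fit_resample(X, y)
pca = PCA(n_components=4)
X_red = pca.fit_transform(X_sel)                 # compact set storable on the microcontroller

knn = KNeighborsClassifier(n_neighbors=3).fit(X_red, y_sel)
print(knn.predict(pca.transform(rng.random((1, 12)))))   # posture for a new reading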
Environmental monitoring is very important because it relates to people's fundamental rights to life and health. For this reason, this system monitors air quality with different sensor nodes in the city of Ibarra, evaluating the parameters of CO2, NOx, UV light, temperature and humidity. Analyzing the data through machine learning algorithms allows the system to autonomously classify whether a certain geographical location is exceeding the established gas emission limits. As a result, the k-Nearest Neighbor algorithm presented a great classification performance when selecting the most contaminated sectors.