Hari's Random Thoughts by Hariharan Ramamurthy: Not bad, Google Translate! Clustering and classification: the most common data mining tasks

Tuesday, November 27, 2018

Not bad, Google Translate! Clustering and classification: the most common data mining tasks

Clustering and Classification are two of the most common data mining tasks, used frequently for data categorization and analysis in both industry and academia. Clustering is the process of organizing unlabeled objects into groups whose members are similar in some way. Clustering is a kind of unsupervised learning algorithm. It does not use category labels when grouping objects. In Semi-Supervised clustering, some prior knowledge is available, either in the form of labeled data or as pair-wise constraints on some of the objects. Classification is a kind of supervised learning algorithm. It is a procedure to assign class labels. A classifier is constructed from the labeled training data using a certain classification algorithm; it is then used to predict the class labels of the test data.
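To make the distinction concrete, here is a minimal Python sketch contrasting the two tasks on a made-up toy dataset. It is not taken from the dissertation: the function names `kmeans`, `train_nearest_centroid` and `predict`, the sample points, the label names and the naive "first k points" initialisation are all illustrative assumptions. Plain k-means and a nearest-centroid rule simply stand in for "a clustering algorithm" and "a classification algorithm".

```python
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def kmeans(points, k, iterations=10):
    """Unsupervised: groups points by distance only; no labels are used."""
    centers = list(points[:k])  # naive deterministic start (an assumption)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center; keep the old one if its group is empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    # Return one cluster index per input point.
    return [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]


def train_nearest_centroid(points, labels):
    """Supervised: builds one centroid per class from labeled training data."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in by_label.items()}


def predict(model, point):
    """Assign the class label whose centroid is closest to the point."""
    return min(model, key=lambda y: dist(point, model[y]))


# Hypothetical toy data: two well-separated blobs, interleaved.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]

# Clustering sees only the points.
cluster_ids = kmeans(points, k=2)

# Classification also needs the labels of the training points.
labels = ["low", "high", "low", "high", "low", "high"]
model = train_nearest_centroid(points, labels)
predicted = predict(model, (1.1, 0.9))
```

The clustering call returns anonymous group indices, while the classifier returns one of the label names it was trained on; that is the practical difference between the unsupervised and supervised settings described above.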
In this dissertation, the results of a comprehensive comparative study of three kinds of clustering algorithms, including Co-Clustering, Consensus-based Clustering and Semi-supervised Clustering, are presented. Through experiments using artificial datasets with different data substructures and UCI data sets, the performance of the three kinds of clustering algorithms was compared and analyzed. A method was proposed to combine a Co-Clustering algorithm and a Semi-supervised Clustering algorithm. A comprehensive comparative study was conducted on three kinds of classification algorithms including Logistic Regression Classifier, Support Vector Machine and Decision Tree. Experiments were carried out using different artificial datasets and UCI data sets to analyze and compare their classification performance. A method using controlled False Discovery Rate was proposed in Logistic Regression Classifier to select important features. A detailed proof was developed to show that controlling False Discovery Rate can be achieved by controlling the related p-value. Experiments were also conducted to compare the classification performance using the proposed feature selection algorithm.
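The abstract does not say which procedure the dissertation uses to control the False Discovery Rate. As an illustration only, the sketch below uses the standard Benjamini-Hochberg step-up rule, which selects features by comparing sorted p-values against a rising threshold. The function name `benjamini_hochberg`, the level `q` and the p-values are hypothetical; in practice the p-values might come from a significance test on each logistic regression coefficient, which is also an assumption here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of features selected at FDR level q.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * q, and select the k
    features with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank  # remember the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical p-values, one per candidate feature.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_values, q=0.05)
print(selected)  # -> [0, 1]
```

Note that three further features have p-values below 0.05 but are not selected: the rule thresholds p-values more strictly than a fixed 0.05 cut, which is one way to read the abstract's statement that the False Discovery Rate is controlled through the p-value.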

Google machine translation

క్లస్టరింగ్ మరియు వర్గీకరణ అత్యంత సాధారణ డేటా మైనింగ్ పనులు రెండు, పరిశ్రమ మరియు అకాడెమీక్ /విద్యాసంస్థలలో, డేటా వర్గీకరణ మరియు విశ్లేషణ కోసం తరచుగా ఉపయోగిస్తారు. క్లస్టరింగ్ అనేది వర్గీకరించని వస్తువులను సమూహంగా నిర్వహించడం అనే ప్రక్రియ. క్లస్టరింగ్ ఒక రకమైన పర్యవేక్షణా రహిత అభ్యాస అల్గోరిథం. వస్తువులను సమూహంగా ఉన్నప్పుడు ఇది వర్గం లేబుల్లను ఉపయోగించదు. సెమీ-పర్యవేక్షించబడిన క్లస్టరింగ్లో, కొంతమంది ముందస్తు పరిజ్ఞానం లేబుల్ డేటా లేదా కొన్ని వస్తువులపై జంట-వారీగా అడ్డంకులను రూపంలో అందుబాటులో ఉంటుంది. వర్గీకరణ అనేది ఒక రకమైన పర్యవేక్షక అభ్యాస అల్గోరిథం. ఇది క్లాస్ లేబుల్స్ను కేటాయించే ప్రక్రియ. నిర్దిష్ట వర్గీకరణ అల్గారిథమ్ని ఉపయోగించి లేబుల్ శిక్షణ డేటా నుండి ఒక వర్గీకరణను నిర్మించారు, అప్పుడు పరీక్ష డేటా యొక్క తరగతి లేబుల్ని అంచనా వేయడానికి ఉపయోగించబడుతుంది.
ఈ డిసర్టేషన్లో, సహ-క్లస్టరింగ్, ఏకాభిప్రాయం ఆధారిత క్లస్టరింగ్ మరియు సెమీ పర్యవేక్షణ క్లస్టరింగ్తో సహా మూడు రకాల క్లస్టరింగ్ అల్గోరిథంలు సమగ్ర తులనాత్మక అధ్యయనం యొక్క ఫలితాలు చూపించబడ్డాయి. విభిన్న డేటా సబ్స్ట్రక్చర్స్ మరియు UCI డేటా సమితులతో కృత్రిమ డేటాసెట్లను ఉపయోగించి ప్రయోగాలు ద్వారా, క్లస్టరింగ్ అల్గారిథమ్ల యొక్క మూడు రకాల పనితీరు పోల్చడం మరియు విశ్లేషించడం జరిగింది. ఒక కో-క్లస్టరింగ్ అల్గోరిథం మరియు సెమీ-పర్యవేక్షించబడిన క్లస్టరింగ్ అల్గోరిథంలను కలపడానికి ఒక పద్ధతి ప్రతిపాదించబడింది. లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైయర్, సపోర్ట్ వెక్టర్ మెషిన్ మరియు డెసిషన్ ట్రీ వంటి మూడు రకాల వర్గీకరణ అల్గోరిథంలపై సమగ్ర తులనాత్మక అధ్యయనం నిర్వహించబడింది. వారి వర్గీకరణ పనితీరును విశ్లేషించడానికి మరియు సరిపోల్చడానికి వివిధ కృత్రిమ డేటాసెట్లను మరియు UCI డేటా సమితులను ఉపయోగించి ప్రయోగాలు నిర్వహించబడ్డాయి. ముఖ్యమైన లక్షణాలను ఎంచుకోవడానికి లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైర్లో నియంత్రిత ఫాల్స్ డిస్కవరీ రేట్ను ఉపయోగించి ఒక పద్ధతి ప్రతిపాదించబడింది. సంబంధిత p- విలువను నియంత్రించడం ద్వారా ఫాల్స్ డిస్కవరీ రేట్ను నియంత్రించడం సాధించగలదని ఒక వివరణాత్మక సాక్ష్యం అభివృద్ధి చేయబడింది. ప్రతిపాదిత లక్షణాల ఎంపిక అల్గోరిథం ఉపయోగించి వర్గీకరణ పనితీరును పోల్చడానికి కూడా ప్రయోగాలు నిర్వహించబడ్డాయి

After my editing

క్లస్టరింగ్ మరియు వర్గీకరణ/ అత్యంత సాధారణ డేటా మైనింగ్ పనులు రెండు, పరిశ్రమ మరియు అకాడెమీ రెండు డేటా వర్గీకరణ మరియు విశ్లేషణ కోసం తరచుగా ఉపయోగిస్తారు. క్లస్టరింగ్ అనేది వర్గీకరించని వస్తువులను సమూహంగా నిర్వహించడం అనే ప్రక్రియ. క్లస్టరింగ్ ఒక రకమైన పర్యవేక్షణా రహిత అభ్యాస అల్గోరిథం. వస్తువులను సమూహంగా ఉన్నప్పుడు ఇది వర్గం లేబుల్లను ఉపయోగించదు. సెమీ-పర్యవేక్షించబడిన క్లస్టరింగ్లో, కొంతమంది ముందస్తు పరిజ్ఞానం లేబుల్ డేటా లేదా కొన్ని వస్తువులపై జంట-వారీగా అడ్డగించబడిన/పరిమితులు రూపంలో అందుబాటులో ఉంటుంది. వర్గీకరణ అనేది ఒక రకమైన పర్యవేక్షక అభ్యాస అల్గోరిథం. ఇది క్లాస్ లేబుల్స్ను కేటాయించే ప్రక్రియ. నిర్దిష్ట వర్గీకరణ అల్గారిథమ్ని ఉపయోగించి లేబుల్ శిక్షణ/ట్రైనింగ్ డేటా నుండి ఒక వర్గీకరణను నిర్మించారు, అప్పుడు పరీక్ష చేయవలసిన డేటా యొక్క లేబుళ్ల తరగతిని అంచనా వేయడానికి ఉపయోగించబడుతుంది.

ఈ డిసర్టేషన్లో, సహ-క్లస్టరింగ్, ఏకాభిప్రాయ ఆధారిత క్లస్టరింగ్ మరియు సెమీ పర్యవేక్షణ క్లస్టరింగ్తో సహా మూడు రకాల క్లస్టరింగ్ అల్గోరిథంలు , అధ్యయనం యొక్క ఫలితాలలో సమగ్ర తులనాత్మకంగా ఉన్నట్టు చూపించబడ్డాయి. విభిన్న డేటా సబ్ 'స్ట్రక్చర్స్ మరియు UCI డేటా సమితులతో కృత్రిమ డేటాసెట్లను ఉపయోగించి చేసిన ప్రయోగాలు ద్వారా, క్లస్టరింగ్ అల్గారిథమ్ల యొక్క మూడు రకాల పనితీరు పోల్చడం మరియు విశ్లేషించడం జరిగింది. ఒక కో-క్లస్టరింగ్ అల్గోరిథం మరియు సెమీ-పర్యవేక్షించబడిన క్లస్టరింగ్ అల్గోరిథంలను కలపడానికి ఒక పద్ధతి ప్రతిపాదించబడింది. లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైయర్, సపోర్ట్ వెక్టర్ మెషిన్ మరియు డెసిషన్ ట్రీ వంటి మూడు రకాల వర్గీకరణ అల్గోరిథంలపై సమగ్ర తులనాత్మక అధ్యయనం నిర్వహించబడింది. వారి వర్గీకరణ పనితీరును విశ్లేషించడానికి మరియు సరిపోల్చడానికి వివిధ కృత్రిమ డేటాసెట్లను మరియు UCI డేటా సమితులను ఉపయోగించి ప్రయోగాలు నిర్వహించబడ్డాయి. ముఖ్యమైన లక్షణాలను ఎంచుకోవడానికి లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైర్లో నియంత్రిత ఫాల్స్ డిస్కవరీ రేట్ను ఉపయోగించి ఒక పద్ధతి ప్రతిపాదించబడింది. సంబంధిత p- విలువను నియంత్రించడం ద్వారా ఫాల్స్ డిస్కవరీ రేట్ను నియంత్రించడం సాధించగలగడానికి , ఒక వివరణాత్మక సాక్ష్యం అభివృద్ధి చేయబడింది. ప్రతిపాదిత లక్షణాల ఎంపిక అల్గోరిథం ఉపయోగించి వర్గీకరణ పనితీరును పోల్చడానికి కూడా ప్రయోగాలు నిర్వహించబడ్డాయి.

Google automatic translation, phrase by phrase

Clustering → క్లస్టరింగ్
Classification are → వర్గీకరణ
two of the most common → అత్యంత సాధారణ రెండు
data mining tasks, → డేటా మైనింగ్ పనులు,
used frequently for → తరచూ ఉపయోగిస్తారు
data categorization → డేటా వర్గీకరణ
and → మరియు
analysis → విశ్లేషణ
in both → రెండింటిలో
industry and academia. → పరిశ్రమలు మరియు విద్యాసంస్థలలో
the process of → ప్రక్రియ
organizing → ఆర్గనైజింగ్
unlabeled objects → లేబుల్చేయని వస్తువులు
into groups of → సమూహాలుగా/సమూహాలలో
which members → ఇది/ఏ సభ్యులు
are similar → ఇలాంటివి ఒకేరకమైనవి
in some way. → కొన్ని విధంగా./ఒక విధంగా/కొన్ని విదాలుగా
a kind of → ఒక రకమైన
unsupervised learning algorithm → పర్యవేక్షించని అభ్యాస అల్గోరిథం
It does not use → ఇది ఉపయోగించదు
category labels → వర్గం లేబుల్స్
when grouping objects. → వస్తువులను సమూహాలుగా చేస్తున్నప్పుడు .
In Semi-Supervised clustering → సెమీ పర్యవేక్షణలో క్లస్టరింగ్లో
some prior knowledge → కొన్ని/కొంత ముందు/ముందస్తు జ్ఞానం/కొంత ముందు/ముందస్తు పరిచయం
is available → అందుబాటులో ఉంది
either in the form of → రూపంలో గాని
labeled data → లేబుల్ డేటా
pair-wise constraints → జంట-జ్ఞాన పరిమితులు
on some of the objects. → కొన్ని వస్తువులలో.
supervised learning algorithm → పర్యవేక్షణలో నేర్చుకోనే( వడం) అల్గోరిథం
It is a procedure → ఇది ఒక విధానం
to assign → కేటాయించుటకు
class labels. → తరగతి లేబుల్స్.
A classifier → ఒక వర్గీకరణ ణా పరికరం
is constructed → నిర్మించబడింది
from the labeled training data → లేబుల్ శిక్షణ డేటా నుండి
using certain → కొన్నిటిని ఉపయోగించి
classification algorithm → వర్గీకరణ అల్గోరిథం
it → ఇది
then will be used to → అప్పుడు ఉపయోగించబడుతుంది
predict → అంచనా/సోది చెప్పడానికి ;-)
the class label → తరగతి లేబుల్
of the test data. → పరీక్ష డేటా యొక్క.
In this dissertation, → ఈ సిద్ధాంత సమీక్షా వ్యాసంలో
the results of a → ఒక ఫలితాలు
comprehensive comparative study → సమగ్ర తులనాత్మక అధ్యయనం
of three kinds of → యొక్క మూడు రకాల యొక్క
clustering algorithms → క్లస్టరింగ్ అల్గోరిథంలు
including → సహా
Co-Clustering, → కో-క్లస్టరింగ్,
Consensus-based Clustering → ఏకాభిప్రాయం ఆధారిత క్లస్టరింగ్
Semi-supervised Clustering → సెమీ పర్యవేక్షణ క్లస్టరింగ్
is presented. → ప్రదర్శించబడుతుంది./సమర్పించబడింది
Through experiments → ప్రయోగాలు ద్వారా
using artificial datasets → కృత్రిమ డేటాసెట్లను ఉపయోగించి /చడం
different data substructures → వివిధ డేటా సబ్స్ట్రెక్చర్స్/సబ్ స్టక్చర్స్
UCI data sets, → UCI డేటా సెట్లు,
the performance of the → యొక్క పనితీరు
three kinds of clustering algorithms → మూడు రకాల క్లస్టరింగ్ అల్గోరిథంలు
was compared and analyzed → పోల్చబడింది మరియు విశ్లేషించబడింది
A method was proposed → ఒక పద్ధతి ప్రతిపాదించబడింది
to combine a Co-Clustering algorithm and a Semi-supervised Clustering algorithm. → ఒక సహ-క్లస్టరింగ్ అల్గోరిథం మరియు ఒక సెమీ/పాక్షిక -పర్యవేక్షించబడిన క్లస్టరింగ్ అల్గోరిథంలను కలపడానికి.
A comprehensive comparative study → సమగ్ర తులనాత్మక అధ్యయనం
was conducted → నిర్వహించారు
on three kinds of classification algorithms → వర్గీకరణ అల్గోరిథం యొక్క మూడు రకాలు
including → సహా
Logistic Regression Classifier, Support Vector Machine and Decision Tree. → లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైయర్, మద్దతు.వెక్టర్ మెషీన్ మరియు డెసిషన్ ట్రీ మద్దతు.
Experiments were carried out → ప్రయోగాలు నిర్వహించబడ్డాయి
using different → వివిధ ఉపయోగించి
artificial datasets and UCI data sets → కృత్రిమ డేటాసెట్లు మరియు UCI డేటా సెట్లు
to analyze and compare → విశ్లేషించడానికి మరియు పోల్చడానికి
their classification performance. → వారి వర్గీకరణ పనితీరు.
A method using → ఉపయోగించి ఒక పద్ధతి
controlled False Discovery Rate → నియంత్రిత ఫాల్స్ డిస్కవరీ రేట్
was proposed → ప్రతిపాదించబడింది
in Logistic Regression Classifier → లాజిస్టిక్ రిగ్రెషన్ క్లాస్సిఫైయర్లో
to select important features. → ముఖ్యమైన లక్షణాలను ఎంచుకోండి.
A detailed proof was developed → వివరణాత్మక రుజువు అభివృద్ధి చేయబడింది
to show that → అని చూపించడానికి
controlling False Discovery Rate → ఫాల్స్ డిస్కవరీ రేట్ను నియంత్రించడం
can be achieved by → ద్వారా సాధించవచ్చు
controlling the → నియంత్రించడం
related p-value. → సంబంధిత p- విలువ.
Experiments were also conducted → ప్రయోగాలు నిర్వహించబడ్డాయి
to compare the → పోల్చడానికి
classification performance → వర్గీకరణ పనితీరు
using the → ఉపయోగించి
proposed → ప్రతిపాదిత
feature selection algorithm → ఫీచర్ ఎంపిక అల్గోరిథం

Dr. Hariharan Ramamurthy, M.D.

Howard County Community Clinic, Big Spring, TX, USA

IRSI /Quality Healthcare and longevity
