Shared Flashcard Set

Details

LSU ISDS 2000 Test 4
Chapters 4 and 5
38
Other
Undergraduate 2
04/15/2013


Cards

Term

Factors behind the sudden popularity of data mining

Definition
  1. Reduction in cost and increased hardware capacity.
  2. Companies are finding new tools and data as they mine.
  3. Data can be analyzed from a more complete view. 
Term
Examples of applications of data mining
Definition
  1. Discover new drugs and identify successful therapies 
  2. Reduce fraudulent behavior
  3. See customer buying patterns
  4. Reclaim profitable customers
  5. Better target customers/clients
Term
Definition and characteristics of data mining
Definition

Used to describe knowledge discovery in databases (KDD).

- Uses statistical, mathematical, and other techniques to extract and identify useful information

Term
How data mining works 
Definition

Data mining finds patterns in data and defines them as mathematical rules that can be used for prediction or association.

Term

The four broad categories for data mining algorithms:

Prediction

Definition
Uses past data to predict what will happen in the future
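A minimal sketch of prediction, assuming a straight-line trend is a reasonable model: fit a least-squares line to hypothetical past sales and extend it one period ahead. The function names and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x to past observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, x):
    """Use the fitted rule to estimate the outcome for a future x."""
    return a + b * x


# Hypothetical past sales for periods 1-4
periods = [1, 2, 3, 4]
sales = [10, 12, 14, 16]
a, b = fit_line(periods, sales)
print(predict(a, b, 5))  # 18.0
```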
Term

The four broad categories for data mining algorithms: 

Cluster Analysis

Definition
Identifies natural groupings of things based on their known characteristics
Term

The four broad categories for data mining algorithms:

Association Analysis

Definition
Finds commonly co-occurring groupings of things
Term

The four broad categories for data mining algorithms:

Sequential Relationships

Definition
Don't need to know meaning!
Term
Other data mining procedures include:
Definition
Data visualization and time series forecasting
Term
What are the most common of all data mining approaches?
Definition
Classification procedures
Term
What does classification involve? Name a few examples.
Definition

Involves identifying patterns in data and associating them with observations that belong to a known category.

Examples include credit approval, store location, target marketing, and fraud detection.

- Most common of all data mining approaches

Term
The basic idea of Classification Analysis
Definition

  1. Define the data.
  2. Use the data to develop a mathematical model.
  3. Use that model to predict unknown outcomes for future observations.
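The three steps can be sketched with a nearest-centroid classifier, one simple technique chosen here for illustration (the card does not name one). The applicant data, the labels, and the function names `train` and `classify` are hypothetical.

```python
from statistics import mean


def train(rows, labels):
    """Step 2: develop a model; here, the average point (centroid) of each known group."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: tuple(mean(col) for col in zip(*members))
            for label, members in groups.items()}


def classify(model, row):
    """Step 3: predict an unknown outcome by picking the label of the nearest centroid."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


# Step 1: define the data (hypothetical applicants: income, debt, both in $1000s)
rows = [(80, 5), (75, 10), (90, 8), (30, 25), (25, 30), (35, 20)]
labels = ["approve", "approve", "approve", "deny", "deny", "deny"]
model = train(rows, labels)
print(classify(model, (70, 12)))  # approve
```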

Term
Various mathematical techniques are used to develop models for classification. These techniques fall into categories, such as:
Definition

a. Decision tree: used for classification when the outcome is categorical and the predictors are either categorical or numeric


b. Linear discriminant analysis (LDA): used when the outcome is categorical and the predictors are all numeric, with normal distributions and equal variances


c. Logistic Regression Analysis (LRA): used when the outcome is categorical (often binary) and the predictors are either categorical or numeric; unlike LDA, it does not require normally distributed predictors with equal variances
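A minimal sketch of logistic regression with one numeric predictor and a binary (categorical) outcome, fitted by plain gradient descent on the log-loss. The learning rate, epoch count, and study-hours data are hypothetical choices for illustration; statistical packages fit the same model with more robust solvers.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w0 + w1*x) by batch gradient descent on the log-loss."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y  # derivative of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1


# Hypothetical data: hours studied (numeric predictor) vs. passed (1) or failed (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w0, w1 = fit_logistic(hours, passed)
p = sigmoid(w0 + w1 * 7.5)
print(round(p, 2), "pass" if p >= 0.5 else "fail")
```

The model outputs a probability, and a cutoff (0.5 here, an assumption) turns it into a category.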

Term

Organizations must use a standardized approach when conducting a data mining project and should be able to identify the proposed standard process models.

These models include:

Definition

CRISP-DM, DMAIC, SEMMA

Term
The six steps of the CRISP-DM model: 
Definition
  1. Business Understanding- understanding the business environment and the project's goals
  2. Data Understanding- determining the variables to be measured
  3. Data Preparation- collecting the data and formatting it for analysis
  4. Modeling- detecting patterns and relating them to mathematical explanations
  5. Evaluation- determining the model's effectiveness and whether it is a good representation of the data
  6. Deployment- using the model for business decisions

Term
DMAIC stands for:
Definition
Define, Measure, Analyze, Improve, Control
Term

SEMMA stands for:

Definition
Sample, Explore, Modify, Model, Assess
Term

Clustering Analysis 

Definition

Places observations (rows, such as customers or students) into groups so that members of a group share similar characteristics while the groups themselves are highly different from one another

Ex: the Sorting Hat in Harry Potter
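A minimal k-means sketch of cluster analysis. k-means is one common clustering technique, swapped in here for illustration; the customer data and the use of the first k points as starting centers are assumptions (real tools use smarter initialization). Note that no group labels are supplied: the groups come out of the data.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; starts from the first k points as centers (a simplification)."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assign each observation to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep the old center if a cluster is empty)
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customers: (age, yearly spending in $100s)
customers = [(20, 5), (60, 40), (22, 6), (25, 4), (58, 42), (62, 38)]
centers, clusters = kmeans(customers, k=2)
for center, members in zip(centers, clusters):
    print(center, members)
```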

Term
How is cluster analysis different from classification analyses?
Definition

Cluster Analysis- groups are not known in advance; they are created from the data

Classification Analysis- groups are distinct and known in advance

Term
Market Segmentation (a common application of cluster analysis)
Definition

An analysis that helps divide customers into groups based on descriptive data so that each group can be targeted individually

- Used to understand the buying behavior of customers

- Used to help retailers target similar groups of customers with the appropriate advertising campaign

Term
Examples of Market Segmentation
Definition

Gender, age, income, and education level.

(E.g., brands aimed at men or women, music downloads aimed at young people, hearing aids aimed at older people)

Term
Association Analysis
Definition

Aims to find associations that establish relationships among items (variables or columns) within a given record.

- The goal is to find groups of variables (items) that tend to occur together

Term
Market Basket Analysis
Definition

In the retail business, it refers to research that provides the retailer with information to help understand the purchase behavior of a buyer.

Ex: People who buy medicine also buy tissues.

People often go to the store just for milk, so milk is placed at the back of the store.
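Market basket analysis is often quantified with support and confidence, standard association-rule measures that this card does not name. A minimal sketch with hypothetical baskets, checking the medicine and tissue example:

```python
def count(baskets, items):
    """Number of baskets (records) that contain every item in `items`."""
    return sum(1 for basket in baskets if set(items) <= basket)


def support(baskets, items):
    """Share of all baskets that contain the items together."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, lhs, rhs):
    """Of the baskets containing lhs, the share that also contain rhs."""
    return count(baskets, set(lhs) | set(rhs)) / count(baskets, lhs)


# Hypothetical shopping baskets
baskets = [
    {"medicine", "tissue", "milk"},
    {"medicine", "tissue"},
    {"milk", "bread"},
    {"medicine", "tissue", "bread"},
    {"medicine", "milk"},
]
print(support(baskets, {"medicine", "tissue"}))       # 0.6
print(confidence(baskets, {"medicine"}, {"tissue"}))  # 0.75
```

Read the second result as: in this made-up data, 75% of baskets with medicine also contain tissue.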

Term
Text Mining
Definition

Most data is stored in text documents that lack structure. Text Mining is the semiautomatic process of extracting patterns from large amounts of unstructured data. 

Aka: text data mining or knowledge discovery in text databases.

Different from search engines: search engines find information using known relationships, while text mining discovers new patterns.

Term

Most popular text mining analyses (4)

Definition
  1. Summarization
  2. Categorizing/Classification
  3. Clustering
  4. Concept linking
Term
Common Applications of Text Mining
Definition

Information can be gained by sifting through court orders, medical discharge summaries, quarterly reports, customer comments, emails, etc.

Term
Extraction
Definition

Most basic form of text mining (used for summarization).

  • The simplest data structure is the feature vector, which is a weighted list of words
  • The most important words in the text are listed along with their relative importance
  • As a result, the document is reduced to a list of terms and weights. The details of the document may be lost, but the key concepts are identified
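A minimal sketch of extraction, assuming each term's weight is its share of the remaining words; the tiny stop-word list, the sample text, and the `feature_vector` name are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "and", "is", "a", "of"}  # hypothetical, deliberately tiny stop-word list


def feature_vector(text, top_n=2):
    """Reduce a document to its top_n terms, each paired with its relative weight."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    # Relative importance = share of the remaining words taken by each term
    return [(term, round(n / total, 2)) for term, n in counts.most_common(top_n)]


doc = "The data mining class covers data mining and text mining. Mining data is fun."
print(feature_vector(doc))  # [('mining', 0.36), ('data', 0.27)]
```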
Term
Term-Document Matrix (TDM)
Definition

Used for the categorization/classification, clustering, and concept linking analyses.

Created so that the rows represent the documents, the columns represent the terms, and each cell holds the number of times a term appears in a document.

Term
Text Mining maps what?
Definition
Maps unstructured information (in the form of a document of words) into a structured format (in the form of a feature/term vector) or a concept
Term
Vector
Definition

A weighted list of words that defines a concept describing unstructured information (a document of words).

Created by:
  1. Eliminating articles and other common words (the, and, etc.)
  2. Replacing words with their roots (phones, phoning = phone)
  3. Making synonyms uniform (pupil = student)
  4. Calculating the weights of the remaining terms
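The four steps can be sketched as follows. Real systems use a full stop-word list, a stemming algorithm, and a thesaurus; the small lookup tables here (`ARTICLES`, `ROOTS`, `SYNONYMS`) are hypothetical stand-ins, and the weight is raw term frequency.

```python
from collections import Counter

# Hypothetical lookup tables standing in for a stop-word list, a stemmer, and a thesaurus
ARTICLES = {"the", "a", "an", "and"}
ROOTS = {"phones": "phone", "phoning": "phone", "students": "student", "pupils": "pupil"}
SYNONYMS = {"pupil": "student"}


def build_vector(text):
    words = text.lower().split()
    words = [w for w in words if w not in ARTICLES]  # 1: eliminate articles/common words
    words = [ROOTS.get(w, w) for w in words]         # 2: replace words with their roots
    words = [SYNONYMS.get(w, w) for w in words]      # 3: make synonyms uniform
    return dict(Counter(words))                      # 4: weight = term frequency


print(build_vector("The pupil and the students phoning phones"))
# {'student': 2, 'phone': 2}
```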

Term
Common weighting factor
Definition

Term frequency or "tf"

- Measures the number of times a word appears in a document.

Ex: a larger tf increases a term's weight

(graph in notes)

Term


Term-Document Matrix (TDM)

Definition

Created so that the rows represent the documents, the columns represent the terms, and each cell holds the number of times a term appears in a particular document

Used for conducting analyses such as classification/categorization, cluster analysis, and association analysis/concept linking (3 of the 4 popular types of text mining analyses)
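A minimal sketch of building a TDM from whitespace-separated words, with no stop-word removal or stemming; the documents and the `term_document_matrix` name are hypothetical.

```python
from collections import Counter


def term_document_matrix(docs):
    """Rows = documents, columns = terms, cells = times the term appears in that document."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({t for tokens in tokenized for t in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[t] for t in terms])
    return terms, matrix


docs = ["data mining finds patterns", "text mining mines text", "web mining"]
terms, tdm = term_document_matrix(docs)
print(terms)
for row in tdm:
    print(row)
```

Each row can then be fed to classification, clustering, or concept-linking methods as a numeric vector.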

Term

Text Mining Process 

(3 tasks)

Definition

Task 1: Establish the corpus- collect all documents related to the domain of interest for analysis, then convert them to a consistent format.

Task 2: Create the TDM

Task 3: Extract the knowledge- done in 4 ways (classification analysis, clustering, association analysis, and trend analysis*)

*Trend analysis: analyze text from different time periods to spot trends or see how concepts evolve over time

Term
Text Mining Applications
Definition

Marketing and Customer Relationship Management

- Group customers with similar complaints or similar purchasing patterns

Security Applications

- ECHELON surveillance is the most prominent text mining application in security

Term
Web Mining
Definition

The Web is the biggest data/text repository and is growing every day.

Web mining (WM) is the discovery of relationships from web data.

Ex: hyperlinks to websites from other websites

Term
3 different areas of web mining
Definition
  1. Web Content Mining- extracts and uses the content found within web pages (key concepts).
  2. Web Structure Mining- extracts useful information from the analysis of the links found in web documents. More links suggest deeper coverage of the information.
  3. Web Usage Mining- extracts and uses information generated through web page visits, traffic, transactions, etc. (user history)
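Web structure mining can start from something as simple as counting the links that point to each page. A minimal sketch with a hypothetical link graph; counting inbound links is a simplification of real link analysis.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "d.example": ["c.example", "b.example"],
}


def inbound_counts(graph):
    """Count how many links point to each page (a simple link-structure measure)."""
    return Counter(target for targets in graph.values() for target in targets)


print(inbound_counts(links).most_common(1))  # [('c.example', 3)]
```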