
What is the exact difference between a dataset and a fileset in
DataStage?

Answer Posted / subhash

DataSet:
1. The fundamental concept of the Orchestrate
framework is the Data Set. Data Sets are the inputs and
outputs of Orchestrate operators.
2. As a concept, a Data Set is like a database table,
insofar as it is a collection of identically defined
rows. It is the only structure on which Orchestrate
operators operate. Each operator (i.e., stage) accepts
input from one Data Set and sends its output to another
Data Set.
3. A Data Set exists on all the processing nodes
defined for the job that is currently processing it. The
subset of rows in a Data Set (or File Set) that is located
on a single processing node, and so earmarked for
processing on that node, is referred to as a "partition".
4. A control file is associated with each data set.
The control file contains the record schema that defines
the row structure (effectively its column definitions).
5. Within a Data Set, data is stored in an internal,
machine-compatible format.
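
The idea of a partition in point 3 can be illustrated with a small Python sketch. This is a toy model, not the Orchestrate API: the schema, the sample rows, and the `partition_rows` helper are all hypothetical, and assigning rows by key modulo the node count is an assumption made here for illustration (the answer does not say how rows are assigned to nodes).

```python
from collections import defaultdict

# Hypothetical record schema, standing in for the one kept in the control file.
SCHEMA = ("customer_id", "city")

# Hypothetical rows; every row follows the same schema.
ROWS = [
    (101, "Pune"),
    (102, "Delhi"),
    (103, "Chennai"),
    (104, "Mumbai"),
    (105, "Kolkata"),
]


def partition_rows(rows, node_count, key_index=0):
    """Assign each row to a node using the key modulo the node count.

    The modulo rule is an illustrative assumption, not DataStage behaviour.
    """
    partitions = defaultdict(list)
    for row in rows:
        node = row[key_index] % node_count
        partitions[node].append(row)
    return dict(partitions)


partitions = partition_rows(ROWS, node_count=2)
for node, node_rows in sorted(partitions.items()):
    print(f"node {node}: {node_rows}")
```

Each list in `partitions` plays the role of one partition: the rows of the same data set that a single node would process.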

FileSet:
1. The File Set stage allows you to read data from or
write data to a file set.
2. The stage can have a single input link, a single
output link, and a single reject link.
3. It executes only in parallel mode.
4. The data files and the file that lists them are
together called a file set. This capability is useful
because some operating systems impose a 2 GB limit on the
size of a file, so you need to distribute files among nodes
to prevent overruns.
5. The only advantage of using a file set over a
sequential file is that it preserves the partitioning scheme.
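
Point 4 (several data files plus one file that lists them) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the `.part` and `.fs.json` file names, and the JSON descriptor layout are hypothetical and do not reflect the actual file set format used by DataStage.

```python
import json
import os
import tempfile


def write_file_set(rows, directory, name, max_bytes):
    """Write rows to numbered data files and return the descriptor path.

    A new data file is started when the next line would push the current
    file past max_bytes. The descriptor lists the data files in order.
    """
    data_files = []
    current = None
    current_size = 0
    for row in rows:
        line = (",".join(str(value) for value in row) + "\n").encode("utf-8")
        if current is None or current_size + len(line) > max_bytes:
            if current is not None:
                current.close()
            path = os.path.join(directory, f"{name}.part{len(data_files):04d}")
            data_files.append(path)
            current = open(path, "wb")
            current_size = 0
        current.write(line)
        current_size += len(line)
    if current is not None:
        current.close()
    descriptor = os.path.join(directory, f"{name}.fs.json")  # hypothetical name
    with open(descriptor, "w", encoding="utf-8") as handle:
        json.dump({"files": data_files}, handle)
    return descriptor


def read_file_set(descriptor):
    """Read every data file named in the descriptor, in listed order."""
    with open(descriptor, encoding="utf-8") as handle:
        files = json.load(handle)["files"]
    rows = []
    for path in files:
        with open(path, encoding="utf-8") as data:
            rows.extend(line.rstrip("\n").split(",") for line in data)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    sample = [(i, f"name{i}") for i in range(10)]
    descriptor = write_file_set(sample, tmp, "customers", max_bytes=30)
    restored = read_file_set(descriptor)
    file_count = len(os.listdir(tmp)) - 1  # data files only, not the descriptor

print(file_count, len(restored))
```

The small `max_bytes` value stands in for the operating-system size limit mentioned above; a reader only needs the descriptor to find every piece of the data.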

A dataset is a file/stage whose data can be read
directly by DataStage, whereas the data in a file set needs
to be converted into a DataStage-readable format (which
happens internally).

In simple terms, data can be read faster from a
dataset than from a file set.
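
The conversion step described above can be illustrated with a rough Python sketch. It contrasts reading values stored as raw bytes with reading the same values stored as delimited text, which must be parsed first. The record layout and helper names are hypothetical, and this does not reproduce the real on-disk format of either a dataset or a file set.

```python
import struct

# Hypothetical fixed-width record: a 4-byte integer id and an 8-byte float amount.
RECORD = struct.Struct("<id")

rows = [(1, 10.5), (2, 20.25), (3, 30.0)]

# "Internal" style: values kept as raw bytes in a machine-compatible layout.
binary_blob = b"".join(RECORD.pack(*row) for row in rows)

# "External" style: the same values kept as delimited text.
text_blob = "\n".join(f"{row_id},{amount}" for row_id, amount in rows)


def read_binary(blob):
    """Unpack fixed-width records directly; no text parsing is needed."""
    return [
        RECORD.unpack_from(blob, offset)
        for offset in range(0, len(blob), RECORD.size)
    ]


def read_text(blob):
    """Split each line and convert every field from text to its type."""
    result = []
    for line in blob.splitlines():
        id_text, amount_text = line.split(",")
        result.append((int(id_text), float(amount_text)))
    return result


print(read_binary(binary_blob) == read_text(text_blob))
```

Both readers return the same rows; the text reader simply has an extra split-and-convert step per field, which is the kind of work the answer refers to as conversion.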
