What is the exact difference between a Data Set and a File Set in
DataStage?
Answer Posted / subhash
DataSet:
1. The fundamental concept of the Orchestrate
framework is the Data Set. Data Sets are the inputs and
outputs of Orchestrate operators.
2. As a concept, a Data Set is like a database table,
insofar as it is a collection of identically defined
rows. It is the only structure on which Orchestrate
operators operate. Each operator (i.e., stage) reads its
input from Data Sets and writes its output to Data Sets.
3. A Data Set exists on all the processing nodes
defined for the job that is currently processing it. The
subset of rows in a Data Set that is located on a single
processing node is referred to as a "partition" of the Data
Set. Technically, a partition is a subset of the rows in a
Data Set (or File Set) earmarked for processing on the same
processing node.
4. A control file is associated with each data set.
The control file contains the record schema that defines
the row structure (effectively its column definitions).
5. Within a Data Set, data is stored in an internal,
machine-compatible format.
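
To make the idea of a partition in point 3 concrete, here is a minimal Python sketch. It is not DataStage or Orchestrate code: the function name partition_rows, the sample rows, and the choice of hash partitioning on one column are all assumptions made for illustration.

```python
import zlib


def partition_rows(rows, key, node_count):
    """Split rows into one list per node by hashing a key column.

    Illustrative only: real partitioning is done by the parallel engine.
    """
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        # crc32 is deterministic, so equal keys always land on the same node.
        node = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[node].append(row)
    return partitions


# Hypothetical sample data: five rows spread over two processing nodes.
regions = ["east", "west", "east", "north", "west"]
rows = [{"id": i, "region": r} for i, r in enumerate(regions)]
partitions = partition_rows(rows, "region", node_count=2)

for node, part in enumerate(partitions):
    print(f"node {node}: {len(part)} row(s)")
```

Each inner list plays the role of one partition: the rows of the same logical data set that a single node would process.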
FileSet:
1. The File Set stage allows you to read data from or
write data to a file set.
2. The stage can have a single input link, a single
output link, and a single reject link.
3. It executes only in parallel mode.
4. The data files and the file that lists them are
together called a file set. This is useful because some
operating systems impose a 2 GB limit on the size of a
file, so the data has to be distributed across several
files on the nodes to avoid exceeding that limit.
5. The only advantage of using a file set over a
sequential file is that it preserves the partitioning
scheme.
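
The idea in point 4, several size-capped data files plus one file that lists them, can be sketched in Python as follows. This is only an illustration: the function write_file_set, the file names, and the plain-text listing layout are hypothetical and do not reproduce the actual file set descriptor that DataStage writes.

```python
import os
import tempfile


def write_file_set(lines, directory, max_bytes):
    """Write lines into size-capped data files plus a listing file.

    Hypothetical layout for illustration. A single line longer than
    max_bytes still gets a file of its own and will exceed the cap.
    """
    data_paths = []
    current = None
    written = 0
    for line in lines:
        encoded = (line + "\n").encode("utf-8")
        # Start a new data file when the next line would exceed the cap.
        if current is None or written + len(encoded) > max_bytes:
            if current is not None:
                current.close()
            path = os.path.join(directory, f"part_{len(data_paths):04d}.txt")
            current = open(path, "wb")
            data_paths.append(path)
            written = 0
        current.write(encoded)
        written += len(encoded)
    if current is not None:
        current.close()

    # The listing file is what ties the separate data files together.
    listing_path = os.path.join(directory, "example_listing.txt")
    with open(listing_path, "w", encoding="utf-8") as listing:
        listing.write("\n".join(data_paths) + "\n")
    return listing_path, data_paths


with tempfile.TemporaryDirectory() as tmp:
    listing_path, data_paths = write_file_set(
        ["a,1", "b,2", "c,3", "d,4"], tmp, max_bytes=8
    )
    sizes = [os.path.getsize(p) for p in data_paths]
    print(len(data_paths), "data files with sizes", sizes)
```

A reader would open the listing file first and then read each data file it names, so no single file has to grow past the operating system's limit.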
A Data Set holds its data in the format that DataStage
uses internally, so the data can be read directly, whereas
the data in a File Set has to be converted into a
DataStage-readable format (which happens internally).
In simple words, data can be read faster from a Data Set
than from a File Set.
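
The difference between reading an internal format directly and converting an external one can be illustrated with a small Python sketch. The record layout below (a 4-byte integer followed by an 8-byte float) is an assumption chosen for the example and is not the real DataStage internal format.

```python
import struct

# Assumed fixed-width record: 4-byte int + 8-byte double, little-endian.
RECORD = struct.Struct("<id")


def to_internal(rows):
    """Pack rows into a binary blob of fixed-width records."""
    return b"".join(RECORD.pack(i, x) for i, x in rows)


def from_internal(blob):
    """Read records straight from the blob; no text parsing is needed."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]


def to_text(rows):
    """Write rows as comma-separated text, one row per line."""
    return "".join(f"{i},{x!r}\n" for i, x in rows)


def from_text(text):
    """Every field must be split out and converted from text to a number."""
    return [(int(a), float(b)) for a, b in (line.split(",") for line in text.splitlines())]


rows = [(1, 9.5), (2, 0.25)]
blob = to_internal(rows)
text = to_text(rows)
print(from_internal(blob) == from_text(text))
```

Both readers return the same rows, but the text reader has to convert every field, which is the kind of extra step the answer attributes to a File Set.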