What is the exact difference between a Data Set and a File Set in
DataStage?
DataSet:
1. The fundamental concept of the Orchestrate
framework is the Data Set. Data Sets are the inputs and
outputs of Orchestrate operators.
2. As a concept, a Data Set is like a database table,
insofar as it is a collection of identically defined
rows. It is the only structure on which Orchestrate
operators operate. Each operator (i.e., stage) accepts
input from one Data Set and sends its output to another
Data Set.
3. A Data Set exists on all the processing nodes
defined for the job that is currently processing it. The
subset of rows in a Data Set that is located on a single
processing node is referred to as a "partition" of the Data
Set. Technically, a partition is a subset of the rows in a
Data Set (or File Set) earmarked for processing on the same
processing node.
4. A control file is associated with each data set.
The control file contains the record schema that defines
the row structure (effectively its column definitions).
5. Within a Data Set, data is stored in an internal
(machine-compatible) format.
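The partitioning idea in point 3 can be sketched in a few lines of Python. This is only an illustration of the concept, not how the Orchestrate framework is implemented: the function name `hash_partition`, the sample columns, and the use of a CRC-based hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, node_count):
    """Group rows by node, choosing the node from a hash of the key column."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for strings
        node = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[node].append(row)
    return dict(partitions)

# Hypothetical schema, kept apart from the rows the way a control file would hold it
schema = [("customer_id", "int32"), ("city", "string")]
rows = [
    {"customer_id": 101, "city": "Pune"},
    {"customer_id": 102, "city": "Delhi"},
    {"customer_id": 101, "city": "Pune"},
    {"customer_id": 103, "city": "Chennai"},
]

partitions = hash_partition(rows, "customer_id", node_count=2)
for node, part in sorted(partitions.items()):
    print(f"node {node}: {len(part)} row(s)")
```

Rows with the same key value always land in the same partition, which is what lets each node process its subset independently.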
FileSet:
1. The File Set stage allows you to read data from or
write data to a file set.
2. The stage can have a single input link, a single
output link and a single reject link.
3. It only executes in parallel mode.
4. The data files and the file that lists them are
called a file set. This capability is useful because some
operating systems impose a 2 GB limit on the size of a file
and you need to distribute files among nodes to prevent
overruns.
5. The only advantage of using a file set over a
sequential file is that it preserves the partitioning scheme.
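Points 4 and 5 can be illustrated with a small Python sketch: one text file per partition, plus a separate file that lists them. The descriptor layout used here (a JSON file with a `files` key) and the names `write_file_set` and `read_file_set` are hypothetical and chosen only for the example. They are not the format DataStage uses.

```python
import json
import os
import tempfile

def write_file_set(partitions, directory, name):
    """Write one delimited text file per partition plus a descriptor listing them."""
    data_files = []
    for node, rows in sorted(partitions.items()):
        path = os.path.join(directory, f"{name}.part{node}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write("|".join(str(value) for value in row) + "\n")
        data_files.append(path)
    # Hypothetical descriptor format: a JSON file that lists the data files
    descriptor = os.path.join(directory, f"{name}.fs.json")
    with open(descriptor, "w", encoding="utf-8") as handle:
        json.dump({"files": data_files}, handle)
    return descriptor

def read_file_set(descriptor):
    """Read each listed file back as its own partition, keeping the original split."""
    with open(descriptor, encoding="utf-8") as handle:
        data_files = json.load(handle)["files"]
    result = []
    for path in data_files:
        with open(path, encoding="utf-8") as handle:
            result.append([line.rstrip("\n").split("|") for line in handle])
    return result

partitions = {0: [(101, "Pune"), (103, "Chennai")], 1: [(102, "Delhi")]}
with tempfile.TemporaryDirectory() as workdir:
    descriptor_path = write_file_set(partitions, workdir, "customers")
    restored = read_file_set(descriptor_path)

print(len(restored), "partitions restored")
```

Because the descriptor records which rows went to which file, a reader gets the same partitions back. A single sequential file would lose that split. Note that the values come back as strings, since a text format has to be parsed on the way in.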
A Data Set holds data in a format that DataStage can read
directly, whereas the data in a File Set has to be
converted into DataStage's internal format when it is read
(the conversion happens internally).
In simple words, data can be read faster from a Data Set
than from a File Set.
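Why a native format can be read with less work than text can be sketched with Python's `struct` module. The record layout below (a 4-byte integer followed by an 8-byte float) is a made-up example and says nothing about DataStage's actual on-disk layout. It only shows that fixed binary records can be unpacked directly, while text rows must be split and converted field by field.

```python
import struct

# Hypothetical fixed-width record: a 4-byte integer and an 8-byte float
RECORD = struct.Struct("<id")

def write_native(rows):
    """Pack rows into a machine-format byte string."""
    return b"".join(RECORD.pack(key, amount) for key, amount in rows)

def read_native(blob):
    """Unpack records directly; no text parsing is involved."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]

def write_text(rows):
    """Write rows as pipe-delimited text lines."""
    return "".join(f"{key}|{amount!r}\n" for key, amount in rows)

def read_text(text):
    """Split each line and convert every field from text to its real type."""
    parsed = []
    for line in text.splitlines():
        key, amount = line.split("|")
        parsed.append((int(key), float(amount)))
    return parsed

rows = [(1, 10.5), (2, 20.25), (3, 0.125)]
native_rows = read_native(write_native(rows))
text_rows = read_text(write_text(rows))
print(native_rows == text_rows)
```

Both paths recover the same rows. The difference is the extra splitting and type conversion on the text side, which is the kind of work the answer above describes as happening internally for a file set.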
| Is This Answer Correct ? | 21 Yes | 4 No |
1) A Data Set is stored in DataStage's native (internal) format, so its data can only be viewed through DataStage, whereas a File Set's data files are stored in an external format (typically text), so the data can be viewed anywhere in human-readable form.
2) The Data Set stage does not support a reject link, whereas the File Set stage does.
3) The Data Set stage uses the copy operator, whereas the File Set stage uses the import and export operators.
| Is This Answer Correct ? | 13 Yes | 2 No |
Answer / kavi
In a Data Set, data is stored in binary format.
In a File Set, data is stored as text.
That's it...
| Is This Answer Correct ? | 10 Yes | 12 No |
Answer / lokesh butra
A Data Set operates on files on the local server and
supports up to 2 GB of data.
A File Set operates on files on local and remote servers and
supports unlimited data.
| Is This Answer Correct ? | 2 Yes | 7 No |
Answer / prakash
A Data Set is the same as a File Set; the only differences are
the reject link and external use.
| Is This Answer Correct ? | 7 Yes | 13 No |