subhash


{ City } Hyderabad
< Country > India
* Profession * DataStage Developer
User No # 101246
Total Questions Posted # 15
Total Answers Posted # 100

Total Answers Posted for My Questions # 43
Total Views for My Questions # 151075

Users Marked my Answers as Correct # 555
Users Marked my Answers as Wrong # 33
Questions / { subhash }
Questions Answers Category Views Company eMail

Input Data is: Emp_Id, EmpInd 100, 0 100, 0 100, 0 101, 1 101, 1 102, 0 102, 0 102, 1 103, 1 103, 1 I want Output: 100, 0 100, 0 100, 0 101, 1 101, 1 Meaning the indicator should be either all ZEROs or all ONEs per Emp_Id. Implement this using both SQL and DataStage.

IBM,

6 Data Stage 10567

What Environment Variable needs to be set to TRIM at the Project Level? (In the Transformer we have the TRIM function, but I need to implement this at project level using an environment variable.)

IBM,

Data Stage 2000

How to remove Ctrl-M (^M) characters in a file using the Sequential File stage?

Bank Of America, Satyam,

2 Data Stage 21181
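A minimal UNIX sketch for the ^M question above (file names are placeholders); the same sed command, without the file arguments, can also be placed in the Sequential File stage's Filter property:

# strip DOS carriage returns (^M, i.e. \r) from a file
tr -d '\r' < input.txt > clean.txt
# equivalent sed form; as a Filter command it reads stdin and writes stdout
sed 's/\r$//' input.txt > clean.txt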

How to convert data by vertical PIVOTing without using the PIVOT stage in DataStage. Ex: DEPT_NO EMPNAME 10 Subhash 10 Suresh 10 Sravs Output: DEPT_NO EMP1 EMP2 EMP3 10 Subhash Suresh Sravs 2) How to implement horizontal PIVOTing without using the PIVOT stage?

Cognizant, UHG,

3 Data Stage 15479

How to LOG 'unmatched Master' records and 'Reject Updates' into log files using the MERGE stage?

TCS,

1 Data Stage 6264

INPUT file 'A' contains: 1 2 3 4 5 6 7 8 9 10 input file 'B' contains: 6 7 8 9 10 11 12 13 14 15 Output file 'X' contains: 1 2 3 4 5 Output file 'Y' contains: 6 7 8 9 10 Output file 'Z' contains: 11 12 13 14 15 How can we implement this in a single ds job?

CTS,

11 Data Stage 15881

Difference between the Sequence and Sequencer in DS?

TCS,

1 Data Stage 11316

I have 2 files: the 1st contains duplicate records only, the 2nd contains unique records. EX: File1: 1 subhash 10000 1 subhash 10000 2 raju 20000 2 raju 20000 3 chandra 30000 3 chandra 30000 File2: 1 subhash 10000 5 pawan 15000 7 reddy 25000 3 chandra 30000 Output file: capture all the duplicates in both files with a count. 1 subhash 10000 3 1 subhash 10000 3 1 subhash 10000 3 2 raju 20000 2 2 raju 20000 2 3 chandra 30000 3 3 chandra 30000 3 3 chandra 30000 3

TCS,

2 Data Stage 7242

What is a conformed Dimension? What is a Factless Fact? Give one example. What are Additive and Semi-Additive Facts?

TCS,

1 Data Stage 5667

What is the difference between OLTP and OLAP? What are the TOP-DOWN and BOTTOM-UP approaches, and which is best? What are Star Schema and Snowflake Schema?

TCS,

2 Data Stage 6199

Display all files which have a size of more than 3 KB in a given directory/folder.

TCS,

1 Data Stage 4416
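A hedged UNIX sketch for the question above (the directory path is a placeholder; find's -size +3k means larger than 3 KB):

# list all files bigger than 3 KB under the given directory
find /path/to/dir -type f -size +3k -exec ls -lh {} \;
# alternative for the current directory only (field 5 of ls -l is the size in bytes)
ls -l | awk '$5 > 3072 {print $9}'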

How can I capture unmatched records from both the primary source and the secondary source in a JOIN stage?

TCS,

3 Data Stage 13276

UNIX question asked in a DataStage interview: I have different types of files (.txt, .tmp, .bat etc.) in 4 different directories. I want to move all '.txt' files from the 4 directories to another folder, and I need to delete all the files except those created TODAY.

TCS,

2 Data Stage 8120
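A hedged UNIX sketch for the question above (directory names are placeholders; UNIX does not generally record a creation time, so "created today" is approximated here by the modification time):

# move all .txt files from the 4 directories to the target folder
mv /dir1/*.txt /dir2/*.txt /dir3/*.txt /dir4/*.txt /target/folder/
# delete every file not modified within the last 24 hours, keeping "today's" files
find /dir1 /dir2 /dir3 /dir4 -type f -mtime +0 -exec rm -f {} \;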

I have a Seq file and I don't want the 10th and 11th records (or any two records, like the 20th and 30th) in the output.

TCS,

6 Data Stage 10689
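A hedged sketch for the question above: the unwanted rows can be dropped with a UNIX command placed in the Sequential File stage's Filter property (record numbers are examples; a Filter command reads stdin and writes stdout):

# drop the 10th and 11th records
sed '10,11d'
# drop two non-consecutive records, e.g. the 20th and 30th
sed '20d;30d'
# awk equivalent
awk 'NR != 10 && NR != 11'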

I have loaded a Dataset in UAT with a 2-node configuration, imported the job into the PROD environment which has a 4-node configuration, and I am using this Dataset as the source for another job. Will the job run fine or give any errors? If the job runs fine, on how many nodes: 2 or 4?

TCS,

2 Data Stage 12778




Answers / { subhash }

Question { HP, 33609 }

How do you set a default value to a column if the column value
is NULL?


Answer

We can use the
NullToValue(Col_Name, Default_Value)
or
NullToEmpty(Col_Name) functions in the Transformer stage.

Or,
if the source is a sequential file, then we have an option in the
Seq File stage ==> Format ==> Field Defaults ==> "Null Field
Value" --> we can set the default value there.

Is This Answer Correct ?    4 Yes 0 No

Question { TCS, 18795 }

What UNIX questions are asked in DataStage interviews?


Answer

1. Pattern finding and replacing:
sed -n '1p;$p' file_name ----> displays the 1st & last lines
sed '1d;$d' file_name ----> deletes the 1st & last lines from the
output, etc.
2. First N records & last N records:
head -5 file_name
tail -5 file_name
3. To find the word/line count of a file:
wc -l --- number of lines
wc -w --- number of words
4. awk, find, cut etc. are also useful.

Is This Answer Correct ?    6 Yes 0 No


Question { ME, 23756 }

I want to send SYSDATE as a parameter, so what command should
I use to call SYSDATE?


Answer

There are many ways to get the SYSDATE:
1) By using SYSDATE in the select query of your Oracle
source.
2) By using the DataStage macro "DSJobStartDate" in
the transformer.
3) By using the Date & Time function "CurrentDate()" in the
transformer.
4) By using the Date() function, and then some
variation of the Oconv() function to change the date from
internal to external format.

For example, say you want to assign the current date to a
variable called StartDate, and format it in yyyy/mm/dd
format, you would use:

StartDate = Oconv(Date(), "D4/YMD")
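If the date must arrive as a job parameter, a minimal dsjob sketch (assuming a string job parameter named pRunDate is defined in the job; project and job names are placeholders):

# pass the current date as a job parameter when starting the job from UNIX
dsjob -run -param pRunDate=`date +%Y-%m-%d` -jobstatus MyProject MyJob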

Is This Answer Correct ?    6 Yes 0 No

Question { ME, 8223 }

How to call routines in stages?


Answer

Double-click on the Transformer stage, right-click on any
one of the mapping fields and select the [DS Routines] option; within the
edit window give the business logic and select either
of the options (Before / After Sub-Routines).

Is This Answer Correct ?    3 Yes 1 No

Question { Accenture, 22334 }

What is an Audit table? Have you used an audit table in your project?


Answer

An Audit table can be used to check all the details of the
jobs, like:
job status, last run date,
number of rows processed,
number of rows inserted / rejected in a job.

Some companies use shell scripts to load logs into the audit
table, and some companies load logs into the audit table using
DataStage jobs.

Ex:
JOB_DETAIL table, It may contain columns like:
JOB_ID, JOB_NAME, STATUS, JOB_RUN_DT, REC_INSERTED,
REC_UPDATED etc..

Is This Answer Correct ?    10 Yes 0 No

Question { Wipro, 23075 }

Without using the Funnel Stage, how to populate data from
different sources into a single target?


Answer

1. Parameterise the file name and pass the different file names
at run time; in the target we can set the write mode to 'Append'.

2. We can add multiple 'File' properties in the Sequential File
stage, give them the different file names, and run it.

3. If all the files are in the same directory, then set
File: DirectoryName/*#Date#.txt
and run the job.

Is This Answer Correct ?    3 Yes 1 No

Question { TCS, 11228 }

How can you execute an SQL query through UNIX?
What is the primary key for a Dimension table?
What is the primary key for a Fact table?


Answer

Executing an SQL query through UNIX (the user, password and DB name below are placeholders):

sqlplus -s user/password@dbname <<EOF > testData.txt
set pagesize 0
select sysdate from dual;
EOF

Is This Answer Correct ?    7 Yes 0 No

Question { Accenture, 23046 }

How can I abort the job if I get more than 100 errors in the
job log?


Answer

In the Designer, after compiling the job, click on Run. In
the Job Run Options you will find 3 tabs: Parameters,
Limits, General. Go to the Limits tab, select "Abort job
after" and change it to 100; after 100 warnings (errors) the
job will be aborted.
By default it is set to 50
warnings.
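The same limit can also be set when the job is started from UNIX; a minimal sketch, assuming placeholder project and job names:

# run the job with the warning/error limit set to 100
dsjob -run -warn 100 -jobstatus MyProject MyJob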

Is This Answer Correct ?    17 Yes 0 No

Question { 9980 }

How can we see the data in a Dataset?


Answer

1st way, through DataStage:
Through the DataStage Designer client: go to the Tools menu ->
click on Data Set Management ->
select the dataset you want to view ->
then click OK ->
click on the Show Data Window icon (the cube shape) ->
click OK.

2nd way, through UNIX:
1. Before using orchadmin, you should make sure that either the working directory or $APT_ORCHHOME/etc contains the file "config.apt", OR that the environment variable $APT_CONFIG_FILE is defined for your session.


Orchadmin commands


1. CHECK: $orchadmin check

Validates the configuration file contents: accessibility of all the nodes defined in the configuration file, scratch disk definitions, etc. Throws an error when the config file is not found or not defined properly.


2. COPY: $orchadmin copy <source.ds> <destination.ds>

Makes a complete copy of the source dataset with a new destination descriptor file name. Please note that:
a. You cannot use the UNIX cp command, as it just copies the descriptor file to a new name; the data is not copied.
b. The new dataset will be arranged in the form of the config file that is currently in use, not according to the old config file that was in use with the source.


3. DELETE: $orchadmin < delete | del | rm > [-f | -x] descriptorfiles...

The UNIX rm utility cannot be used to delete datasets. The orchadmin delete or rm command should be used to delete one or more persistent data sets.
-f makes it a force delete. If some nodes are not accessible, then -f forces the deletion of the dataset partitions from the accessible nodes and leaves the partitions on the inaccessible nodes as orphans.
-x forces the current config file to be used while deleting, rather than the one stored in the dataset.


4. DESCRIBE: $orchadmin describe [options] descriptorfile.ds

This is the single most important command.
Without any options, it lists the number of partitions, number of segments, valid segments, and the preserve-partitioning flag of the persistent dataset.
-c: Prints the configuration file that is written in the dataset, if any.
-p: Lists the partition-level information.
-f: Lists the file-level information in each partition.
-e: Lists the segment-level information.
-s: Lists the metadata schema information.
-v: Lists all segments, valid or otherwise.
-l: Long listing, equivalent to -f -p -s -v -e.


5. DUMP: $orchadmin dump [options] descriptorfile.ds

The dump command is used to dump (extract) the records from the dataset.
Without any options the dump command lists all the records, starting from the first record of the first partition through the last record of the last partition.
-delim '<string>': Uses the given string as the delimiter for fields instead of a space.
-field <name>: Lists only the given field instead of all fields.
-name: Lists all the values preceded by the field name and a colon.
-n numrecs: Lists only the given number of records per partition.
-p period(N): Lists every Nth record from each partition, starting from the first record.
-skip N: Skips the first N records of each partition.
-x: Uses the current system configuration file rather than the one stored in the dataset.


6. TRUNCATE: $orchadmin truncate [options] descriptorfile.ds

Without options, deletes all the data (i.e. segments) from the dataset.
-f: Forces the truncate. Truncates accessible segments and leaves the inaccessible ones.
-x: Uses the current system config file rather than the default one stored in the dataset.
-n N: Leaves the first N segments in each partition and truncates the remaining ones.


7. HELP: $orchadmin -help OR $orchadmin <command> -help

Help manual about the usage of orchadmin or orchadmin commands.
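A short usage sketch combining the commands above (the descriptor file name is a placeholder; $APT_CONFIG_FILE is assumed to be set for the session):

orchadmin check                                # validate the configuration file
orchadmin describe -l mydataset.ds             # long listing: partitions, segments, schema
orchadmin dump -delim '|' -n 10 mydataset.ds   # first 10 records per partition, pipe-delimited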

Is This Answer Correct ?    2 Yes 0 No

Question { 17194 }

How do you capture duplicates through Sort & Transformer?


Answer

In the Sort stage we have the 'Create Key Change
Column' property.
The KeyChange column is set to 1 for the first record
in each group where the value of the sort key changes;
subsequent records in the group have the KeyChange column
set to 0.

In the Transformer stage we can put a constraint such as IF KeyChange
= 1 THEN map to the TGT (use KeyChange = 0 instead if the duplicate
records themselves need to be captured).

Is This Answer Correct ?    5 Yes 0 No

Question { TCS, 13391 }

1. How to generate even numbers in the Surrogate Key or Transformer
stage?

2. How many ways are there to remove duplicate values?


Answer

1) By using stage variables in the Transformer we can generate even numbers:
declare a stage variable SV1 and initialize it to 0;
in the derivation of that stage variable, increment by 2,
i.e. SV1 = SV1 + 2.

By using a Surrogate Key in the Transformer, use the below logic in the derivation part:
Surrogate_Key -----> If Mod(Surrogate_Key, 2) = 0 Then Surrogate_Key Else Surrogate_Key + 10000001

2) Removing duplicates:
a. By using the Remove Duplicates stage.
b. By using the Sort stage: sort on the KEY column and set 'Allow Duplicates' to 'False'.
c. By using link partitioning: use 'Hash' partitioning on the KEY columns and select the Perform Sort and Unique options.
d. By using Transformer stage variables:
SV2 -----> If SV1 = KEY_Column Then 0 Else 1
SV1 -----> KEY_Column
In the constraint part give:
SV2 = 1

Is This Answer Correct ?    9 Yes 0 No

Question { 3089 }

How to find the difference between 2 dates without using Icon...
functions?


Answer

This function returns number of days from source date to the given date.

DaysSinceFromDate(mylink.sourcedate, mylink.givendate)--->No of days.

Ex:
DaysSinceFromDate("2014-06-10", "2014-06-15") = 5

Is This Answer Correct ?    2 Yes 0 No

Question { 5430 }

I have 4 jobs; I want the 1st job to run on 1 node and the 2nd job
to run on 2 nodes,
and so on. How to make this possible?


Answer

Create 2 configuration files: 1 file with a 1-node and a 2nd file
with a 2-node configuration.

Open the 1st job,
go to 'Tools' --> 'Configurations' --> select the 1-node
configuration file from the drop-down list.

Open the 2nd job,
go to 'Tools' --> 'Configurations' --> select the 2-node
configuration file from the drop-down list.

Is This Answer Correct ?    5 Yes 0 No

Question { 5370 }

How to stop taking records after 10,000 records have come (i.e. process only the first 10,000 records)?


Answer

--> If the source is a file:
"Read first rows" = 10000 --> only read the first specified number of rows from each file.
"Filter" = sed -n '1,10000p' or head -10000: we can use UNIX commands to filter the data.

--> If the source is a DB:
Oracle: where rownum <= 10000
DB2: fetch first 10000 rows only
Teradata: sample 10000

--> In the Transformer:
We can use the @INROWNUM <= 10000 system variable.
We can use stage variables to count the rows and, in the constraints part, use Stage_Var <= 10000.

Is This Answer Correct ?    4 Yes 0 No

Question { Wipro, 6321 }

J1, J2, J3 jobs are running; I want to run J1 and J3 after J2. How to do
this using a Sequencer?
2) J1, J2, J3 jobs are running; I want to run J1, J2 and J3 together. How to do this?


Answer

1. Connect J2 to J1 and J3. Set
'Expression Type' to 'OK - (Conditional)' or 'Custom -
(Conditional)', and for 'Custom - (Conditional)' put the
constraint as J2.$JobStatus < 3.

2. Create a sequence without any links between these jobs
and run it; then those jobs will run in parallel.

Is This Answer Correct ?    4 Yes 0 No
