What is the difference between nodup and nodupkey options?
Answer / asha
The NODUP option checks for and eliminates duplicate
observations.
The NODUPKEY option checks for and eliminates observations
with duplicate BY variable values.
Answer / stopby
Nodup: can only remove duplicates that are next to each other
after sorting. The BY variables are therefore very important for
removing duplicates in which all the variables have the same value.
Nodupkey: removes duplicates when they have the
same values for the BY variables.
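A minimal sketch of the point about BY variables, using a made-up data set (dups, dedup_id and dedup_all are illustrative names, not taken from the answers here). Listing every variable in the BY statement, for example with BY _ALL_, puts identical rows next to each other so that NODUPRECS can catch them.

```sas
/* Hypothetical data: rows 1 and 3 are identical but not adjacent */
data dups;
  input id $ value;
  datalines;
a 1
a 2
a 1
b 3
;
run;

/* Sorted by id alone, the two "a 1" rows stay apart (EQUALS keeps */
/* input order within a BY group), so neither should be removed    */
proc sort data=dups out=dedup_id noduprecs equals;
  by id;
run;

/* BY _ALL_ sorts on every variable, so identical rows become */
/* adjacent and only one "a 1" row should remain              */
proc sort data=dups out=dedup_all noduprecs;
  by _all_;
run;
```

Writing to OUT= data sets leaves dups untouched, so the two results can be compared side by side with PROC PRINT.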
Answer / tariq sharjil
Nodup:
It deletes observations in which every variable in the
dataset has the same value.
Nodupkey:
It deletes observations that repeat the values of the sorting
(BY) variables. It retains the first observation in each BY group
and deletes all the others coming after it.
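To see which observations NODUPKEY drops, one option is the DUPOUT= data set of PROC SORT. The sketch below assumes a made-up data set; visits, first_visit and dropped are illustrative names only.

```sas
/* Hypothetical data: three rows for p1, one row for p2 */
data visits;
  input id $ visit;
  datalines;
p1 1
p1 2
p2 1
p1 3
;
run;

/* NODUPKEY keeps the first observation per BY value;           */
/* EQUALS preserves input order within a BY group, so "first"   */
/* means first in the input data. DUPOUT= collects the rest.    */
proc sort data=visits out=first_visit nodupkey equals dupout=dropped;
  by id;
run;

proc print data=first_visit;  /* kept observations    */
run;
proc print data=dropped;      /* deleted observations */
run;
```

With this input, first_visit should hold one row per id (the visit 1 rows) and dropped should hold the remaining p1 rows.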
Answer / majid
data test1;
input id1 $ id2 $ extra ;
cards;
aa ab 3
aa ab 1
aa ab 2
aa ab 3
;
proc sort nodup data=test1;
by id1 ;
run;
proc print data=test1;
run;
output will be like this:
Obs id1 id2 extra
1 aa ab 3
2 aa ab 1
3 aa ab 2
4 aa ab 3
/* "nodup" is an alias for "noduprecs", which appears to
mean "no duplicate records", but SAS has no way of
detecting duplicate records unless they happen to
land next to each other after the sort. It is a big mistake
to think that sorting with "nodup" will always remove duplicate
records. Sometimes it will, sometimes it won't. The only way you can
be sure of removing duplicate records is to include enough
key variables in the BY statement that the duplicates you want
to lose end up adjacent. In the case shown above, the rows with
the same "extra" value are the duplicates we want to remove,
so this variable should be included in the list of sort
variables. With every variable in the BY list, "nodup"
(or "nodupkey") will remove the duplicates, as shown below. */
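/* out= leaves test1 itself unchanged, so the later step still sees all four rows */
proc sort nodup data=test1 out=test1_all;
by id1 id2 extra;
run;
proc print data=test1_all;
run;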
output will be like this:
Obs id1 id2 extra
1 aa ab 1
2 aa ab 2
3 aa ab 3
So, as you can see, nodup eliminates all duplicate observations
if you sort by all variables, whereas nodupkey keeps
only the first observation for each value of the BY variables:
proc sort nodupkey data=test1;
by id1 ;
run;
options nocenter;
proc print data=test1;
run;
output will be like this:
Obs id1 id2 extra
1 aa ab 3
Answer / pavan
Nodup: it deletes observations in which each and every
variable value is the same, irrespective of the sorting variable
(provided the duplicates end up next to each other after the sort).
Nodupkey: it deletes observations based on the sorting
variable alone.
Answer / sas d
NODUP - removes the duplicates. Here the key to remove the
duplicates is the entire record.
NODUPKEY - removes the duplicates. Here the key is the
variable(s) specified by the BY statement.
Answer / chiranjeevi
nodup:
Used with PROC SORT, the nodup option
checks for and eliminates duplicate records.
nodupkey:
NODUPKEY eliminates observations with duplicate key
(BY variable) values in the data set.
Answer / susheel
The nodup option in the sort procedure eliminates observations that are exactly the same across all variables.
The nodupkey option in the sort procedure eliminates observations that are exactly the same across the BY variables.
Answer / chandu
NODUPKEY:
It compares BY variable values and deletes duplicate
observations in the data set based on those values.
NODUP:
It checks for and deletes observations in the data set
that are duplicated across all variables.
Any comments, please?
Answer / shalabh tyagi
Nodup: Checks for duplication across all the variables in a row,
keeps the first row of each set of duplicates in the final output and
deletes the rest.
Nodupkey: Checks for duplication among the variables
specified in the BY statement, keeps the first row of each
BY group and deletes the rest.