The Splunk dedup command removes events that share an identical combination of values for the fields the user specifies. It removes duplicate values from the results and displays only the most recent log for a particular incident: for each unique value (or value combination) of the specified fields, dedup returns the first event found in search order.
With the dedup command, the user can specify how many duplicate events to keep, either for each value of a single field or for each combination of values across multiple fields. The events returned by dedup depend on search order. For historical searches, the most recent events are searched first; for real-time searches, the first events received are searched first, and these are not necessarily the most recent events that occurred. The user can also sort by fields to control which events are retained. Additional dedup options allow the user to keep all events while removing only the duplicate values from them, or to retain events in which the specified fields do not exist at all.
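The count and field-list behaviour described above can be sketched with two searches (the field names source and host are illustrative):

```
... | dedup 3 source
... | dedup source host
```

The first search keeps up to three events for each value of source; the second keeps one event for each unique combination of source and host.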
The main functionality of the uniq command is to remove duplicated data only when the entire row or event is identical, whereas the dedup command looks only at the specifically mentioned fields. For instance, if the user runs "| dedup host", the dedup command looks at the host field and keeps the first event from each host. With dedup, one can specify numerous fields, and there are options such as consecutive, which removes only events whose duplicate value combinations occur consecutively, and keepempty, which retains events that do not have the specified field. The uniq command removes any search result that is an exact duplicate of the preceding result, so the results have to be sorted in order to use it. The dedup command, on the other hand, is far more flexible: it can be map-reduced, can be trimmed to a particular count (defaulting to 1), and can be applied to any number of fields at the same time.
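The contrast can be illustrated as follows (host and service are example field names):

```
... | sort host | uniq
... | dedup host service
```

The first search sorts the results so that uniq can drop any result that exactly duplicates the one before it; the second compares only the host and service fields and keeps the first event for each combination.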
Avoid using the dedup command on the _raw field when searching over a large volume of data: doing so forces the data of every event to be retained in memory, which ultimately degrades search performance. This is expected behavior in dedup and applies to any field with high cardinality and large values. For instance, if the user searches over all the logs and applies dedup to the user id field (uid), only one log is displayed for each uid; no log repetition takes place in the entire process.
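As a sketch of the uid example above (the index name and the uid field are assumptions for illustration):

```
index=web_logs | dedup uid
```

For the performance reasons just described, the analogous search over the raw event text, index=web_logs | dedup _raw, should be avoided on large datasets.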
Lexicographical order sorts items based on the values used to encode them in computer memory. In Splunk software, this is almost always UTF-8 encoding, a superset of ASCII. In lexicographical order, numbers are sorted before letters, and numbers are sorted based on their first digit. For instance, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
When it comes to alphabetical sorting, uppercase letters are sorted before lowercase letters. Symbols do not follow a standardized position in lexicographical order: they can be sorted before numeric values, or before or after alphabetic values.
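A quick way to see lexicographic ordering in action is a throwaway search built with makeresults (the code field here is generated purely for illustration):

```
| makeresults count=4
| streamstats count
| eval code=case(count=1, "10", count=2, "9", count=3, "70", count=4, "100")
| sort 0 str(code)
| table code
```

The str() modifier forces a string comparison, so the values come back in the order 10, 100, 70, 9 described above.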
Dedup acts as a filtering command: it takes the search results from the previously executed command and reduces them to a smaller set of output. Removing redundant data is the core function of this filtering command. Dedup removes output matching the specified criteria, retaining only the first count results for each combination of values of the specified fields; if the count is not specified, it defaults to 1 and returns the first result found.
There are separate options of the dedup filtering command for specific situations. To retain all the results while removing only the duplicate values from them, the user can set the keepevents option. The results returned are the first results found with each combination of the specified field values, which for historical searches are generally the most recent ones; the user can use the sortby clause to change the sort order if needed. Finally, events in which the specified fields do not exist at all are dropped by default; the user can use the keepempty option to override this behaviour and retain them.
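The three situations above map onto the following option sketches (source is an example field name):

```
... | dedup source keepevents=true
... | dedup source sortby +_time
... | dedup source keepempty=true
```

The first keeps every event but clears the duplicate values; the second sorts by ascending _time before deduplicating, so the earliest event per source is retained; the third retains events that have no source field at all.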
There are several sort_field options for dedup's sortby clause, through which the user can control how events are sorted. The sort_field is the name of the field to sort by. The auto option automatically determines how to sort the field values, ip interprets the field values as IP addresses, num interprets the field values as numbers, and str orders the field values lexicographically.
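Each sort_field type can be combined with an ascending (+) or descending (-) prefix; a few hedged sketches (host, clientip, bytes, and uri are illustrative field names):

```
... | dedup host sortby ip(clientip)
... | dedup host sortby -num(bytes)
... | dedup host sortby str(uri)
```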
This example showcases how the dedup command is executed. For instance, a user wants to group all events with repetitive occurrences of a value and remove these repetitions from reports and alerts.
Solution: suppose you have events as follows:
2012-07-22 11:45:23 code=239
2012-07-22 11:45:25 code=773
2012-07-22 11:45:26 code=-1
2012-07-22 11:45:27 code=-1
2012-07-22 11:45:28 code=-1
2012-07-22 11:45:29 code=292
2012-07-22 11:45:30 code=292
2012-07-22 11:45:32 code=-1
2012-07-22 11:45:33 code=444
2012-07-22 11:45:35 code=-1
2012-07-22 11:45:36 code=-1
Here the ultimate aim is to obtain seven events, one for each run of identical code values: 239, 773, -1, 292, -1, 444, -1. Users often take the wrong step in this case and reach for the transaction command ( … | transaction code ), but the dedup command is a much more straightforward approach. A plain dedup would remove all duplicate events, whereas what is required here is a command that removes only duplicates appearing in a consecutive cluster; hence, … | dedup code consecutive=true is executed to obtain the correct result.
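Putting the example together as a full search (the source name is an assumption for illustration):

```
source="codes.log" | dedup code consecutive=true
```

Because consecutive=true only suppresses an event whose code matches the event immediately before it, the repeated -1 runs each collapse to a single event while the separate -1 clusters remain distinct, yielding the seven events listed above.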
Madhuri is a Senior Content Creator at MindMajix. She has written about a range of topics on various technologies, including Splunk, TensorFlow, Selenium, and CEH. She spends most of her time researching technology and startups. Connect with her via LinkedIn and Twitter.