Joseph's advice is spot on. I just want to flag that leading and trailing spaces is also common jargon here.
stritrim() can also be useful: it collapses runs of internal spaces to a single space. This example is blatant:
. di stritrim("South     Africa")
South Africa
but it is quite common in messy text that people mean to type one space and type two, or that data were entered by different people with different habits or conventions.
Dear Joseph and Nick, thanks indeed for the useful advice.
The problem is solved based on your suggestions.
h stritrim()
display with a variable will only show results for the first observation, unless you specify otherwise. With a variable, use generate or replace and then list or edit.
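A minimal sketch of that workflow, using a toy string variable (country and country_clean are hypothetical names made up for illustration). It assumes Stata 14 or later, where the functions are named strtrim() and stritrim():

```stata
* Hypothetical toy data: a string variable with stray spaces
clear
input str20 country
"South   Africa"
"  Kenya "
"New  Zealand"
end

* Collapse internal runs of spaces to one, then strip leading and trailing spaces
generate country_clean = strtrim(stritrim(country))

* Inspect every observation, not just the first
list country country_clean
```

Using generate for a new variable rather than replace keeps the original values available for comparison.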
Hello, I have a question that is somewhat similar to Anagaw's. I am trying to merge two datasets using the household identifier (hhid) as the key variable. I was able to merge most of the data. There were some unwanted spaces in the unmerged data, so I used the excellent advice given by Joseph, above, to deal with that problem. I am still left with about 40 observations that don't match. Eyeballing the hhid values for these unmatched observations, I cannot find any difference between them, and yet I cannot get these observations to merge. I am using the following code. Would you be so kind as to help me out here and see if there is anything that I might be overlooking? Thank you in advance.
*to remove unwanted gaps in hhid
replace hhid = subinstr(hhid, " ", "", .)
*collapsed the master data by hhid
collapse (sum) agri_prod land_poss, by(hhid)
*generated hhid_new so that i could compare the unmatched observations post-merging. hhid and hhid_new are the exact same variables.
gen hhid_new = hhid
*merged two datasets using hhid as the key identifier.
merge 1:1 hhid using "C:\Users\kandikus\OneDrive\East West Center\Data\SituationAssessmentofAgriHHs\Climate Data\CRU_APRIL32023\Apr182023\HHID List\NSSO_HHID_2019.dta", generate(_merge_rough)
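
One possibility that may be worth checking, offered as a sketch and not a diagnosis: subinstr(hhid, " ", "", .) removes only ordinary spaces, so characters such as tabs or non-breaking spaces would survive and be invisible when eyeballing. The example below builds toy data to show the idea. It assumes hhid is a string variable that should contain only letters and digits, and that Stata 14 or later is available for the Unicode functions; odd_char, len_bytes and len_chars are hypothetical names.

```stata
* Hypothetical toy data standing in for the real hhid values
clear
input str12 hhid
"HH001"
"HH002"
"HH003"
end

* Plant a tab and a non-breaking space, neither of which is an ordinary space
replace hhid = hhid + char(9) in 2
replace hhid = "HH" + uchar(160) + "003" in 3

* Flag any value containing a character outside 0-9, A-Z, a-z
* (assumes identifiers are purely alphanumeric)
generate byte odd_char = ustrregexm(hhid, "[^0-9A-Za-z]")
list hhid if odd_char

* Byte length versus Unicode character length; a difference points to
* multibyte characters such as a non-breaking space
generate len_bytes = strlen(hhid)
generate len_chars = ustrlen(hhid)
list hhid len_bytes len_chars if len_bytes != len_chars
```

If stray characters do turn up, the same check would need to be run on the using dataset as well, since a mismatch can originate on either side of the merge.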