Joseph's advice is spot on. I just want to flag that leading and trailing spaces is also common jargon here.

Also, stritrim() can be useful: it reduces multiple consecutive internal spaces to a single space. This example is blatant:

. di stritrim("South      Africa")
South Africa

but it's quite common in messy text that people mean to type one space but type two, or that the data were entered by different people with different habits or conventions.

Comment

Anagaw Derseh 12 Jul 2018, 01:37

Dear Joseph and Nick, thank you for the useful advice.
The problem is solved thanks to your suggestions.

Comment

Morris Muzyamba 10 Oct 2022, 05:19

Hi Nick, what is the full syntax for the di stritrim command, please? Is it di stritrim(variable)?

Comment

Rich Goldstein 10 Oct 2022, 05:59
h stritrim()

Comment

Nick Cox 10 Oct 2022, 06:16

display with a variable will only show results for the first observation, unless you specify otherwise. With a variable, use generate or replace and then list or edit.
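
A minimal sketch of that workflow, for illustration only: the variable name country and the toy data are assumptions, not from this thread. It also applies strtrim() to drop leading and trailing spaces before stritrim() collapses internal ones.

```stata
* Toy data; "country" is a hypothetical string variable
clear
input str30 country
"South    Africa"
"  New   Zealand "
"Kenya"
end

* display evaluates the expression for the first observation only
display stritrim(country)

* an explicit subscript picks out another observation
display stritrim(country[2])

* to clean every observation, generate a new variable (or use replace)
generate country_clean = stritrim(strtrim(country))

* inspect original and cleaned values side by side
list country country_clean
```

Creating a new variable rather than using replace keeps the original values available for comparison.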

Comment

Morris Muzyamba 10 Oct 2022, 07:22

Thank you very much for your kind help.

Comment

Sandeep Kandikuppa 28 Apr 2023, 11:53

Hello, I have a question that is somewhat similar to Anagaw's. I am trying to merge two datasets using the household identifier (hhid) as the key variable. I was able to merge most of the data. There were some unwanted spaces in the unmerged data, so I used the excellent advice given by Joseph, above, to deal with that problem. I am still left with about 40-odd observations that don't match. Eyeballing the hhid values for these unmatched observations, I cannot find any difference between them, and yet I cannot get these observations to merge. I am using the following code. Would you be so kind as to help me out and see whether there is anything I might be overlooking? Thank you in advance.

*to remove unwanted gaps in hhid
replace hhid = subinstr(hhid, " ", "", .)

*collapsed the master data by hhid
collapse (sum) agri_prod land_poss, by(hhid)

*generated hhid_new so that i could compare the unmatched observations post-merging. hhid and hhid_new are the exact same variables.
gen hhid_new = hhid

*merged two datasets using hhid as the key identifier.
merge 1:1 hhid using "C:\Users\kandikus\OneDrive\East West Center\Data\SituationAssessmentofAgriHHs\Climate Data\CRU_APRIL32023\Apr182023\HHID List\NSSO_HHID_2019.dta", generate(_merge_rough)