BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jdadams
Fluorite | Level 6

Dear Folks:

 

I'm using Proc Import to read in a semicolon delimited file, but the output file places an apostrophe (and sometimes more) in front of a variable, which happens to be a name.  My Proc Import statement follows below,  and after that I include a snippet of raw data starting at the top of this variable length file. I apologize for raising what may be a basic question, but I honestly do not know the answer.  In my output, claremont h reads as 'claremont h, and later on the same line, claremont reads as 'claremont, and so on.   Perhaps I should have used Infile-Input instead of Proc Import.  Thanks for any help that you can provide.


proc import datafile = 'G:\PubDB_CSV_2015\public.individual.csv'
out = individual
dbms = dlm REPLACE;
delimiter = ';';

guessingrows = 500;
getnames = YES;
datarow = 2;
run;

 

individual_id;name_hash;first_name;first_initial;second_initial;last_name;individual_field

111;claremont h;heather;h;a;claremont;MEDI

113;braley g;"";g; ;braley;MEDI

1 ACCEPTED SOLUTION

Accepted Solutions
jdadams
Fluorite | Level 6

I wanted to update my original message on whether Proc Import placed an unwanted apostrophe before a variable. It turns out that, while this can happen with Excel (not SAS) as a flag indicating text to follow, Excel did not do this with my data.  Instead, the data turned out to be valid.

 

The variables were last names, and the data were sorted so that observations with apostrophes in front appeared first. They say that truth can be stranger than fiction,and that was the case here.  By sorting cases with apostrophes first, Dutch names were displayed.  Since there are almost 600,000 names, there are very many Dutch names.  For example, 't Hart is a Dutch surname,  and there are others, like 't Hooft.  (The characters 't stand for Het, which is Dutch for the article 'The'.)

 

My international data just happened to show these cases.  So the problem in this case turned out to be Dutch Names and my lack of familiarity with them.

 

 

 

 

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Your example file works fine.

options parmcards=csv;
filename csv temp;

parmcards4;
individual_id;name_hash;first_name;first_initial;second_initial;last_name;individual_field
111;claremont h;heather;h;a;claremont;MEDI
113;braley g;"";g; ;braley;MEDI
;;;;

proc import file=csv dbms=csv out=want replace;
  delimiter=';';
run;

proc print;
run;

Tom_0-1748905434102.png

 

I would say most likely your original file actually has the single quotes.  Excel uses that character to force interpretation of a cell as character instead of numbers.  So you probably cannot SEE the single quote when looking at the CSV file using EXCEL.  Instead look at the CSV file using a TEXT editor.  Or just use SAS to look at the file.

data _null_;
  infile 'G:\PubDB_CSV_2015\public.individual.csv' ;
  input;
  if index(_infile_,"'") ;
  examples+1;
  if examples > 10 then stop;
  list;
run;

 

jdadams
Fluorite | Level 6

Hello Tom:

 

Thanks for the suggestion.  I'll look into this further using Infile-Input which worked in your rendering of the example data.  Thanks for your help.

jdadams
Fluorite | Level 6

I wanted to update my original message on whether Proc Import placed an unwanted apostrophe before a variable. It turns out that, while this can happen with Excel (not SAS) as a flag indicating text to follow, Excel did not do this with my data.  Instead, the data turned out to be valid.

 

The variables were last names, and the data were sorted so that observations with apostrophes in front appeared first. They say that truth can be stranger than fiction,and that was the case here.  By sorting cases with apostrophes first, Dutch names were displayed.  Since there are almost 600,000 names, there are very many Dutch names.  For example, 't Hart is a Dutch surname,  and there are others, like 't Hooft.  (The characters 't stand for Het, which is Dutch for the article 'The'.)

 

My international data just happened to show these cases.  So the problem in this case turned out to be Dutch Names and my lack of familiarity with them.

 

 

 

 

ballardw
Super User

Thank you @jdadams for a clear description of the data issue. Learning such things can be long row to hoe.

 

Had a similar issue with phone numbers once upon a time. We had code to clean up a phone number field that was not stored in a clearly defined form. The same number could appear as (123) 555-1234, 123 555-1234, 1235551234 plus things like extensions, with or without something like EXT or other indicator (as well as other stuff sometimes). So we had code that stripped all the non-digit characters and selected the first 10 to get the 1235551234 version. Turned out that we had some international numbers in the data and the first 2 characters were the nation code. So we had numbers originating in Brazil that we called as if they were in Colorado.

jdadams
Fluorite | Level 6

Dear @ballardw Thanks for your comment on my apostrophe data problem. It means a lot to me given your leadership role in the SAS Community.  I think this episode tells an interesting story, that you should check your data as well as the software that accesses it.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 769 views
  • 0 likes
  • 3 in conversation