Parsing text files

Helper

THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

On 18/10/2007 at 19:24, xxxxxxxx wrote:

Okay, I found a solution for my problem.

char test[2] = " ";
bfile->ReadBytes(test,1);
singleCharacter = test;

And in singleCharacter I can detect a "\n", perhaps because of the conversion from char to String?!

Sorry again for using your thread.

Helper

On 18/10/2007 at 19:33, xxxxxxxx wrote:

> Isn't there another way to read a file with a lot of content without reading every char and testing whether it is a newline?

Nope. Yep. You can read a file char by char, or you can use C or the STL to read lines (fgets() or an fstream). The problem with reading lines into a fixed buffer, as fgets() requires, is that the buffer must be big enough for any situation. For instance, people sometimes strip line endings from source code text files to reduce size (though this is not needed these days). You'll end up reading the entire file into a single line buffer.

I do it char by char using BaseFile::ReadChar() because I want to avoid any implementation differences between Visual Studio, CodeWarrior, and Xcode in how, for instance, fgets() interprets bytes: endianness, end-of-line (different on MacOS, Windows, and Linux), etc.

Use whatever you are more comfortable with and whatever works for you.
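For comparison with the STL route, here is a minimal sketch in standard C++ (not the C4D SDK). read_lines is a hypothetical helper, and it assumes the file is opened as a std::ifstream in binary mode so that no platform-specific CR+LF translation happens. Because std::getline() fills a growing std::string, the fixed-buffer concern goes away, but it still only splits on LF, which is one reason reading char by char remains attractive for classic Mac files.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read all lines from a stream with std::getline(). Unlike fgets(), which
// fills a caller-supplied buffer of fixed size, std::string grows as needed,
// so even a file with no line endings at all simply becomes one long line.
// std::getline() splits only on '\n' (LF): a trailing '\r' left over from a
// CR+LF file is stripped below, but bare-CR files are NOT split.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // drop the CR of a CR+LF pair
        }
        lines.push_back(line);
    }
    return lines;
}
```

With a bare-CR (classic Mac) file, the whole file comes back as a single "line", which is exactly the case the char-by-char approach handles.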

> Reading strings with ReadString() also crashes Cinema 4D.

That's because ReadString() is for reading C4D Strings written with WriteString(). These are not char* strings or STL std::string objects. They are C4D String objects, which are saved in a particular format (a 4-byte number of characters, followed by the string and a null terminator; see BaseFile::ReadString() in Resource:_api:c4d_file.cpp).
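To see why a plain text file trips up a reader that expects a length-prefixed string, here is a minimal standard C++ sketch of decoding the layout described above (count, characters, terminator). The function names are hypothetical, and the little-endian 32-bit count is an assumption of this sketch; the authoritative format is whatever c4d_file.cpp does. Fed the text "test,", the first four bytes decode to a count of about 1.95 billion characters, which would plausibly explain a crash in a reader that trusts that count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Read a 32-bit unsigned count stored little-endian at d[pos..pos+3].
// Little-endian is an ASSUMPTION of this sketch, not taken from the SDK.
std::uint32_t read_u32_le(const std::vector<unsigned char>& d, std::size_t pos) {
    return static_cast<std::uint32_t>(d[pos])
         | static_cast<std::uint32_t>(d[pos + 1]) << 8
         | static_cast<std::uint32_t>(d[pos + 2]) << 16
         | static_cast<std::uint32_t>(d[pos + 3]) << 24;
}

// Decode "count, characters, null terminator". Returns std::nullopt when the
// bytes cannot hold such a string, e.g. when they are plain text.
std::optional<std::string> read_prefixed_string(const std::vector<unsigned char>& d) {
    if (d.size() < 4) return std::nullopt;
    const std::uint64_t count = read_u32_le(d, 0);
    // The count must leave room for the characters plus the terminator.
    if (count + 1 > d.size() - 4) return std::nullopt;
    const std::size_t n = static_cast<std::size_t>(count);
    if (d[4 + n] != 0) return std::nullopt;  // missing terminator
    return std::string(d.begin() + 4, d.begin() + 4 + static_cast<std::ptrdiff_t>(n));
}
```

The bounds check is the design point: a reader that skips it would try to read or allocate whatever count the first four bytes happen to spell.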

Helper

On 18/10/2007 at 19:41, xxxxxxxx wrote:

Quote: Originally posted by Shaden1 on 18 October 2007
>
> * * *
>
> Okay, I found a solution for my problem.
>
> char test[2] = " ";
> bfile->ReadBytes(test,1);
> singleCharacter = test;
>
> And in singleCharacter I can detect a "\n", perhaps because of the conversion from char to String?!
>
> Sorry again for using your thread.
>
>
> * * *

I'd be careful there. It is getting "\n" only because you are on MacOS. Ask how I know this. Windows uses 0x0D+0x0A (CR+LF) for end-of-lines (\n). Classic Mac OS, on the other hand, uses 0x0D (CR) only. For Windows files, you'll need to read two bytes to get past the newline. It is possible to look only for 0x0D, but then you have to be prepared for a possible 0x0A right after it. Both kinds of file can turn up on either OS.
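A minimal sketch in standard C++ of telling the conventions apart, assuming the file contents are already in a std::string; detect_eol and Eol are hypothetical names. As described above, after a CR it checks whether an LF follows so that a Windows pair counts as one line break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Eol { None, LF, CR, CRLF };

// Classify the first line break found in `text`. A CR is checked for a
// following LF so that a Windows CR+LF pair counts as one line break.
Eol detect_eol(const std::string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\x0D') {  // CR
            bool lf_follows = i + 1 < text.size() && text[i + 1] == '\x0A';
            return lf_follows ? Eol::CRLF : Eol::CR;
        }
        if (text[i] == '\x0A') {  // LF
            return Eol::LF;
        }
    }
    return Eol::None;
}
```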

That aside, you have just corrected the other problem I see: a char array used as a char* (string) must be null-terminated (end with a byte of value 0). If you were reading just one character into a one-element array and then converting that into a string, you can see why there were problems. Every string must look like ABCDEFG\0, and this also applies to char arrays treated as strings. Always make the char array one element bigger than the maximum number of characters you expect, and after filling in the 'string', set the element right after the last character to 0:

char test[8];
test[0] = 'H';
test[1] = 'i';
test[2] = '!';
test[3] = 0;
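The same rule applies to bytes read from a file. A minimal sketch in standard C++; bytes_to_cstring is a hypothetical helper, and its source bytes stand in for whatever ReadBytes() delivered. It reserves one slot for the terminator and always writes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy up to n raw bytes (e.g. just read from a file) into dest, which holds
// `capacity` chars, and always null-terminate. At most capacity - 1 bytes are
// copied so there is room for the terminating 0. Returns the number copied.
std::size_t bytes_to_cstring(char* dest, std::size_t capacity,
                             const char* src, std::size_t n) {
    if (capacity == 0) return 0;  // no room even for the terminator
    std::size_t count = n < capacity - 1 ? n : capacity - 1;
    std::memcpy(dest, src, count);
    dest[count] = '\0';
    return count;
}
```

Reading a single character therefore needs a two-element array, exactly as in the char test[2] fix.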

Helper

On 19/10/2007 at 02:52, xxxxxxxx wrote:

Sorry, but I am not working on MacOS. It seems to work on Windows too, but I will think about your additional comments.

Now I understand why ReadString() crashes. That should perhaps be documented in the SDK documentation, because "Read a string from the file." could mean anything to me.

And because I didn't add a 0 at the end, there was always garbage in my converted string, right? I will test this later, thank you.

With ReadChar(), I am not able to find "\n", and I don't know what I am doing wrong. ReadBytes() will cause problems on MacOS then, I know. But the first priority is getting my plugin running. I will correct all these things at the end.

Thank you very much for your replies. You are really great!

Nice greetings,
Manuel

Helper

On 19/10/2007 at 11:39, xxxxxxxx wrote:

Hi Manuel,

Don't worry. You're not hijacking the thread at all. As long as we're both trying to solve the same problem, it really should be contained in one thread.

About ReadChar(), I believe it won't detect "\n" because that is a string value (double quotes), not a single ASCII character value.

Here's a character table for ASCII values.

http://www.asciitable.com/

I have only a basic understanding of this stuff, so please, someone correct me if I'm wrong.

Helper

On 19/10/2007 at 12:44, xxxxxxxx wrote:

You can always compare against the character representation: '\n'. But as noted, I wouldn't completely trust that. A Windows build expects the two-character end-of-line, and if the file was created on a Mac, its line endings will be interpreted incorrectly (as a bare Carriage Return, CR). Simply load any Mac text file into Notepad and note the result. This happens because Windows finds only CR, not CR+LF.
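To make the difference between the two literals concrete, a tiny standard C++ sketch (is_line_feed is a hypothetical name; the 0x0A value assumes an ASCII-based platform). '\n' is one char, while "\n" is an array of two chars (LF plus terminator), so a char read from a file must be compared with '\n'.

```cpp
#include <cassert>
#include <cstring>

// '\n' (single quotes) is a char: the single byte 0x0A (LF) on ASCII systems.
// "\n" (double quotes) is a string literal: the array {0x0A, 0}, which cannot
// be compared with one char using ==. Compare the char read with '\n'.
bool is_line_feed(char c) {
    return c == '\n';
}
```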

As I explained, the best way around this is to read with ReadChar() and check for 0x0D (CR). That definitively marks an end-of-line (except for Unix, where LF is used, but that should be rare if ever). To be certain, though, you have to read one more character to see whether there is a 0x0A (LF), as in Windows files; otherwise the LF will be added to the next line/token you read and wreak havoc.
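A minimal sketch of that char-by-char approach in standard C++, with std::istream::get() standing in for BaseFile::ReadChar() and peek() used for the one-character look-ahead; split_lines is a hypothetical name and the stream is assumed to be opened in binary mode. It accepts CR, LF, or CR+LF, and keeps empty lines.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one character at a time and split on CR, LF or CR+LF. After a CR,
// peek at the next character and swallow a following LF so it is not glued
// onto the start of the next line.
std::vector<std::string> split_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string current;
    bool pending = false;  // true if `current` holds an unterminated line
    char c;
    while (in.get(c)) {
        if (c == '\x0D' || c == '\x0A') {
            if (c == '\x0D' && in.peek() == '\x0A') {
                in.get();  // consume the LF of a Windows CR+LF pair
            }
            lines.push_back(current);
            current.clear();
            pending = false;
        } else {
            current += c;
            pending = true;
        }
    }
    if (pending) lines.push_back(current);  // last line without an EOL
    return lines;
}
```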

Here's my FileReader::ReadLine() method. Note that I check for either CR or LF (this supports Windows, MacOS, and Unix). When the end-of-line consists of two characters, the second one causes no trouble, because leading 'whitespace' is always skipped. That includes CR and LF, since their numerical values are less than 32 (Space). So the next call to ReadLine() skips any leftover end-of-line characters, including empty lines.

Note that I use a file buffer. This is a large buffer that holds as much of the file as possible; FillBuffer() uses ReadBytes() to bring in the file in chunks. I do it this way because I'm dealing with text files of many megabytes (even hundreds). It also simplifies line reading, since it avoids calling ReadChar() for every character.

// Read a line from file (up to EOL or EOF),
// skipping leading whitespace, even blank lines
//*---------------------------------------------------------------------------*
CHAR* FileReader::ReadLine()
//*---------------------------------------------------------------------------*
{
    // Check for ESC key (abort load)
    if (GetInputEvent(BFM_INPUT_KEYBOARD, keyinput) && (keyinput.GetLong(BFM_INPUT_CHANNEL) == KEY_ESC))
    {
        abort = TRUE;
        return NULL;
    }

    // Step 1: Skip leading whitespace
    fbuf = fbufptr;
    do {
        // Reached end of file buffer, read more
        if ((fbufptr == fbufend) && !FillBuffer()) return NULL;

        c = *fbufptr;
        ++fbufptr;
    } while (c <= UNICODE_SPACE);
    bytesRead += (fbufptr - fbuf);

    // Step 2: Read line into lbuffer until EOL (or EOF)
    fbuf = fbufptr;
    lbufptr = lbuffer;
    do {
        // Buffer overflow - line equal to or longer than BUFFER_SIZE
        if (lbufptr == lbufend) return (CHAR*)ErrorException::NullThrow(EE_DIALOG, GeLoadString(IPPERR_LINETOOLONG_TEXT), filename.GetString(), GetLineString());

        // Store character into line buffer
        *lbufptr = c;
        ++lbufptr;

        // Reached end of file buffer, read more
        if ((fbufptr == fbufend) && !FillBuffer()) return NULL;

        c = *fbufptr;
        ++fbufptr;
        // PC uses CR+LF (0x000D+0x000A), Mac uses CR (0x000D)
    } while ((c != UNICODE_CR) && (c != UNICODE_LF));

    bytesRead += (fbufptr - fbuf);
    // Set Status Bar Progression
    StatusSetBar(bytesRead / statusConstant);
    *lbufptr = 0;
    return lbuffer;
}
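For readers without the SDK, here is a hypothetical portable counterpart of the same two-step idea in standard C++ (ChunkedLineReader is an invented name; std::istream::read() stands in for ReadBytes()/FillBuffer()). Unlike the method above, this sketch collects into a growing std::string instead of a fixed lbuffer (so there is no "line too long" error), returns a final line that lacks an EOL, and leaves out the ESC check and status bar; those are choices of the sketch, not of the original.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Read the input in large chunks, skip leading "whitespace" (every byte
// <= 0x20, which includes CR, LF, tabs, spaces and blank lines), then
// collect bytes up to the next CR or LF.
class ChunkedLineReader {
public:
    explicit ChunkedLineReader(std::istream& in, std::size_t chunk = 1 << 16)
        : in_(in), buf_(chunk) {}

    // Next non-empty line, or std::nullopt once the input is exhausted.
    std::optional<std::string> read_line() {
        unsigned char c = 0;
        // Step 1: skip leading whitespace, including leftover CR/LF bytes.
        do {
            if (!next(c)) return std::nullopt;
        } while (c <= 0x20);
        // Step 2: collect characters until CR or LF.
        std::string line;
        do {
            line += static_cast<char>(c);
            if (!next(c)) return line;  // final line without an EOL
        } while (c != 0x0D && c != 0x0A);
        return line;
    }

private:
    // Fetch one byte, refilling the chunk buffer when it runs empty.
    bool next(unsigned char& c) {
        if (pos_ == end_) {
            if (!in_) return false;  // an earlier read already hit the end
            in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
            end_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (end_ == 0) return false;
        }
        c = static_cast<unsigned char>(buf_[pos_++]);
        return true;
    }

    std::istream& in_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;
    std::size_t end_ = 0;
};
```

As in the original, leading spaces and tabs on a line are dropped along with blank lines, so this suits token-style formats rather than text where indentation matters.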

Helper

On 13/02/2008 at 11:25, xxxxxxxx wrote:

Hi Guys,
I am currently writing a plugin that needs to read a simple text file. This thread has been a ton of help, but I still have a problem when converting a CHAR to a String. Whenever I assign a char to a String, Cinema seems to append about 8 characters that look like a '1' to each character. Here's the code:
<CODE>
Bool HybridDopey::Command(LONG id, const BaseContainer &msg)
{
    String *dopeStr = NULL;
    String singleChar = "", line = "", token;

    CHAR c;
    CHAR *pc = &c;

    LONG pos;
    switch (id)
    {
        case IDC_HD_SETUP_BUTTON:
            if(!dopeSheetFileGUI) return TRUE;

            //get the file name from the filename GUI
            dopeSheetFile = dopeSheetFileGUI->GetData().GetValue().GetFilename();

            if(!file) return TRUE;

            //open the file for reading
            if(!file->Open(dopeSheetFile)) return TRUE;

            //get the length of the file
            VLONG fileLen = file->GetLength();

            GePrint("Entering For Loop");
            for(VLONG i = 0L; i != fileLen; i++)
            {
                if(file->ReadChar(pc))
                    //GePrint("Read Char");
                    singleChar = String(pc, St8bit);

                GePrint(singleChar);

                //line read from the file
                if(*pc == 'CR' || *pc == ',')
                {
                    GePrint("Found End of Line");
                    //get the first token
                    if(line.FindFirst("", &pos))
                    {
                        token = line.SubStr(0L, pos);
                        GePrint("Found Token");
                    }
                }
                else
                {
                    line += singleChar;
                }
            }

            GePrint("The Final Line is: " + line);
            GePrint(token);

            file->Close();

            break;
    }

    return TRUE;
}
</CODE>

Also, the text file that I am reading only has the word "test," in it.

Thanks in advance for any help.
Josh

Helper

On 13/02/2008 at 14:40, xxxxxxxx wrote:

Some extra info: this simple code gives me a total of 40 characters when I assign the CHAR array to a String (5 characters * 8):

> CHAR w[5] = { 'w', 'h', 'a', 't', '!' };
> String wStr = w;
> LONG ogStringLen = wStr.GetLength();
>
> GePrint(LongToString(ogStringLen));

This will output 40 to the console, which makes ABSOLUTELY no sense to me! The only way to get the string to behave as I would expect it to is to add the following code:

> LONG stringLen = wStr.GetCString(w, wStr.GetCStringLen(StXbit)+1);
> wStr.Delete(5, stringLen - 5);

But doing this while parsing a text file is going to make the plugin very slow. Does anyone else have this problem when assigning a char to a String?

Josh

Helper

On 13/02/2008 at 15:23, xxxxxxxx wrote:

The char array must be null-terminated, i.e.:

CHAR w[6] = { 'w', 'h', 'a', 't', '!', 0 };

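The string length doesn't include the null terminator. You are probably seeing 40 because the conversion keeps reading memory past the end of the array until it happens to find a byte with value 0, and here the first one is 40 bytes from the start. The result depends on whatever lies in memory after the array.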
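The standard C++ analogue makes the fix explicit: convert using the array's real size rather than relying on a terminator that may not exist. A minimal sketch; from_char_array is a hypothetical helper, not part of the C4D String API. It stops at an embedded 0 if there is one, and otherwise uses all N characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Convert a char array to std::string without reading past its end.
// If the array contains a terminator, stop there; otherwise use all N chars.
template <std::size_t N>
std::string from_char_array(const char (&arr)[N]) {
    const void* zero = std::memchr(arr, 0, N);
    std::size_t len = zero
        ? static_cast<std::size_t>(static_cast<const char*>(zero) - arr)
        : N;
    return std::string(arr, len);
}
```

Both the five-element array from the question and the six-element null-terminated version convert to "what!".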

Helper

On 24/04/2008 at 05:48, xxxxxxxx wrote:

Josh's problem is caused by the pointers/references.

In the line singleChar = String(pc, St8bit); he converts the data pc points to into a string, but the conversion cannot know that pc points to just a single CHAR!

So use 'c', not its pointer 'pc': singleChar = String(c, St8bit); will do.

Here is a little example:

> <code>
>     CHAR test = 100;
>     String str, strRef;
>     str.Insert(0, test);
>     strRef.Insert(0, &test);
>     GePrint(str);
>     GePrint(strRef);
> </code>

You will see that strRef has the right character in the first position, but it is followed by random memory converted to characters.

The same thing should happen in Robert's code; I wonder why nobody noticed. But it is a good example of how dangerous pointers are ;-).

Helper

On 24/04/2008 at 06:32, xxxxxxxx wrote:

Just noticed: singleChar = String(c, St8bit) obviously won't work, but I think you get the idea (yes, I know this thread is old... for future reference etc.).
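For future reference, the standard C++ counterpart of converting one CHAR into a string, as a minimal sketch (char_to_string is a hypothetical name; this says nothing about which C4D String constructor to use). The (count, char) constructor needs no terminator, whereas constructing from &c would read past c looking for a 0 byte, the same mistake as passing pc.

```cpp
#include <cassert>
#include <string>

// Build a one-character string from a single char read from a file.
// std::string(1, c) uses the (count, char) constructor, so no terminator is
// needed. std::string(&c) would instead scan memory after c for a 0 byte.
std::string char_to_string(char c) {
    return std::string(1, c);
}
```

std::string(&c, 1), which passes an explicit length, would be equally safe.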