I'm looking to optimize C++ code (mainly some for loops) using the NEON capability of computing 4 or 8 array elements at a time. Is there some kind of library or set of functions that can be used in a C++ environment? I use the Eclipse IDE on Gentoo Linux to write C++ code.

After reading the answers, I ran some tests with the software. I compiled my project with the following flags: -O3 -mcpu=cortex-a9 -ftree-vectorize -mfloat-abi=hard -mfpu=neon

Keep in mind that this project includes extensive libraries such as openFrameworks, OpenCV, and OpenNI, and everything was compiled with these flags. To compile for the ARM board we use a Linaro toolchain cross-compiler with GCC 4.8.3.

Would you expect this to improve the performance of the project? We saw no change at all, which is rather strange considering all the answers I read here.

Another question: all the for loops have a known number of iterations, but many of them iterate over custom data types (structs or classes). Can GCC optimize these loops even though they iterate over custom data types?

From your update, you may misunderstand what the NEON processor does. It is a SIMD (Single Instruction, Multiple Data) vector processor. That means it is very good at applying one instruction (say, "multiply by 4") to several pieces of data at the same time. It is also well suited to tasks like "add all these numbers together" or "add each element of these two lists of numbers to create a third list of numbers." So if your problem looks like those things, the NEON processor is going to be a huge help.
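To make the "add each element of these two lists" case concrete, here is a minimal sketch (not taken from the original thread) written with GCC's NEON intrinsics from arm_neon.h, which is one kind of function set usable directly from C++. The function name add_arrays is hypothetical. It assumes a GCC-compatible compiler: the NEON path is compiled in only when the target enables NEON (for example with -mfpu=neon on 32-bit ARM, which defines __ARM_NEON__ or __ARM_NEON), and on any other target the same function falls back to a plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n).
// One instruction (vector add) is applied to four floats at a time.
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
    std::size_t i = 0;
#if HAVE_NEON
    // Main loop: load 4 floats from each input into 128-bit registers,
    // add them lane by lane, and store the 4 results.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(c + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail for the leftover 0-3 elements; on non-NEON
    // targets this loop does all the work.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Calling add_arrays(a, b, c, 10) on two 10-element float arrays handles elements 0 to 7 in two 4-wide NEON steps and the last two in the scalar tail. Writing the intrinsics explicitly like this does not rely on the auto-vectorizer recognizing the loop, which is the usual reason to reach for them when -ftree-vectorize alone shows no improvement.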