[dpdk-dev,01/19] devtools: add simple script to find duplicate includes

Message ID 20170711185546.26138-2-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail Compilation issues

Commit Message

Stephen Hemminger July 11, 2017, 6:55 p.m. UTC
  This is just a simple check script to find obvious duplications.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100755 devtools/dup_include.pl
  

Comments

Thomas Monjalon July 11, 2017, 8:33 p.m. UTC | #1
Hi Stephen,

11/07/2017 20:55, Stephen Hemminger:
> This is just a simple check script to find obvious duplications.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>  create mode 100755 devtools/dup_include.pl

Thank you for this script, but... it is written in Perl!
I don't think it is a good idea to add yet another language to DPDK.
We already have shell and python scripts.
And I am not sure a lot of (young) people are able to parse it ;)

I would like to propose this shell script:

dirs='app buildtools drivers examples lib test'
pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'

for file in $(git ls $dirs) ; do
    dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
    [ -n "$dups" ] || continue
    echo "$file"
    echo "$dups" | sed 's,^,\t,'
done
  
Stephen Hemminger July 11, 2017, 11:05 p.m. UTC | #2
On Tue, 11 Jul 2017 22:33:55 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> Hi Stephen,
> 
> 11/07/2017 20:55, Stephen Hemminger:
> > This is just a simple check script to find obvious duplications.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 64 insertions(+)
> >  create mode 100755 devtools/dup_include.pl  
> 
> Thank you for this script, but... it is written in Perl!
> I don't think it is a good idea to add yet another language to DPDK.
> We already have shell and python scripts.
> And I am not sure a lot of (young) people are able to parse it ;)
> 
> I would like to propose this shell script:
> 
> dirs='app buildtools drivers examples lib test'
> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> 
> for file in $(git ls $dirs) ; do
>     dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>     [ -n "$dups" ] || continue
>     echo "$file"
>     echo "$dups" | sed 's,^,\t,'
> done

Sorry, it is quick and easy. After all it is optional, just like
coccinelle and not part of the build.  Plus checkpatch is in Perl.
  
Thomas Monjalon July 12, 2017, 6:41 a.m. UTC | #3
12/07/2017 01:05, Stephen Hemminger:
> On Tue, 11 Jul 2017 22:33:55 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > Hi Stephen,
> > 
> > 11/07/2017 20:55, Stephen Hemminger:
> > > This is just a simple check script to find obvious duplications.
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > >  devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 64 insertions(+)
> > >  create mode 100755 devtools/dup_include.pl  
> > 
> > Thank you for this script, but... it is written in Perl!
> > I don't think it is a good idea to add yet another language to DPDK.
> > We already have shell and python scripts.
> > And I am not sure a lot of (young) people are able to parse it ;)
> > 
> > I would like to propose this shell script:
> > 
> > dirs='app buildtools drivers examples lib test'
> > pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> > 
> > for file in $(git ls $dirs) ; do
> >     dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> >     [ -n "$dups" ] || continue
> >     echo "$file"
> >     echo "$dups" | sed 's,^,\t,'
> > done
> 
> Sorry, it is quick and easy. After all it is optional, just like
> coccinelle and not part of the build.  Plus checkpatch is in Perl.

checkpatch is not in the repository ;)

I prefer my shell script because
- I am able to maintain it
- the regexp is more tolerant with #include line
- it checks the whole repository
  
Stephen Hemminger July 12, 2017, 9:59 p.m. UTC | #4
On Tue, 11 Jul 2017 22:33:55 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> Thank you for this script, but... it is written in Perl!
> I don't think it is a good idea to add yet another language to DPDK.
> We already have shell and python scripts.
> And I am not sure a lot of (young) people are able to parse it ;)
> 
> I would like to propose this shell script:
> 
> dirs='app buildtools drivers examples lib test'
> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> 
> for file in $(git ls $dirs) ; do
>     dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>     [ -n "$dups" ] || continue
>     echo "$file"
>     echo "$dups" | sed 's,^,\t,'
> done

There is no "git ls" command in current version, 

Using find instead works.

plus shell is 7x slower.

$ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
real	0m0.765s
user	0m1.220s
sys	0m0.155s
$time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
real	0m0.131s
user	0m0.118s
sys	0m0.014s

How about some python code.
  
Thomas Monjalon July 13, 2017, 6:56 a.m. UTC | #5
12/07/2017 23:59, Stephen Hemminger:
> On Tue, 11 Jul 2017 22:33:55 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > Thank you for this script, but... it is written in Perl!
> > I don't think it is a good idea to add yet another language to DPDK.
> > We already have shell and python scripts.
> > And I am not sure a lot of (young) people are able to parse it ;)
> > 
> > I would like to propose this shell script:
> > 
> > dirs='app buildtools drivers examples lib test'
> > pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> > 
> > for file in $(git ls $dirs) ; do
> >     dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> >     [ -n "$dups" ] || continue
> >     echo "$file"
> >     echo "$dups" | sed 's,^,\t,'
> > done
> 
> There is no "git ls" command in current version, 
> 
> Using find instead works.

Yes, both work if specifying source code directories as above.

> plus shell is 7x slower.
> 
> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> real	0m0.765s
> user	0m1.220s
> sys	0m0.155s
> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> real	0m0.131s
> user	0m0.118s
> sys	0m0.014s

I don't think speed is really relevant here :)

> How about some python code.

I don't really care between shell or python.
I thought a shell script would be really concise
and without Python 2/3 compat issues.
  
Wiles, Keith July 13, 2017, 12:19 p.m. UTC | #6
> On Jul 13, 2017, at 1:56 AM, Thomas Monjalon <thomas@monjalon.net> wrote:

> 

> 12/07/2017 23:59, Stephen Hemminger:

>> On Tue, 11 Jul 2017 22:33:55 +0200

>> Thomas Monjalon <thomas@monjalon.net> wrote:

>> 

>>> Thank you for this script, but... it is written in Perl!

>>> I don't think it is a good idea to add yet another language to DPDK.

>>> We already have shell and python scripts.

>>> And I am not sure a lot of (young) people are able to parse it ;)

>>> 

>>> I would like to propose this shell script:

>>> 

>>> dirs='app buildtools drivers examples lib test'

>>> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'

>>> 

>>> for file in $(git ls $dirs) ; do

>>>    dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)

>>>    [ -n "$dups" ] || continue

>>>    echo "$file"

>>>    echo "$dups" | sed 's,^,\t,'

>>> done

>> 

>> There is no "git ls" command in current version, 

>> 

>> Using find instead works.

> 

> Yes, both work if specifying source code directories as above.


‘git ls’ I had to change it to ‘git ls-files’ to make it work. I think you have a git alias setup for ls-files.

Can this be added to the patch testing?

> 

>> plus shell is 7x slower.

>> 

>> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"

>> real	0m0.765s

>> user	0m1.220s

>> sys	0m0.155s

>> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"

>> real	0m0.131s

>> user	0m0.118s

>> sys	0m0.014s

> 

> I don't think speed is really relevant here :)

> 

>> How about some python code.

> 

> I don't really care between shell or python.

> I thought a shell script would be really concise

> and without Python 2/3 compat issues.


Regards,
Keith
  
Thomas Monjalon July 13, 2017, 12:36 p.m. UTC | #7
13/07/2017 14:19, Wiles, Keith:
> 
> > On Jul 13, 2017, at 1:56 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> > 
> > 12/07/2017 23:59, Stephen Hemminger:
> >> On Tue, 11 Jul 2017 22:33:55 +0200
> >> Thomas Monjalon <thomas@monjalon.net> wrote:
> >> 
> >>> Thank you for this script, but... it is written in Perl!
> >>> I don't think it is a good idea to add yet another language to DPDK.
> >>> We already have shell and python scripts.
> >>> And I am not sure a lot of (young) people are able to parse it ;)
> >>> 
> >>> I would like to propose this shell script:
> >>> 
> >>> dirs='app buildtools drivers examples lib test'
> >>> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> >>> 
> >>> for file in $(git ls $dirs) ; do
> >>>    dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> >>>    [ -n "$dups" ] || continue
> >>>    echo "$file"
> >>>    echo "$dups" | sed 's,^,\t,'
> >>> done
> >> 
> >> There is no "git ls" command in current version, 
> >> 
> >> Using find instead works.
> > 
> > Yes, both work if specifying source code directories as above.
> 
> ‘git ls’ I had to change it to ‘git ls-files’ to make it work. I think you have a git alias setup for ls-files.

Ah yes, you're right!
Thanks, I has not well understood the original comment :)

> Can this be added to the patch testing?

Yes sure, I will do.
  
Thomas Monjalon July 14, 2017, 3:39 p.m. UTC | #8
13/07/2017 08:56, Thomas Monjalon:
> 12/07/2017 23:59, Stephen Hemminger:
> > On Tue, 11 Jul 2017 22:33:55 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> > 
> > > Thank you for this script, but... it is written in Perl!
> > > I don't think it is a good idea to add yet another language to DPDK.
> > > We already have shell and python scripts.
> > > And I am not sure a lot of (young) people are able to parse it ;)
> > > 
> > > I would like to propose this shell script:
[...]
> 
> > plus shell is 7x slower.
> > 
> > $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> > real	0m0.765s
> > user	0m1.220s
> > sys	0m0.155s
> > $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> > real	0m0.131s
> > user	0m0.118s
> > sys	0m0.014s
> 
> I don't think speed is really relevant here :)

I did my own benchmark (recreation time):

# time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
4,41s user 1,32s system 101% cpu 5,667 total
# time devtools/check-duplicate-includes.sh
5,48s user 1,00s system 153% cpu 4,222 total

The shell version is reported as faster on my computer!

It is faster when filtering only .c and .h files:

for file in $(git ls-files '*.[ch]') ; do
    dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
    [ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
done

# time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
3,65s user 1,05s system 100% cpu 4,668 total
# time devtools/check-duplicate-includes.sh
4,72s user 0,80s system 153% cpu 3,603 total

I prefer this version using only pipes, which is well parallelized:

for file in $(git ls-files '*.[ch]') ; do
    sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
    sed "s,^,$file: duplicated include: ,"
done

7,40s user 1,49s system 231% cpu 3,847 total
  
Thomas Monjalon July 14, 2017, 3:54 p.m. UTC | #9
14/07/2017 17:39, Thomas Monjalon:
> 13/07/2017 08:56, Thomas Monjalon:
> > 12/07/2017 23:59, Stephen Hemminger:
> > > On Tue, 11 Jul 2017 22:33:55 +0200
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 
> > > > Thank you for this script, but... it is written in Perl!
> > > > I don't think it is a good idea to add yet another language to DPDK.
> > > > We already have shell and python scripts.
> > > > And I am not sure a lot of (young) people are able to parse it ;)
> > > > 
> > > > I would like to propose this shell script:
> [...]
> > 
> > > plus shell is 7x slower.
> > > 
> > > $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> > > real	0m0.765s
> > > user	0m1.220s
> > > sys	0m0.155s
> > > $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> > > real	0m0.131s
> > > user	0m0.118s
> > > sys	0m0.014s
> > 
> > I don't think speed is really relevant here :)
> 
> I did my own benchmark (recreation time):
> 
> # time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
> 4,41s user 1,32s system 101% cpu 5,667 total
> # time devtools/check-duplicate-includes.sh
> 5,48s user 1,00s system 153% cpu 4,222 total
> 
> The shell version is reported as faster on my computer!
> 
> It is faster when filtering only .c and .h files:
> 
> for file in $(git ls-files '*.[ch]') ; do
>     dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>     [ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
> done
> 
> # time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
> 3,65s user 1,05s system 100% cpu 4,668 total
> # time devtools/check-duplicate-includes.sh
> 4,72s user 0,80s system 153% cpu 3,603 total
> 
> I prefer this version using only pipes, which is well parallelized:
> 
> for file in $(git ls-files '*.[ch]') ; do
>     sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
>     sed "s,^,$file: duplicated include: ,"
> done
> 
> 7,40s user 1,49s system 231% cpu 3,847 total

And now, the big shell optimization:
	export LC_ALL=C
Result is impressive:
	2,99s user 0,72s system 258% cpu 1,436 total

I'm sure you will agree to integrate my version now :)
  
Wiles, Keith July 14, 2017, 4:17 p.m. UTC | #10
> On Jul 14, 2017, at 10:54 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> 14/07/2017 17:39, Thomas Monjalon:
>> 13/07/2017 08:56, Thomas Monjalon:
>>> 12/07/2017 23:59, Stephen Hemminger:
>>>> On Tue, 11 Jul 2017 22:33:55 +0200
>>>> Thomas Monjalon <thomas@monjalon.net> wrote:
>>>> 
>>>>> Thank you for this script, but... it is written in Perl!
>>>>> I don't think it is a good idea to add yet another language to DPDK.
>>>>> We already have shell and python scripts.
>>>>> And I am not sure a lot of (young) people are able to parse it ;)
>>>>> 
>>>>> I would like to propose this shell script:
>> [...]
>>> 
>>>> plus shell is 7x slower.
>>>> 
>>>> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
>>>> real	0m0.765s
>>>> user	0m1.220s
>>>> sys	0m0.155s
>>>> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
>>>> real	0m0.131s
>>>> user	0m0.118s
>>>> sys	0m0.014s
>>> 
>>> I don't think speed is really relevant here :)
>> 
>> I did my own benchmark (recreation time):
>> 
>> # time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
>> 4,41s user 1,32s system 101% cpu 5,667 total
>> # time devtools/check-duplicate-includes.sh
>> 5,48s user 1,00s system 153% cpu 4,222 total
>> 
>> The shell version is reported as faster on my computer!
>> 
>> It is faster when filtering only .c and .h files:
>> 
>> for file in $(git ls-files '*.[ch]') ; do
>>    dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>>    [ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
>> done
>> 
>> # time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
>> 3,65s user 1,05s system 100% cpu 4,668 total
>> # time devtools/check-duplicate-includes.sh
>> 4,72s user 0,80s system 153% cpu 3,603 total
>> 
>> I prefer this version using only pipes, which is well parallelized:
>> 
>> for file in $(git ls-files '*.[ch]') ; do
>>    sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
>>    sed "s,^,$file: duplicated include: ,"
>> done
>> 
>> 7,40s user 1,49s system 231% cpu 3,847 total
> 
> And now, the big shell optimization:
> 	export LC_ALL=C
> Result is impressive:
> 	2,99s user 0,72s system 258% cpu 1,436 total
> 
> I'm sure you will agree to integrate my version now :)
> 

hands down winner, where is the patch? :-)

> 

Regards,
Keith
  

Patch

diff --git a/devtools/dup_include.pl b/devtools/dup_include.pl
new file mode 100755
index 000000000000..2946b6cc2ff3
--- /dev/null
+++ b/devtools/dup_include.pl
@@ -0,0 +1,64 @@ 
+#! /usr/bin/env perl
+
+# BSD LICENSE
+#
+# Copyright 2016 Stephen Hemminger <stephen@networkplumber.org>
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+#   * Redistributions of source code must retain the above copyright
+#     notice, this list of conditions and the following disclaimer.
+#   * Redistributions in binary form must reproduce the above copyright
+#     notice, this list of conditions and the following disclaimer in
+#     the documentation and/or other materials provided with the
+#     distribution.
+#   * Neither the name of 6WIND S.A. nor the names of its
+#     contributors may be used to endorse or promote products derived
+#     from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Examine files looking for duplicate references to the
+# same include file.
+#
+# Usage: dup_include.pl foo.c bar.c
+#
+
+use strict;
+use warnings;
+
+die "no input files'n"
+    if ($#ARGV < 0);
+
+foreach my $filename (@ARGV) {
+    open my $f, '<', $filename
+	or die "$filename: $!\n";
+
+    my %included;
+    my $lineno = 0;
+    
+    while (<$f>) {
+	++$lineno;
+	next unless /^#include\s+<(.+)>$/;
+	
+	my $h = $1; 
+	warn "$filename:$lineno duplicate include $h\n"
+	    if $included{$h};
+
+	$included{$h} = 1;
+    }
+    
+    close $f;
+}