Thursday, 8 December 2016

Missing odd indexes from txt file - parsing whitespace-separated fields

I'm trying to write a script that works as follows:
My input is a text file with 8 rows and 8 columns, filled with the values 0 or 1, with the columns separated by a single space character.
I need to check the 4th number in each row and output false if it is 0 and true if it is 1.
My code currently looks like this:
param($fname)
$rows = (Get-Content $fname)
for ($i=0;$i -lt $rows.Length;$i++)
{ 
 if ($rows[$i][6] -eq 1)
  {
   Write-Host "true"
  }
 if ($rows[$i][6] -ne 1)
  {
    Write-Host "false"
  }
}
I use [$i][6] because, counting the spaces that act as separators, I figured that's the 4th number.
I checked and thought it was correct, but it outputs false for every line, even though Write-Host $rows[0][6] prints 1.
-------------------------------------------------------------------------------------------------------------------------

Best Answer:



tl;dr
# Create sample input file, with values of interest in 4th field
# (0-based field index 3).
@'
0 0 0 0 0 0 0 0 
0 0 0 1 0 0 0 0 
'@ > file

foreach ($line in (Get-Content file)) {
    $fields = -split $line
    if ($fields[3] -eq '1') { 
        "true"
    } else {
        "false"
    }
}
yields:
false
true

There are many subtleties in your original code, but the code above:
  • takes a more awk-like approach, splitting each input line into whitespace-separated fields, whatever the length of the fields, courtesy of the unary -split operator.
  • can therefore use subscripts (indices) based on field positions rather than character positions.
  • compares with the string literal '1', because all fields returned by -split are strings. Generally, though, PowerShell performs a lot of behind-the-scenes conversion magic for you: in the code above, unlike in your own code, using 1 would have worked too.
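To see these points for yourself, you can run -split on a made-up line (the sample string below is purely illustrative) that has leading, trailing and repeated spaces as well as a multi-character field. The comments show what each statement should output:

```powershell
# Unary -split splits on runs of whitespace and ignores leading/trailing
# whitespace, so the multi-character field '10' stays intact.
$fields = -split ' 10 0  7 1 0 '

$fields.Count              # 5
$fields[3]                 # 1  (4th field, 0-based index 3)
$fields[3].GetType().Name  # String
$fields[3] -eq '1'         # True
$fields[3] -eq 1           # True: RHS 1 is converted to the LHS type, [string]
```

Because the LHS is a string here, the integer on the RHS is converted to a string before comparing, which is why both forms succeed.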

As for why your approach failed:
  • Indexing into (using a subscript with) a string value in PowerShell is a special case: it implicitly treats the string as a character array and, with a single index such as 6, returns a [char] instance.
  • It is the LHS (left-hand side) of an expression involving a binary operator such as -eq that determines what type the RHS (right-hand side) will be coerced to, if necessary, before applying the operator:
    • ([char] '1') -eq 1 # !! $false
      • Coercing the RHS 1 (implicitly an [int]) to the LHS type [char] yields Unicode code point U+0001, i.e., a control character rather than the "ASCII" digit '1', which is why the comparison fails.
      • @PetSerAl's helpful but cryptic suggestion (in a comment on the question) to use '1'[0] rather than 1 as the RHS solves the problem in this particular case, because '1'[0] returns the character '1' as a [char] instance; however, that fix doesn't generalize to multi-character field values.
    • '1' -eq 1 # $true; same as: '1' -eq ([string] 1), i.e., '1' -eq '1'
      • Converting the integer 1 to a string indeed yields '1', so the comparison succeeds.
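The following sketch reproduces the failure and the '1'[0] workaround on a sample line (the sample strings are made up for illustration). It also shows why the character-index approach doesn't generalize: once a field is wider than one character, character positions no longer line up with field positions.

```powershell
# Sample line in the question's format; the 4th field is 1.
$line = '0 0 0 1 0 0 0 0'

$ch = $line[6]                  # indexing into a string returns a [char]
$ch.GetType().Name              # Char
$ch -eq 1                       # False: RHS 1 becomes [char] 1, i.e., U+0001
[int] $ch                       # 49, the code point of the digit '1'
$ch -eq '1'[0]                  # True: both sides are [char] '1'

# Character positions shift as soon as a field has more than one character:
'0 0 10 1 0'[6]                 # a space, not the 4th field
(-split '0 0 10 1 0')[3] -eq 1  # True: field indices are unaffected
```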
