Wednesday, March 7, 2012

how to delete lines from text file

i have a text file that is like:

date = OCT0606
asdf
sdaf
asdfasdgsdgh
asdfsdfasdg
START-OF-DATA
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg
END-OF-DATA
asdfgalsdkdfklmlkm
asdfgasdfg

i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
open to suggestions of how to import using bulk insert in sql without
changing the file at all. data is pipe seperated but obvioulsy has
plenty of junk data in it. i have 2 similar files at about 30mb and
60mb in size. thnks everyone"batman" <uspensky@.gmail.comwrote in message
news:1160363613.855662.104440@.b28g2000cwb.googlegr oups.com...

Quote:

Originally Posted by

>i have a text file that is like:
>
date = OCT0606
asdf
sdaf
asdfasdgsdgh
asdfsdfasdg
START-OF-DATA
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg
END-OF-DATA
asdfgalsdkdfklmlkm
asdfgasdfg
>
>
>
i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
open to suggestions of how to import using bulk insert in sql without
changing the file at all. data is pipe seperated but obvioulsy has
plenty of junk data in it. i have 2 similar files at about 30mb and
60mb in size. thnks everyone
>


Since you are posting from XP, do you want an XP solution?
alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
9x.

Meanwhile, since pipe is a special character in DOS, can you be a little
more explicit about the contents of your file? How is
START-OF-DATA/END-OF-DATA detected (does it involve the presence of a pipe
or is it contained in one or more data-fields?) Does any line start with a
semicolon? Is there any non-alphameric content other than the pipes?|||good point...
in fact the file is a text file but contains the infamous blank square
all throughout. end and start of data have the square around them as
well., i looked up the exact character and it seems to be a simple
char(10). so it would be helpful to account for this in the
batchfile/sql.

billious wrote:

Quote:

Originally Posted by

"batman" <uspensky@.gmail.comwrote in message
news:1160363613.855662.104440@.b28g2000cwb.googlegr oups.com...

Quote:

Originally Posted by

i have a text file that is like:

date = OCT0606
asdf
sdaf
asdfasdgsdgh
asdfsdfasdg
START-OF-DATA
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg
END-OF-DATA
asdfgalsdkdfklmlkm
asdfgasdfg

i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
open to suggestions of how to import using bulk insert in sql without
changing the file at all. data is pipe seperated but obvioulsy has
plenty of junk data in it. i have 2 similar files at about 30mb and
60mb in size. thnks everyone


>
Since you are posting from XP, do you want an XP solution?
alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
9x.
>
Meanwhile, since pipe is a special character in DOS, can you be a little
more explicit about the contents of your file? How is
START-OF-DATA/END-OF-DATA detected (does it involve the presence of a pipe
or is it contained in one or more data-fields?) Does any line start with a
semicolon? Is there any non-alphameric content other than the pipes?

|||"billious" <billious_1954@.hotmail.comwrote in message
news:4529c1ac$0$28985$a82e2bb9@.reader.athenanews.c om...

Quote:

Originally Posted by

>
"batman" <uspensky@.gmail.comwrote in message
news:1160363613.855662.104440@.b28g2000cwb.googlegr oups.com...

Quote:

Originally Posted by

>>i have a text file that is like:
>>
>date = OCT0606
>asdf
>sdaf
>asdfasdgsdgh
>asdfsdfasdg
>START-OF-DATA
>asdfasdfg
>asdfgdfgsfg
>sadfsdfgsa
>asdfgsdfg
>END-OF-DATA
>asdfgalsdkdfklmlkm
>asdfgasdfg
>>
>>
>>
>i need to clear everything from this file except the data between the
>START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
>open to suggestions of how to import using bulk insert in sql without
>changing the file at all. data is pipe seperated but obvioulsy has
>plenty of junk data in it. i have 2 similar files at about 30mb and
>60mb in size. thnks everyone
>>


>
Since you are posting from XP, do you want an XP solution?
alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
9x.
>
Meanwhile, since pipe is a special character in DOS, can you be a little
more explicit about the contents of your file? How is
START-OF-DATA/END-OF-DATA detected (does it involve the presence of a pipe
or is it contained in one or more data-fields?) Does any line start with a
semicolon? Is there any non-alphameric content other than the pipes?


NT4/2K/XP/2K3 (NT+ systems) are discussed in alt.msdos.batch.nt as the
techniques used differ markedly from DOS/9x methods.

For instance, given data like:

-- data begins ---
a|b|c
d|e|f
0|start|1
2|3|4
5|6|7
8|end|9
g|h|i
j|k|l
-- data ends ---

in a file "psd.txt" then the following will produce "psdout.txt"

-- batch begins ---
[1]@.echo off
[2]setlocal enabledelayedexpansion
[3]set yel=Y
[4]for /f %%i in (psd.txt) do call :process "%%i" &if !yel!==Y echo
%%i>>psdout.txt
[5]goto :eof
[6]
[7]:process
[8]if %yel%==L set yel=Y
[9]if %yel%==N for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==end set
yel=L
[10]if %yel%==Y for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==start
set yel=N
[11]goto :eof
-- batch ends ---

Lines start [number] - any lines not starting [number] have been wrapped
and should be rejoined. The [number] that starts the line should be removed

The label :eof is defined in NT+ to be end-of-file but MUST be expressed
as :eof

Without better knowledge of your file's content, refining this is a little
difficult.|||here is some actual content from the file (shortened ofcourse to just a
few records) ideally i would like to avoid using a batch file and keep
it all on the sql level (sql 2000)... the file pasted in here much
perttier than it looks in notepad (with the squares)

thanks for ur help

-- file starts below
START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd

START-OF-FIELDS
# Security Description
TICKER
EXCH_CODE
NAME
COUNTRY
CRNCY
SECURITY_TYP
PAR_AMT
EQY_PRIM_EXCH
EQY_PRIM_EXCH_SHRT

# Industry Classification
EQY_SIC_CODE
EQY_SIC_NAME
INDUSTRY_GROUP
INDUSTRY_SUBGROUP
INDUSTRY_SECTOR

# Identifiers
ID_SEDOL1
ID_WERTPAPIER
ID_ISIN
ID_DUTCH
ID_VALOREN
ID_FRENCH
ID_BELGIUM
ID_BB_COMPANY
ID_BB_SECURITY
ID_CUSIP
ID_COMMON

# ADRs
ADR_UNDL_TICKER
ADR_SH_PER_ADR

# Dividend Information
DVD_CRNCY
EQY_DVD_SH_12M_NET
EQY_DVD_SH_12M
EQY_DVD_SH_LAST
EQY_LAST_DPS_GROSS
EQY_DVD_PCT_FRANKED
EQY_DVD_TYP_LAST
EQY_DVD_FREQ
DVD_PAY_DT
DVD_RECORD_DT
DVD_DECLARED_DT
EQY_SPLIT_DT
EQY_SPLIT_RATIO
DVD_EX_DT
EQY_DVD_EX_FLAG

INDUSTRY_SUBGROUP_NUM
CNTRY_ISSUE_ISO
MARKET_STATUS
ID_BB_PARENT_CO
ADR_UNDL_CMPID
ADR_UNDL_SECID
REL_INDEX
PX_TRADE_LOT_SIZE
PARENT_COMP_TICKER
PARENT_COMP_NAME
ID_LOCAL
LONG_COMP_NAME
PARENT_INDUSTRY_GROUP
PARENT_INDUSTRY_SUBGROUP
PARENT_INDUSTRY_SECTOR
VOTING_RIGHTS
ID_BB_PRIM_SECURITY_FLAG
PAR_VAL_CRNCY
EQY_SH_OUT
EQY_SH_OUT_DT
ID_BB_UNIQUE
MARKET_SECTOR_DES
IS_STK_MARGINABLE
144A_FLAG
TRANSFER_AGENT
EQY_PRIM_SECURITY_TICKER
EQY_PRIM_SECURITY_COMP_EXCH
IS_SETS
WHICH_JAPANESE_SECTION
ADR_ADR_PER_SH
EQY_PRIM_SECURITY_PRIM_EXCH
EQY_FUND_CRNCY
WHEN_ISSUED
CDR_COUNTRY_CODE
CDR_EXCH_CODE
CNTRY_OF_INCORPORATION
CNTRY_OF_DOMICILE
SEC_RESTRICT
EQY_SH_OUT_REAL
ADR_UNDL_CRNCY
MULTIPLE_SHARE
PX_QUOTE_LOT_SIZE
PX_ROUND_LOT_SIZE
ID_SEDOL2
SEDOL1_COUNTRY_ISO
SEDOL2_COUNTRY_ISO
ID_MIC_PRIM_EXCH
ID_MIC_LOCAL_EXCH
EQY_SH_OUT_TOT_MULT_SH
SECURITY_TYP2
ID_BB_PRIM_SECURITY
EQY_OPT_AVAIL
EQY_FREE_FLOAT_PCT
END-OF-FIELDS

TIMESTARTED=Tue Sep 26 17:33:28 EDT 2006
START-OF-DATA
AA US Equity|0|95|AA|US|ALCOA INC|US|USD|Common
Stock|1.000000000|New York|UN|3334|ALUMINUM
PROD|Mining|Metal-Aluminum|Basic
Materials|2021805|850206|US0138171014|N.A.|N.A.|90 2130|094464|100046|1000|013817101|009988106|
| |USD|.600|.600|.15|.15| |Regular
Cash|Quarter|20061125|20061103|20060915|20000612|2 for
1|20061101|N|14|US|ACTV| | | |SPX|1| | |N.A.|Alcoa Inc| | |
|1.000|Y|USD|866.888|20060721|EQ0010004600001000|E quity|N.A.|N|EquiServe/First
Chicago Trust Co Div|AA|US|N| | |UN|USD|N|US|EX|US|US|0|866888257|
|N|1.00|100|N.A.|US|N.A.|XNYS| |866.888|Common Stock|1000|Y|99.6964|
AALA US Equity|0|95|AALA|US|AMERALIA INC|US|USD|Common
Stock|.010000000|OTC US|UV|N.A.| |Mining|Quarrying|Basic
Materials|2023588|880772|US0235592062|N.A.|N.A.|N. A.|N.A.|101793|1000|023559206|N.A.|
| | | | | | | | |None| | | |19930119|1 for 40| |N|18|US|ACTV| | |
|SPX|1| | |N.A.|Ameralia Inc| | |
|1.000|Y|USD|16.866|20040901|EQ0010179300001000|Eq uity|N.A.|N|Atlas
Stock Trasfer, Inc.|AALA|US|N| | |UV| |N|US|EX|US|US|0|16866301|
|N|1.00|100|N.A.|US|N.A.|XOTC| |16.866|Common Stock|1000|N|37.4822|
CALS CN Equity|0|95|CALS|CN|ASPIRE CAPITAL INC|CA|CAD|Common
Stock|N.A.|CNQ|CF|N.A.| |Investment Companies|Capital
Pools|Financial|N.A.|N.A.|CA04537P1080|N.A.|N.A.|N .A.|N.A.|1185676|1001|04537P108|N.A.|
|
|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |N.A.|N.A.|N.A.|N.A.|N.A.|702|CA|ACTV|
| | |SPTSX|1| | |N.A.|Aspire Capital Inc| | | |.000|Y|
|N.A.|N.A.|EQ0000000002830848|Equity|N.A.|N|Comput ershare Investor
Services Inc.|CALS|CN|N| | |CF| |N|CA|CX|CA|CA|0|N.A.|
|N|1.00|100|N.A.|N.A.|N.A.|XCNQ| |.000|Common Stock|1001|N|N.A.|
805208Z CN Equity|0|95|805208Z|CN|CUDA CAPITAL CORP|CA|CAD|Common
Stock|N.A.|Venture|CV|N.A.|
|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |11105828|1000|N.A.|N.A.|
|
|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|CA|PEND|
| | |SPTSX|1| | |N.A.|Cuda Capital Corp| | | |.000|Y|
|N.A.|N.A.|EQ0000000002831616|Equity|N.A.|N|N.A.|8 05208Z|CN|N| | |CV|
|N|CA|CX|CA|CA|0|N.A.| |N|1.00|100|N.A.|N.A.|N.A.|XTSX| |.000|Common
Stock|1000|N|N.A.|
END-OF-DATA
DATARECORDS=57129
TIMEFINISHED=Tue Sep 26 17:51:45 EDT 2006
END-OF-FILE

-- file end above
billious wrote:

Quote:

Originally Posted by

"billious" <billious_1954@.hotmail.comwrote in message
news:4529c1ac$0$28985$a82e2bb9@.reader.athenanews.c om...

Quote:

Originally Posted by


"batman" <uspensky@.gmail.comwrote in message
news:1160363613.855662.104440@.b28g2000cwb.googlegr oups.com...

Quote:

Originally Posted by

>i have a text file that is like:
>
date = OCT0606
asdf
sdaf
asdfasdgsdgh
asdfsdfasdg
START-OF-DATA
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg
END-OF-DATA
asdfgalsdkdfklmlkm
asdfgasdfg
>
>
>
i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
open to suggestions of how to import using bulk insert in sql without
changing the file at all. data is pipe seperated but obvioulsy has
plenty of junk data in it. i have 2 similar files at about 30mb and
60mb in size. thnks everyone
>


Since you are posting from XP, do you want an XP solution?
alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
9x.

Meanwhile, since pipe is a special character in DOS, can you be a little
more explicit about the contents of your file? How is
START-OF-DATA/END-OF-DATA detected (does it involve the presence of a pipe
or is it contained in one or more data-fields?) Does any line start with a
semicolon? Is there any non-alphameric content other than the pipes?


>
NT4/2K/XP/2K3 (NT+ systems) are discussed in alt.msdos.batch.nt as the
techniques used differ markedly from DOS/9x methods.
>
For instance, given data like:
>
-- data begins ---
a|b|c
d|e|f
0|start|1
2|3|4
5|6|7
8|end|9
g|h|i
j|k|l
-- data ends ---
>
in a file "psd.txt" then the following will produce "psdout.txt"
>
-- batch begins ---
[1]@.echo off
[2]setlocal enabledelayedexpansion
[3]set yel=Y
[4]for /f %%i in (psd.txt) do call :process "%%i" &if !yel!==Y echo
%%i>>psdout.txt
[5]goto :eof
[6]
[7]:process
[8]if %yel%==L set yel=Y
[9]if %yel%==N for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==end set
yel=L
[10]if %yel%==Y for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==start
set yel=N
[11]goto :eof
-- batch ends ---
>
Lines start [number] - any lines not starting [number] have been wrapped
and should be rejoined. The [number] that starts the line should be removed
>
The label :eof is defined in NT+ to be end-of-file but MUST be expressed
as :eof
>
Without better knowledge of your file's content, refining this is a little
difficult.

|||batman <uspensky@.gmail.comwrote:

Quote:

Originally Posted by

i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file...


Are you able to use a SED UNIX port? If yes, then the task is fairly
trivial.

878915 Oct 25 2003 ftp://garbo.uwasa.fi/win95/unix/UnxUpdates.zip
UnxUpdates.zip Updates for UnxUtils GNU utilities for native Win32

If not, in XP (the newsgroup for that is alt.msdos.batch.nt) then you'll
first have to find out the line numbers of your delimiter, and the write
out the lines between and including those line numbers. More on that e.g. in

176733 Sep 11 2006 ftp://garbo.uwasa.fi/pc/link/tscmd.zip
tscmd.zip Useful NT/2000/XP script tricks and tips, T.Salmi

Windows Command Line Interface script programming links
http://www.uwasa.fi/~ts/http/http2.html#cmdscript
Followups set to alt.msdos.batch.nt

For all, asking PC batch programming related questions. You'll enhance
your chances and expedite getting answers if you state your OS when asking!

All the best, Timo

--
Prof. Timo Salmi ftp & http://garbo.uwasa.fi/ archives 193.166.120.5
Department of Accounting and Business Finance ; University of Vaasa
mailto:ts@.uwasa.fi <http://www.uwasa.fi/~ts/ ; FIN-65101, Finland
Timo's FAQ materials at http://www.uwasa.fi/~ts/http/tsfaq.html|||"batman" <uspensky@.gmail.comwrote in message
news:1160368462.065423.142350@.m7g2000cwm.googlegro ups.com...

Quote:

Originally Posted by

billious wrote:

Quote:

Originally Posted by

>>i have a text file that is like:
>>
>>date = OCT0606
>>asdf
>>START-OF-DATA
>>asdfasdfg
>>asdfgsdfg
>>END-OF-DATA
>>asdfgalsdkdfklmlkm
>>
>>i need to clear everything from this file except the data between the
>>START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
>>open to suggestions of how to import using bulk insert in sql without
>>changing the file at all. data is pipe seperated but obvioulsy has
>>plenty of junk data in it. i have 2 similar files at about 30mb and
>>60mb in size. thnks everyone
>>
>>
>Since you are posting from XP, do you want an XP solution?
>alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
>9x.
>>


[snip]

Quote:

Originally Posted by

Quote:

Originally Posted by

>Without better knowledge of your file's content, refining this is a
>little
>difficult.


>
here is some actual content from the file (shortened ofcourse to just a
few records) ideally i would like to avoid using a batch file and keep
it all on the sql level (sql 2000)... the file pasted in here much
perttier than it looks in notepad (with the squares)
>
thanks for ur help
>
>
-- file starts below
START-OF-FILE
PROGRAMNAME=getdata


[sniparoo]

Quote:

Originally Posted by

END-OF-FIELDS
>
TIMESTARTED=Tue Sep 26 17:33:28 EDT 2006
START-OF-DATA
AA US Equity|0|95|AA|US|ALCOA INC|US|USD|Common
Stock|1.000000000|New York|UN|3334|ALUMINUM
PROD|Mining|Metal-Aluminum|Basic


[more in this vein...]

Quote:

Originally Posted by

Stock|1000|N|N.A.|
END-OF-DATA
DATARECORDS=57129
TIMEFINISHED=Tue Sep 26 17:51:45 EDT 2006
END-OF-FILE
>
-- file end above


Without knowing quite where the "squares" are, other than surrounding the
START/END-OF-FILE, and assuming that "10" is 10 decimal, not 10HEX then the
following MAY work. Can't test it without knowing where else these
non-alphamerics are or whether reproducing them is important.

-- batch begins ---
[1]@.echo off
[2]setlocal enabledelayedexpansion
[3]set yel=N
[4]for /f "tokens=*" %%i in (psdx.txt) do call :process "%%i" &if !yel!==Y
echo %%i>>psdout.txt
[5]goto :eof
[6]
[7]:process
[8]:: if last-line-processed was "START-OF-DATA" switch ON echoing
[9]if %yel%==F set yel=Y
[10]:: get just first token - removing embarrassing pipes
[11]for /f "tokens=1delims=|" %%j in (%1) do set ytd=%%j
[12]if not defined ytd goto :eof
[13]:: remove nasty squares either end of "START/END-OF-DATA"
[14]::set ytd=%ytd:~1,-1%
[15]if "%ytd%"=="START-OF-DATA" set yel=F
[16]if "%ytd%"=="END-OF-DATA" set yel=N
[17]goto :eof
-- batch ends ---

Lines start [number] - any lines not starting [number] have been wrapped
and should be rejoined. The [number] that starts the line should be removed

The label :eof is defined in NT+ to be end-of-file but MUST be expressed
as :eof

[14] is deliberately commented-out. If the "square" is ^J (10 decimal) then
the process appears to work as-is. If the square is ^P (10 HEX) then you'd
need to remove the colons from [14]

And I obviously haven't checked that the reproduced data is PRECISELEY the
same as the source.

And PLEASE end-post. Top-posting is for emails, not Usenet.|||batman <uspensky@.gmail.comwrote:

Quote:

Originally Posted by

i have a text file that is like:
>
date = OCT0606
asdf
sdaf
asdfasdgsdgh
asdfsdfasdg
START-OF-DATA
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg
END-OF-DATA
asdfgalsdkdfklmlkm
asdfgasdfg
>
i need to clear everything from this file except the data between the
START-OF-DATA and END-OF-DATA using a batcj file...


(alt.msdos.batch.nt added and followups redirected)

@.echo off & setlocal enableextensions enabledelayedexpansion
set source=C:\_M\MyData.txt
set flagStart=
set flagEnd=
for /f %%a in ('type "%source%"') do (
echo %%a|find "START-OF-DATA">nul
if !errorlevel! EQU 0 set flagStart=true
echo %%a|find "END-OF-DATA">nul
if !errorlevel! EQU 0 set flagEnd=true
if defined flagStart if not defined FlagEnd (
echo %%a|find /v "START-OF-DATA"
)
)
endlocal & goto :EOF

The output
C:\_D\TEST>cmdfaq
asdfasdfg
asdfgdfgsfg
sadfsdfgsa
asdfgsdfg

All the best, Timo

--
Prof. Timo Salmi ftp & http://garbo.uwasa.fi/ archives 193.166.120.5
Department of Accounting and Business Finance ; University of Vaasa
mailto:ts@.uwasa.fi <http://www.uwasa.fi/~ts/ ; FIN-65101, Finland
Useful script files and tricks ftp://garbo.uwasa.fi/pc/link/tscmd.zip|||"batman" <uspensky@.gmail.comwrote in message
news:1160368462.065423.142350@.m7g2000cwm.googlegro ups.com...

Quote:

Originally Posted by

here is some actual content from the file (shortened ofcourse to just a
few records) ideally i would like to avoid using a batch file and keep
it all on the sql level (sql 2000)... the file pasted in here much
perttier than it looks in notepad (with the squares)
>
thanks for ur help
>
>
-- file starts below


snip...

In Windows 98 I would use the following (tested) batch which uses VBScript
to extract the portion indicated. The extract will be output to a file named
out.txt and opened in notepad for viewing. This batch should work in Windows
XP also (untested).

@.echo off
echo$.vbs filename = "test.txt"
echo>>$.vbs s = "START-OF-DATA"
echo>>$.vbs e = "END-OF-DATA"
echo>>$.vbs Set fso = CreateObject("Scripting.FileSystemObject")
echo>>$.vbs Set f = fso.GetFile(filename)
echo>>$.vbs filesize = f.size
echo>>$.vbs Set f = fso.OpenTextFile (filename, 1, True)
echo>>$.vbs fin = f.Read(filesize)
echo>>$.vbs f.close
echo>>$.vbs s = InStr(fin, s) + 15
echo>>$.vbs e = InStr(fin, e)
echo>>$.vbs clip = Mid(fin, s , e - s - 2)
echo>>$.vbs Wscript.Echo clip
cscript /nologo $.vbs out.txt
del $.vbs
notepad out.txt

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)|||havent gotten a chance to test your replies yet guy but thank you. i
should be able to test them tomorrow. is there a way to do all this
within sql script? this will run using an automated process so i need
to eliminate the number of steps anywhere i can.|||batman wrote:

Quote:

Originally Posted by

i have a text file that is like:
>


--GIVEN this table dbo.ImportMyFile
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[ImportMyFile](
[LineID] [int] IDENTITY(1,1) NOT NULL,
[LineText] [varchar](1000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING ON

-- and your file example here named 'C:\MyFile.txt'
-- and this simple format file named C:\MyFormatFile.txt

9.0

1

1 SQLCHAR 0 1000 "\r\n" 2 LineText
""

-- execute this T-SQL
BULK INSERT AdventureWorks.dbo.ImportMyFile
FROM 'C:\MyFile.txt' WITH (FORMATFILE='C:\MyFormatFile.txt');
GO
DELETE FROM dbo.ImportMyFile
WHERE LineID < =(SELECT LineID FROM dbo.ImportMyFile WHERE LineText =
'START-OF-FILE ')
OR LineID >=(SELECT LineID FROM dbo.ImportMyFile WHERE LineText =
'END-OF-FILE')

See these topics in Books Online
BULK INSERT (Transact-SQL)
Understanding Non-XML Format Files
Using a Format File to Skip a Table Column

variations are endless.|||On 8 Oct 2006 21:34:22 -0700, "batman" <uspensky@.gmail.comwrote:

Quote:

Originally Posted by

>here is some actual content from the file (shortened ofcourse to just a
>few records) ideally i would like to avoid using a batch file and keep
>it all on the sql level (sql 2000)... the file pasted in here much
>perttier than it looks in notepad (with the squares)
>
>thanks for ur help
>
>
>-- file starts below
>START-OF-FILE
>PROGRAMNAME=getdata
>DATEFORMAT=yyyymmdd
>
>START-OF-FIELDS
># Security Description
>TICKER
>EXCH_CODE
>NAME
>COUNTRY
>CRNCY
>SECURITY_TYP
>PAR_AMT
>EQY_PRIM_EXCH
>EQY_PRIM_EXCH_SHRT
>
># Industry Classification
>EQY_SIC_CODE
>EQY_SIC_NAME
>INDUSTRY_GROUP
>INDUSTRY_SUBGROUP
>INDUSTRY_SECTOR
>
># Identifiers
>ID_SEDOL1
>ID_WERTPAPIER
>ID_ISIN
>ID_DUTCH
>ID_VALOREN
>ID_FRENCH
>ID_BELGIUM
>ID_BB_COMPANY
>ID_BB_SECURITY
>ID_CUSIP
>ID_COMMON
>
># ADRs
>ADR_UNDL_TICKER
>ADR_SH_PER_ADR
>
># Dividend Information
>DVD_CRNCY
>EQY_DVD_SH_12M_NET
>EQY_DVD_SH_12M
>EQY_DVD_SH_LAST
>EQY_LAST_DPS_GROSS
>EQY_DVD_PCT_FRANKED
>EQY_DVD_TYP_LAST
>EQY_DVD_FREQ
>DVD_PAY_DT
>DVD_RECORD_DT
>DVD_DECLARED_DT
>EQY_SPLIT_DT
>EQY_SPLIT_RATIO
>DVD_EX_DT
>EQY_DVD_EX_FLAG
>
>INDUSTRY_SUBGROUP_NUM
>CNTRY_ISSUE_ISO
>MARKET_STATUS
>ID_BB_PARENT_CO
>ADR_UNDL_CMPID
>ADR_UNDL_SECID
>REL_INDEX
>PX_TRADE_LOT_SIZE
>PARENT_COMP_TICKER
>PARENT_COMP_NAME
>ID_LOCAL
>LONG_COMP_NAME
>PARENT_INDUSTRY_GROUP
>PARENT_INDUSTRY_SUBGROUP
>PARENT_INDUSTRY_SECTOR
>VOTING_RIGHTS
>ID_BB_PRIM_SECURITY_FLAG
>PAR_VAL_CRNCY
>EQY_SH_OUT
>EQY_SH_OUT_DT
>ID_BB_UNIQUE
>MARKET_SECTOR_DES
>IS_STK_MARGINABLE
>144A_FLAG
>TRANSFER_AGENT
>EQY_PRIM_SECURITY_TICKER
>EQY_PRIM_SECURITY_COMP_EXCH
>IS_SETS
>WHICH_JAPANESE_SECTION
>ADR_ADR_PER_SH
>EQY_PRIM_SECURITY_PRIM_EXCH
>EQY_FUND_CRNCY
>WHEN_ISSUED
>CDR_COUNTRY_CODE
>CDR_EXCH_CODE
>CNTRY_OF_INCORPORATION
>CNTRY_OF_DOMICILE
>SEC_RESTRICT
>EQY_SH_OUT_REAL
>ADR_UNDL_CRNCY
>MULTIPLE_SHARE
>PX_QUOTE_LOT_SIZE
>PX_ROUND_LOT_SIZE
>ID_SEDOL2
>SEDOL1_COUNTRY_ISO
>SEDOL2_COUNTRY_ISO
>ID_MIC_PRIM_EXCH
>ID_MIC_LOCAL_EXCH
>EQY_SH_OUT_TOT_MULT_SH
>SECURITY_TYP2
>ID_BB_PRIM_SECURITY
>EQY_OPT_AVAIL
>EQY_FREE_FLOAT_PCT
>END-OF-FIELDS
>
>TIMESTARTED=Tue Sep 26 17:33:28 EDT 2006
>START-OF-DATA
>AA US Equity|0|95|AA|US|ALCOA INC|US|USD|Common
>Stock|1.000000000|New York|UN|3334|ALUMINUM
>PROD|Mining|Metal-Aluminum|Basic
>Materials|2021805|850206|US0138171014|N.A.|N.A.|90 2130|094464|100046|1000|013817101|009988106|
>| |USD|.600|.600|.15|.15| |Regular
>Cash|Quarter|20061125|20061103|20060915|20000612|2 for
>1|20061101|N|14|US|ACTV| | | |SPX|1| | |N.A.|Alcoa Inc| | |
>|1.000|Y|USD|866.888|20060721|EQ0010004600001000|E quity|N.A.|N|EquiServe/First
>Chicago Trust Co Div|AA|US|N| | |UN|USD|N|US|EX|US|US|0|866888257|
>|N|1.00|100|N.A.|US|N.A.|XNYS| |866.888|Common Stock|1000|Y|99.6964|
>AALA US Equity|0|95|AALA|US|AMERALIA INC|US|USD|Common
>Stock|.010000000|OTC US|UV|N.A.| |Mining|Quarrying|Basic
>Materials|2023588|880772|US0235592062|N.A.|N.A.|N. A.|N.A.|101793|1000|023559206|N.A.|
>| | | | | | | | |None| | | |19930119|1 for 40| |N|18|US|ACTV| | |
>|SPX|1| | |N.A.|Ameralia Inc| | |
>|1.000|Y|USD|16.866|20040901|EQ0010179300001000|Eq uity|N.A.|N|Atlas
>Stock Trasfer, Inc.|AALA|US|N| | |UV| |N|US|EX|US|US|0|16866301|
>|N|1.00|100|N.A.|US|N.A.|XOTC| |16.866|Common Stock|1000|N|37.4822|
>CALS CN Equity|0|95|CALS|CN|ASPIRE CAPITAL INC|CA|CAD|Common
>Stock|N.A.|CNQ|CF|N.A.| |Investment Companies|Capital
>Pools|Financial|N.A.|N.A.|CA04537P1080|N.A.|N.A.|N .A.|N.A.|1185676|1001|04537P108|N.A.|
>|
>|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |N.A.|N.A.|N.A.|N.A.|N.A.|702|CA|ACTV|
>| | |SPTSX|1| | |N.A.|Aspire Capital Inc| | | |.000|Y|
>|N.A.|N.A.|EQ0000000002830848|Equity|N.A.|N|Comput ershare Investor
>Services Inc.|CALS|CN|N| | |CF| |N|CA|CX|CA|CA|0|N.A.|
>|N|1.00|100|N.A.|N.A.|N.A.|XCNQ| |.000|Common Stock|1001|N|N.A.|
>805208Z CN Equity|0|95|805208Z|CN|CUDA CAPITAL CORP|CA|CAD|Common
>Stock|N.A.|Venture|CV|N.A.|
>|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |11105828|1000|N.A.|N.A.|
>|
>|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|N.A. |N.A.|N.A.|N.A.|N.A.|N.A.|N.A.|CA|PEND|
>| | |SPTSX|1| | |N.A.|Cuda Capital Corp| | | |.000|Y|
>|N.A.|N.A.|EQ0000000002831616|Equity|N.A.|N|N.A.|8 05208Z|CN|N| | |CV|
>|N|CA|CX|CA|CA|0|N.A.| |N|1.00|100|N.A.|N.A.|N.A.|XTSX| |.000|Common
>Stock|1000|N|N.A.|
>END-OF-DATA
>DATARECORDS=57129
>TIMEFINISHED=Tue Sep 26 17:51:45 EDT 2006
>END-OF-FILE
>
>-- file end above
>billious wrote:

Quote:

Originally Posted by

>"billious" <billious_1954@.hotmail.comwrote in message
>news:4529c1ac$0$28985$a82e2bb9@.reader.athenanews.c om...

Quote:

Originally Posted by

>
"batman" <uspensky@.gmail.comwrote in message
news:1160363613.855662.104440@.b28g2000cwb.googlegr oups.com...
>>i have a text file that is like:
>>
>date = OCT0606
>asdf
>sdaf
>asdfasdgsdgh
>asdfsdfasdg
>START-OF-DATA
>asdfasdfg
>asdfgdfgsfg
>sadfsdfgsa
>asdfgsdfg
>END-OF-DATA
>asdfgalsdkdfklmlkm
>asdfgasdfg
>>
>>
>>
>i need to clear everything from this file except the data between the
>START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am
>open to suggestions of how to import using bulk insert in sql without
>changing the file at all. data is pipe seperated but obvioulsy has
>plenty of junk data in it. i have 2 similar files at about 30mb and
>60mb in size. thnks everyone
>>
>
Since you are posting from XP, do you want an XP solution?
alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
9x.
>
Meanwhile, since pipe is a special character in DOS, can you be a little
more explicit about the contents of your file? How is
START-OF-DATA/END-OF-DATA detected (does it involve the presence of a pipe
or is it contained in one or more data-fields?) Does any line start with a
semicolon? Is there any non-alphameric content other than the pipes?


>>
>NT4/2K/XP/2K3 (NT+ systems) are discussed in alt.msdos.batch.nt as the
>techniques used differ markedly from DOS/9x methods.
>>
>For instance, given data like:
>>
>-- data begins ---
>a|b|c
>d|e|f
>0|start|1
>2|3|4
>5|6|7
>8|end|9
>g|h|i
>j|k|l
>-- data ends ---
>>
>in a file "psd.txt" then the following will produce "psdout.txt"
>>
>-- batch begins ---
>[1]@.echo off
>[2]setlocal enabledelayedexpansion
>[3]set yel=Y
>[4]for /f %%i in (psd.txt) do call :process "%%i" &if !yel!==Y echo
>%%i>>psdout.txt
>[5]goto :eof
>[6]
>[7]:process
>[8]if %yel%==L set yel=Y
>[9]if %yel%==N for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==end set
>yel=L
>[10]if %yel%==Y for /f "tokens=2delims=|" %%j in (%1) do if /i %%j==start
>set yel=N
>[11]goto :eof
>-- batch ends ---
>>
>Lines start [number] - any lines not starting [number] have been wrapped
>and should be rejoined. The [number] that starts the line should be removed
>>
>The label :eof is defined in NT+ to be end-of-file but MUST be expressed
>as :eof
>>
>Without better knowledge of your file's content, refining this is a little
>difficult.


awk "{if( $0 ~ /END-OF-DATA/) f=0; if( f ) print $0; if( $0 ~
/START-OF-DATA/) f=1}" infile outfile

All one line - awk is gawk.exe, free and open source, from
<http://gnuwin32.sourceforge.net/packages/gawk.htm>. That copies the
lines *between* the markers, but not the markers themselves. It's
easily rearranged to include them: swap the first and third
statements.

--
T.E.D. (tdavis@.gearbox.maem.umr.edu) Remove "gearbox.maem" to get real address - that one is dead|||Ted Davis <tdavis@.gearbox.maem.umr.eduwrote:

Quote:

Originally Posted by

On 8 Oct 2006 21:34:22 -0700, "batman" <uspensky@.gmail.comwrote:

Quote:

Originally Posted by

Quote:

Originally Posted by

>>>>i have a text file that is like:
>>>>>
>>>>date = OCT0606
>>>>asdf
>>>>sdaf
>>>>asdfasdgsdgh
>>>>asdfsdfasdg
>>>>START-OF-DATA
>>>>asdfasdfg
>>>>asdfgdfgsfg
>>>>sadfsdfgsa
>>>>asdfgsdfg
>>>>END-OF-DATA
>>>>asdfgalsdkdfklmlkm
>>>>asdfgasdfg
>>>>>
>>>>i need to clear everything from this file except the data between the
>>>>START-OF-DATA and END-OF-DATA using a batcj file... elternitavly i am


Quote:

Originally Posted by

awk "{if( $0 ~ /END-OF-DATA/) f=0; if( f ) print $0; if( $0 ~
/START-OF-DATA/) f=1}" infile outfile


Also as below. The first is inclusive, the second exclusive

sed -n "/START-OF-DATA/,/END-OF-DATA/p" infile
sed "1,/START-OF-DATA/d" infile|sed "/END-OF-DATA/,$d"

All the best, Timo

--
Prof. Timo Salmi ftp & http://garbo.uwasa.fi/ archives 193.166.120.5
Department of Accounting and Business Finance ; University of Vaasa
mailto:ts@.uwasa.fi <http://www.uwasa.fi/~ts/ ; FIN-65101, Finland
Useful script files and tricks ftp://garbo.uwasa.fi/pc/link/tscmd.zip|||Todd Vargo <tlvargo@.sbcglobal.netzwrote:

Quote:

Originally Posted by

In Windows 98 I would use the following (tested) batch which uses
VBScript to extract the portion indicated. The extract will be output
to a file named out.txt and opened in notepad for viewing. This batch
should work in Windows XP also (untested).


Tested ok for XP.

All the best, Timo

--
Prof. Timo Salmi ftp & http://garbo.uwasa.fi/ archives 193.166.120.5
Department of Accounting and Business Finance ; University of Vaasa
mailto:ts@.uwasa.fi <http://www.uwasa.fi/~ts/ ; FIN-65101, Finland
Useful batch files and tricks ftp://garbo.uwasa.fi/pc/link/tsbat.zip|||JRS: In article <1160363613.855662.104440@.b28g2000cwb.googlegroups. com>,
dated Sun, 8 Oct 2006 20:13:33 remote, seen in news:alt.msdos.batch,
batman <uspensky@.gmail.composted :

Quote:

Originally Posted by

>
>i need to clear everything from this file except the data between the
>START-OF-DATA and END-OF-DATA using a batcj file...


MiniTrue :

mtr -n -x+ $1.txt (\A[^]*START-OF-DATA)([^]*)(END-OF-DATA[^]*) = \2

There may be a limit on the sizes of the three components of the file.

Otherwise, it's trivial in any HLL, even BASIC.

--
John Stockton, Surrey, UK. ?@.merlyn.demon.co.uk Turnpike v4.00 MIME.
Web <URL:http://www.merlyn.demon.co.uk/- FAQish topics, acronyms, & links.
I find MiniTrue useful for viewing/searching/altering files, at a DOS prompt;
free, DOS/Win/UNIX, <URL:http://www.idiotsdelight.net/minitrue/>|||"batman" <uspensky@.gmail.comwrote in message
news:1160435561.664102.10910@.e3g2000cwe.googlegrou ps.com...

Quote:

Originally Posted by

havent gotten a chance to test your replies yet guy but thank you. i
should be able to test them tomorrow. is there a way to do all this
within sql script? this will run using an automated process so i need
to eliminate the number of steps anywhere i can.
>


Sorry, I don't use SQL so I don't know.

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

No comments:

Post a Comment